# Natural Language (NLP)



Related resources:

1.  [Lesson 4 lecture](https://www.youtube.com/watch?v=toUgBQv1BT8)

2.  [Deep Learning for Coders with Fastai and PyTorch: AI Applications
    Without a PhD Chapter
    10](https://github.com/fastai/fastbook/blob/master/10_nlp.ipynb)

3.  Course Notebooks:

    1.  [Getting started with NLP for absolute
        beginners](https://www.kaggle.com/code/jhoward/getting-started-with-nlp-for-absolute-beginners)
    2.  [Iterate like a
        grandmaster!](https://www.kaggle.com/code/jhoward/iterate-like-a-grandmaster)

**Detect if notebook is running on Kaggle**

The cell below checks whether the notebook is running on Kaggle and, if
so, upgrades fastai. It’s a good idea to ensure you’re running the
latest version of any libraries you need: `!pip install -Uqq <libraries>`
upgrades `<libraries>` to the latest version.

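``` python
import os

# Kaggle sets KAGGLE_KERNEL_RUN_TYPE; elsewhere this is an empty (falsy) string.
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')

if iskaggle:
    print('Is running on Kaggle.')
    # Notebook shell command: -U upgrades, -qq keeps pip's output quiet.
    !pip install -Uqq fastai
```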

## NLP for absolute beginners

In this lecture, Jeremy explains the foundations of NLP through the
[U.S. Patent Phrase to Phrase Matching Kaggle
competition](https://www.kaggle.com/competitions/us-patent-phrase-to-phrase-matching).

He shows basic usage of Pandas, NumPy, and Hugging Face Transformers
while walking us through making a submission to the competition.

He also explains tokenization and the importance of choosing the
training and validation sets correctly, and recommends reading Dr. Rachel
Thomas’s article [How (and why) to create a good validation
set](https://www.fast.ai/posts/2017-11-13-validation-sets.html).
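
To make the validation-set point concrete, here is a minimal sketch of one way to avoid leakage: keep all rows that share a group (here, the same `anchor` phrase) on the same side of the split. The helper `split_by_group`, the sample rows, and the 25% validation fraction are assumptions made for illustration, not code from the lecture or its notebooks.

``` python
import random

def split_by_group(rows, group_key, valid_frac=0.25, seed=42):
    """Split rows so that no group value appears in both train and valid."""
    # Shuffle the distinct group values, not the individual rows.
    groups = sorted({row[group_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_valid = max(1, int(len(groups) * valid_frac))
    valid_groups = set(groups[:n_valid])
    train = [r for r in rows if r[group_key] not in valid_groups]
    valid = [r for r in rows if r[group_key] in valid_groups]
    return train, valid

# Made-up rows, loosely shaped like phrase-matching data.
rows = [
    {"anchor": "abatement", "target": "act of abating", "score": 0.75},
    {"anchor": "abatement", "target": "forest region", "score": 0.0},
    {"anchor": "wood article", "target": "wooden article", "score": 1.0},
    {"anchor": "wood article", "target": "wooden box", "score": 0.5},
    {"anchor": "heat sink", "target": "cooling fin", "score": 0.5},
    {"anchor": "heat sink", "target": "kitchen sink", "score": 0.0},
    {"anchor": "drive shaft", "target": "propeller shaft", "score": 0.5},
    {"anchor": "drive shaft", "target": "mine shaft", "score": 0.0},
]

train, valid = split_by_group(rows, "anchor")
train_anchors = {r["anchor"] for r in train}
valid_anchors = {r["anchor"] for r in valid}
```

With a plain random row split, the same anchor would usually land in both sets, so the validation score could look better than the score on genuinely unseen anchors.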

Then Jeremy introduces the *Pearson correlation coefficient* with a
sample and explains what different values of *r* mean, using
visualizations of a real example.
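
As a small illustration of what different values of *r* look like, the sketch below computes the coefficient from its definition in plain Python. The function name `pearson_r` and the toy data are assumptions for this example; in practice a library routine such as NumPy's `np.corrcoef` would normally be used instead.

``` python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of products of deviations, normalised by both spreads.
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("r is undefined when either sequence is constant")
    return cov / norm

x = [1.0, 2.0, 3.0, 4.0, 5.0]
perfect_up = pearson_r(x, [2.0, 4.0, 6.0, 8.0, 10.0])    # points on a rising line
perfect_down = pearson_r(x, [5.0, 4.0, 3.0, 2.0, 1.0])   # points on a falling line
noisy = pearson_r(x, [1.0, 3.0, 2.0, 5.0, 4.0])          # rising trend with scatter

print(perfect_up, perfect_down, noisy)
```

Here the first two cases give *r* = 1 and *r* = -1, and the noisy case gives *r* = 0.8: still a clear upward trend, but with points scattered around the line.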

I’d also suggest asking a modern chatbot such as Google Gemini to
explain a subject clearly when something is hard to follow.

To build competition skills, and ML development skills more generally,
Jeremy has written a second notebook, *Iterate like a grandmaster!*,
which walks through the steps he takes to iterate as quickly and easily
as possible when submitting to that competition.
