Natural Language Processing (NLP)

This notebook follows the fourth lesson of the fast.ai Practical Deep Learning for Coders course.

The related resources are as follows:

  1. Lesson 4 lecture

  2. Deep Learning for Coders with Fastai and PyTorch: AI Applications Without a PhD, Chapter 10

  3. Course Notebooks:

    1. Getting started with NLP for absolute beginners
    2. Iterate like a grandmaster!

Detect if notebook is running on Kaggle

It’s a good idea to ensure you’re running the latest version of any libraries you need. !pip install -Uqq <libraries> upgrades to the latest version of <libraries>.

import os
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')

if iskaggle:
    print('Is running on Kaggle.')
    !pip install -Uqq fastai

NLP for absolute beginners

In this lecture, Jeremy explains the foundations of NLP through the U.S. Patent Phrase to Phrase Matching Kaggle competition.

He demonstrates basic usage of pandas, NumPy, and Hugging Face Transformers while walking us through a submission to the competition.

He also explains tokenization and the importance of selecting the training and validation sets correctly, and recommends reading Dr. Rachel Thomas's article How (and why) to create a good validation set.
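To make the idea of tokenization concrete, here is a minimal toy sketch in plain Python. It is not the Hugging Face tokenizer used in the lesson (which splits text into subword pieces learned from data). It only illustrates the two steps involved: splitting text into tokens, then mapping each token to an integer id. The function names (tokenize, build_vocab, numericalize), the unk_id value, and the example phrases are all made up for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())

def build_vocab(texts):
    """Give each distinct token an integer id, in order of first appearance."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def numericalize(text, vocab, unk_id=-1):
    """Convert text to a list of ids; tokens missing from vocab get unk_id."""
    return [vocab.get(tok, unk_id) for tok in tokenize(text)]

# Made-up example phrases (not taken from the competition data).
phrases = ["abatement of pollution", "pollution control"]
vocab = build_vocab(phrases)

ids = numericalize("control of pollution", vocab)
print(vocab)
print(ids)
```

A model never sees the raw strings, only id lists like the one printed above, which is why the tokenizer used at inference time has to match the one the model was trained with.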

Then Jeremy explains the Pearson correlation coefficient using a sample, and shows what different values of r mean with visualizations of a real example.
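As a rough illustration of what r measures, the sketch below computes the Pearson correlation coefficient from its definition (the covariance of the two variables divided by the product of their standard deviations) in plain Python. The helper name pearson_r and the sample numbers are invented for this example. In practice NumPy's np.corrcoef computes the same quantity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors cancel between numerator and denominator, so we skip them.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up sample data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(x, [2.0, 4.0, 6.0, 8.0, 10.0])    # perfectly increasing line
r_down = pearson_r(x, [5.0, 4.0, 3.0, 2.0, 1.0])   # perfectly decreasing line
r_noisy = pearson_r(x, [2.0, 1.0, 4.0, 3.0, 5.0])  # increasing trend with noise

print(r_up, r_down, r_noisy)
```

Values near 1 or -1 indicate a strong linear relationship (increasing or decreasing), while values near 0 indicate little linear relationship. The noisy series falls in between.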

I’d suggest using a modern chatbot such as Google Gemini and asking it to explain a subject clearly.

To improve competition participation skills and, more generally, ML development skills, Jeremy has developed another notebook that walks through the steps he takes to iterate as quickly and easily as possible when submitting to that competition.