A key benefit of transfer learning is low-shot learning: it lets you train models of similar or better quality with one to two orders of magnitude (50x - 100x) fewer labeled examples. That is amazing!
On IMDb and AG News, the paper reports that ULMFiT with only 100 labeled examples matches the performance of training from scratch with 10x and 20x more data, respectively, demonstrating the benefit of general-domain LM pretraining.
In every case, pretraining improves performance and reduces data requirements, regardless of target dataset size.
The quality of the language model matters, but even a vanilla LM still provides benefits.
Classifier fine-tuning improves results by a factor of 2+ on IMDb; the impact on TREC-6 is smaller, but still significant. See table below.
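Part of the gain from classifier fine-tuning comes from ULMFiT's discriminative fine-tuning, where each layer gets its own learning rate and the paper sets the rate of the layer below as the rate above divided by 2.6. A minimal sketch of that schedule (the function name and interface here are hypothetical; only the 2.6 factor comes from the paper):

```python
# Hypothetical sketch (not the paper's code): ULMFiT's discriminative
# fine-tuning gives each layer its own learning rate, with the rate of
# each lower layer equal to the layer above divided by 2.6, so layers
# closer to the input change more slowly during fine-tuning.
def discriminative_lrs(base_lr, n_layers, factor=2.6):
    """Return per-layer learning rates, lowest (input) layer first."""
    lrs = [base_lr]
    for _ in range(n_layers - 1):
        lrs.append(lrs[-1] / factor)  # each lower layer gets a smaller rate
    return list(reversed(lrs))

# e.g. with a base rate of 0.01 and 3 layers, the top layer trains at 0.01
# while the bottom layer trains at roughly 0.0015
print(discriminative_lrs(0.01, 3))
```

These per-layer rates would then be passed as separate parameter groups to the optimizer, which is what lets the top classifier layers adapt quickly while the pretrained lower layers are only gently nudged.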