Conclusions

This paper was extremely well written. It was easy to understand, and the language was quick to pick up. Readability is a soft quality that is hard to pin down, but I think these techniques helped:

Bolded Tags These make the paper easy to navigate and scan quickly, and each one serves as a short introduction to the passage that follows.
Apt Analogies Using “chain thaw” to describe the unfreezing process immediately conveys what is going on, without spending extra time on explanation.
Limited Greek The less Greek notation a paper has, the faster it reads, and the authors used only what was necessary.
Useful Tables Each table makes a short, easy-to-understand point. The tables are sized appropriately, so no single table carries too much information.

Takeaways

Language Model Transfer The state-of-the-art results make a compelling case that a universal language model can transfer generic language understanding and then be fine-tuned for a specific application, with much less data than training from the ground up.

Transfer Learning Delicacies The fine-tuning stage is not a simple plug-and-chug mechanism. It depends not on a single good practice but on a collection of them, such as slanted triangular learning rates and chain thaw unfreezing.
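
To make the slanted triangular learning rate a little more concrete, here is a minimal sketch of the schedule as I understand it: a short linear warm-up to a peak, followed by a much longer linear decay. The function name, parameter names, and default values are my own illustrative assumptions, not something taken from the authors' code.

```python
import math


def slanted_triangular_lr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at training step t (0-based) for a slanted triangular schedule.

    Assumed parameters (illustrative defaults):
      total_steps: total number of training iterations
      lr_max:      peak learning rate
      cut_frac:    fraction of steps spent increasing the rate
      ratio:       how many times smaller the lowest rate is than lr_max
    """
    # Step at which the schedule switches from increasing to decreasing.
    cut = max(1, math.floor(total_steps * cut_frac))
    if t < cut:
        # Short, steep linear increase.
        p = t / cut
    else:
        # Long, gentle linear decrease back toward the minimum.
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio


# Sample the schedule at a few points of a hypothetical 1000-step run.
schedule = [slanted_triangular_lr(t, 1000) for t in (0, 50, 100, 500, 1000)]
```

With these defaults the rate starts at lr_max / ratio, peaks at lr_max after the first 10% of steps, and returns to lr_max / ratio by the end. In practice the schedule is only one piece: it would be combined with the unfreezing strategy rather than used on its own.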