The paper’s introduction does not deeply answer that question; however, it does contrast the idea of inductive transfer learning with transductive transfer learning.
Inductive transfer learning is popular in CV; in NLP, it has so far mostly taken the form of fine-tuning pretrained word embeddings, a simple transfer technique that targets only the model's first layer.
The issue with using transfer learning effectively in the past was not the idea of applying it to NLP, but rather a lack of knowledge of how: NLP models are typically shallower than the deep networks effective on CV tasks, and thus require different fine-tuning methods.
The same 3-layer LSTM architecture, with the same hyperparameters and no additions other than tuned dropout hyperparameters, outperforms highly engineered models and transfer learning approaches on six widely studied text classification tasks.
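To make the architecture concrete, here is a minimal sketch of a 3-layer LSTM text classifier in PyTorch. This is an illustrative stand-in, not the paper's actual implementation (the paper builds on the AWD-LSTM); all sizes and the dropout rate are assumed placeholder values.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Simplified 3-layer LSTM classifier; hyperparameters are illustrative."""
    def __init__(self, vocab_size=10000, emb_dim=400, hidden=1150,
                 n_layers=3, n_classes=2, dropout=0.4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Per the paper's claim, only the dropout rates would be tuned per task;
        # the architecture and remaining hyperparameters stay fixed.
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=n_layers,
                            dropout=dropout, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, tokens):
        emb = self.embed(tokens)      # (batch, seq, emb_dim)
        out, _ = self.lstm(emb)       # (batch, seq, hidden)
        return self.head(out[:, -1])  # classify from the final time step

model = LSTMClassifier()
logits = model(torch.randint(0, 10000, (2, 16)))  # batch of 2 token sequences
print(tuple(logits.shape))  # (2, 2): one score per class, per example
```

In practice the pretrained language model would supply the embedding and LSTM weights, with only the classification head initialized from scratch.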
With a generic approach able to outperform highly engineered models, this is shaping up to be an interesting paper.