Progress in NLP took a great leap forward with the introduction of the Transformer architecture.
Built on this architecture, language models such as BERT have achieved state-of-the-art results on a wide range of NLP tasks.
The big idea
The central idea behind these language models is to pre-train a model on a massive corpus in an unsupervised manner so that it learns language structure, grammar, and semantics.
These massive pre-trained models can then be used as encoders that produce contextual, semantic representations of text. Through transfer learning, they make many downstream NLP tasks easy to tackle, including text classification, sentiment analysis, question answering, and summarization.
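To make the encoder idea concrete, here is a minimal sketch (not from the original post) that loads a pre-trained BERT checkpoint through the Hugging Face transformers library and extracts contextual representations that a downstream classifier could consume; the library, the checkpoint name, and the pooling choice are all assumptions made for illustration.

```python
# Sketch: using a pre-trained BERT model as a text encoder for transfer learning.
# Assumes the Hugging Face `transformers` and `torch` packages are installed.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

sentence = "Transformers changed NLP."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

# Contextual token embeddings: one 768-dimensional vector per token.
token_embeddings = outputs.last_hidden_state      # shape (1, seq_len, 768)

# A simple (hypothetical) sentence-level feature for a downstream classifier:
# mean-pool the token embeddings.
sentence_embedding = token_embeddings.mean(dim=1)  # shape (1, 768)
print(sentence_embedding.shape)
```

A downstream task such as sentiment analysis would then train a small classification head on top of these features, or fine-tune the encoder end to end.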
Text-to-Text Transformers
Recently, Google made a significant advance in this area by releasing a new model, the Text-To-Text Transfer Transformer, or T5.
How do text-to-text transformers work?
T5 reframes every NLP task into a unified text-to-text format in which both the input and the output of the model are text. It takes a text input for a given NLP task and predicts a text output for that task, as shown below:
Every task considered uses text as input to the model, which is trained to generate some target text.
This allows the same model, loss function, and hyperparameters to be used across a diverse set of tasks, including translation (green), linguistic acceptability (red), sentence similarity (yellow), and document summarization (blue).
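As an illustration of the text-to-text format, the sketch below feeds a prefixed input to a public T5 checkpoint via the Hugging Face transformers library; the library, the "t5-small" checkpoint, and the generation settings are assumptions made for this example rather than details from the original post.

```python
# Sketch: the task is selected purely by the text prefix on the input,
# and the answer comes back as generated text.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Translation, expressed as plain text with a task prefix.
text = "translate English to German: The house is wonderful."
input_ids = tokenizer(text, return_tensors="pt").input_ids

# The decoder generates the target text token by token.
output_ids = model.generate(input_ids, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# Expected output along the lines of: "Das Haus ist wunderbar."
```

Because every task looks the same to the model, switching from translation to, say, summarization is just a matter of changing the prefix of the input string.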
The model was trained on the Colossal Clean Crawled Corpus (C4) dataset, a cleaned version of Common Crawl that is about two orders of magnitude larger than Wikipedia.
The largest model has 11 billion parameters and achieved state-of-the-art results on the GLUE, SuperGLUE, SQuAD, and CNN/Daily Mail benchmarks.
The pre-trained model can be used as is, without any further fine-tuning, for NLP/NLU tasks such as sentiment analysis, NER, Retail POS, question answering, translation, and summarization.
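The sketch below shows this for two such tasks, reusing one checkpoint and swapping only the task prefix; the prefixes ("summarize:", "cola sentence:") follow the conventions described in the T5 paper, and how well the public "t5-small" checkpoint handles them without fine-tuning is an assumption to verify rather than a guarantee.

```python
# Sketch: one pre-trained checkpoint, several tasks, no fine-tuning.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def run_task(prefixed_text: str) -> str:
    """Run one text-to-text task; the prefix encodes which task it is."""
    input_ids = tokenizer(prefixed_text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=60)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Summarization.
print(run_task(
    "summarize: Studies have shown that owning a dog is good for you. "
    "Dog owners get more exercise and report lower stress levels."
))

# Linguistic acceptability (CoLA): the model answers with a label as text.
print(run_task("cola sentence: The book was written by John."))
```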