NLP has reached its ‘ImageNet moment’: newer language models are increasingly able to crack complex language problems.
The availability of large open-source pre-trained language models, combined with transfer learning techniques, has made it possible to solve complex problems with ease, including language translation, text classification, question answering, language understanding, and language generation.
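As a minimal sketch of what this looks like in practice, the snippet below uses the open-source Hugging Face transformers library (assuming it is installed via pip; the models it downloads are the library's task defaults, chosen here purely for illustration):

```python
# Minimal sketch: reusing pre-trained models via the Hugging Face
# `transformers` library (`pip install transformers`). The models pulled
# down are the library's defaults for each task, not a recommendation.
from transformers import pipeline

# Text classification with a pre-trained sentiment model.
classifier = pipeline("sentiment-analysis")
print(classifier("Transfer learning makes NLP remarkably accessible."))

# Question answering over a short context passage.
qa = pipeline("question-answering")
print(qa(question="How many parameters does the model have?",
         context="GPT-3 is a language model with 175 billion parameters."))
```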
This advancement in NLP technology has fuelled a so-called arms race to build the next bigger, better language model, one that can beat the competition through sheer size and the complexity of the tasks it can perform.
Developments in Large Language Models
Recent announcements from Nvidia, Microsoft, and OpenAI around very large language models are accelerating the pace of growth.
Last year, Nvidia released Megatron, a language model with 8.3 billion parameters.
This February, Microsoft announced the largest Transformer-based model to date, Turing-NLG, which at 17 billion parameters is twice the size of Megatron.
OpenAI had already released GPT-1 and GPT-2 in the past. In June 2020, it released GPT-3, a monster model that packs 175 billion parameters.
Challenges ahead
The biggest question and challenge facing most NLP professionals is:
How do we operate such huge language models, which need vast computing resources?
Training and fine-tuning such language models for NLP tasks requires clusters of GPU or TPU servers, which cost a fortune.
So does this mean only big companies with deep pockets will benefit from such advancements? Will smaller and mid-sized companies be left behind?
OpenAI has taken a step in the right direction by announcing that they will release an API to access these very large AI models.
This approach will benefit everyone, making AI accessible and affordable: users can tap these models without running their own GPU or TPU clusters.
The API can be used to improve the fluency of chatbots like Engati, enrich gaming experiences, and power semantic search, AI assistants, and much more.
They have opened up a beta program for anyone interested in trying out their models on tasks such as semantic search, summarization, sentiment analysis, content generation, translation, and more.
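To give a flavour of what a request looks like, here is a hedged sketch using the beta-era openai Python client; the API key is a placeholder, and the engine name and parameters are illustrative choices that may change as the API evolves:

```python
# Sketch of a completion request with the beta-era `openai` Python client
# (`pip install openai`). Engine name and parameters are illustrative.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; keys are issued via the beta program

response = openai.Completion.create(
    engine="davinci",   # a general-purpose engine offered in the beta
    prompt="Summarize in one sentence: Large pre-trained language models "
           "let smaller teams build translation, search, and chat features.",
    max_tokens=40,
    temperature=0.3,    # lower values give more focused, less random output
)
print(response["choices"][0]["text"].strip())
```

The key design point is that all the heavy lifting happens on OpenAI's servers; the caller only sends a prompt and receives text back, with no model weights or GPUs to manage.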
If the other companies building such very large models also commercialize their offerings, it will lead to healthy competition and benefit customers on both quality and price.