Large language models like GPT-3 have taken the NLP field by storm over the last few years and have given us hope that some of its toughest problems can be solved.
The race to build the best language model has turned into a so-called ‘arms race’ in which, so far, bigger has meant better.
What are the applications of Generative Language Models?
GPT-3 by OpenAI is the best known and most anticipated of the popular language models. It has 175 billion parameters. Many applications have been built on GPT-3 to demonstrate its capabilities, more for fun than profit.
But it does show the path forward in NLP and democratizes the building of applications with NLP at the core. For NLP to come to the forefront in everyday applications, zero-shot and few-shot learning are the way forward: rather than fine-tuning a model for every task, the task is described in the prompt itself, optionally with a handful of worked examples, as the sketch below illustrates.
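To make this concrete, here is a minimal sketch of few-shot prompting. The sentiment task, the example reviews, and the helper function are illustrative assumptions, not taken from any particular product; the resulting prompt would be sent to a generative model such as GPT-3, whose next tokens serve as the answer.

```python
# Few-shot prompting: the task is "taught" entirely through labelled
# examples placed in the prompt, with no fine-tuning of the model.
# The task, examples, and build_few_shot_prompt() are hypothetical.

EXAMPLES = [
    ("The checkout flow is so smooth now.", "positive"),
    ("The app crashes every time I upload a file.", "negative"),
]

def build_few_shot_prompt(query: str) -> str:
    """Concatenate an instruction, labelled examples, then the unlabelled query."""
    lines = ["Classify the sentiment of each review."]
    for text, label in EXAMPLES:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt("Support never answered my ticket.")
print(prompt)  # send this string to a completion endpoint; the model
               # is expected to continue it with the label "negative"
```

Zero-shot prompting is the same idea with the examples removed: only the instruction and the query are supplied.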
The recent news about Copilot, a code completion and recommendation tool from GitHub, has put generative language models and their capabilities in the spotlight. Copilot could be the first commercial use of such models in a big way.
Others are not far behind. Microsoft has built Turing-NLG, a 17 billion parameter model already used in Bing’s search engine to improve auto-suggestion and phrase prediction. This is a well-known commercial application of generative language models.
Even though this model is roughly a tenth the size of GPT-3, the fact that Microsoft has found it useful enough to deploy in a search engine demonstrates the capabilities of such models.
Early this year, Google announced a 1.6 trillion parameter language model built by the Google Brain team using an efficient transformer architecture called the Switch Transformer. Its key idea is sparsity: each token is routed to just one of many expert feed-forward networks, so the parameter count can grow enormously while the compute spent per token stays roughly constant; a rough sketch of the routing follows.
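Here is a minimal numerical sketch of that top-1 routing idea, assuming tiny illustrative shapes and random weights; it is not Google’s implementation, just the mechanism in miniature.

```python
import numpy as np

# Switch-style top-1 routing: a learned router picks ONE expert per
# token, and the expert's output is scaled by the router probability
# (which keeps the routing decision differentiable during training).
# All shapes and weights below are illustrative assumptions.

rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 5

tokens = rng.standard_normal((n_tokens, d_model))        # token representations
router_w = rng.standard_normal((d_model, n_experts))     # router weights
experts = [rng.standard_normal((d_model, d_model))       # one tiny "expert"
           for _ in range(n_experts)]                    # network per slot

logits = tokens @ router_w                               # (n_tokens, n_experts)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
chosen = probs.argmax(axis=1)                            # top-1 expert per token

# Each token is processed only by its chosen expert, so adding experts
# adds parameters without adding per-token compute.
output = np.stack([probs[i, chosen[i]] * (tokens[i] @ experts[chosen[i]])
                   for i in range(n_tokens)])
print(chosen, output.shape)
```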
This gives hope that the cost of training and serving such huge language models can be brought down, making them economically feasible for small and medium-sized companies working in NLP. That should give a huge boost to building more applications based on generative models.
Although no application of this 1.6T model has been published yet, Google has said in the past that it uses BERT-based transformer models in its search engine.
So, it is very likely that this will find its way into Google’s search engine in the near future.
Google’s Meena chatbot is yet another application of a large language model, this one trained on conversational data. It is a 2.6 billion parameter model trained on multi-turn conversations, one of the most difficult tasks for a chatbot.
The latest development in this area is WuDao 2.0, a 1.75 trillion parameter language model built by the Beijing Academy of Artificial Intelligence. It understands both English and Chinese and is claimed to simulate conversation, understand pictures, write poems, and even create recipes.
The future of Generative Language Models
Currently, OpenAI is pursuing the commercialization of GPT-3. One good example of this is Viable, which has built a product to find answers to user queries buried deep in customer surveys, product reviews, and help desk data.
Similarly, Engati uses its DocuSense technology to provide easy access to relevant information. The engine parses a given knowledge base and fetches the information the user requests. Engati strongly believes in ease of use, which is why this is carried out with a no-code approach.
Try DocuSense today!
Also check out these 55+ tech podcasts to learn more about how digital devices have changed our lives.