GPT-3 was released for public access almost a year ago. Since then, it has generated a lot of hype and discussion regarding commercial usage of such large language models.
However, GPT-3 is only as good as the data it was trained on, and that training data was collected before its release a year ago.
So what is in GPT-3’s memory? The following table details the training data of GPT-3, as reported in the GPT-3 paper:

Dataset | Quantity (tokens) | Weight in training mix
Common Crawl (filtered) | 410 billion | 60%
WebText2 | 19 billion | 22%
Books1 | 12 billion | 8%
Books2 | 55 billion | 8%
Wikipedia | 3 billion | 3%
The challenges of GPT-3
As can be seen from the above table, GPT-3 has ‘seen’ text data from most of what was available on the internet, on Wikipedia, and in other sources at the time its training data was collected, probably more than a year ago.
This means it does not have the latest data available on the internet as of today.
This is one of the biggest challenges of large language models: it is very costly to retrain such large models continually to keep them up to date with the latest happenings in the world.
It has been estimated that training GPT-3 required 355 GPU-years of compute and cost around $4.6 million.
Such language models cannot simply be trained incrementally on the latest news and events happening around the world. Even fine-tuning a pre-trained language model on new data can run into a common problem called ‘catastrophic forgetting’: as the fine-tuned weights drift away from the original pre-trained weights, knowledge gained during pre-training can be lost, resulting in poor performance on the target tasks.
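To see why naive fine-tuning is risky, here is a minimal sketch using GPT-2 and the Hugging Face transformers library (GPT-3 itself cannot be fine-tuned locally). The example texts, learning rate and number of steps are illustrative assumptions; the point is that fine-tuning on new text alone can sharply increase the model's perplexity on text it previously modelled well.

# A minimal sketch, assuming torch and transformers are installed.
# It fine-tunes GPT-2 on a small piece of "new" text and measures perplexity
# on an "old" sentence before and after, to illustrate catastrophic forgetting.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def perplexity(model, text):
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

old_text = "The quick brown fox jumps over the lazy dog."
new_text = "Covid-19 vaccines such as Pfizer-BioNTech and Moderna were rolled out in 2021."

print("old-text perplexity before fine-tuning:", perplexity(model, old_text))

# Naive fine-tuning on the new text only, with no replay of old data;
# the learning rate is deliberately aggressive to make the effect visible quickly.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
enc = tokenizer(new_text, return_tensors="pt")
model.train()
for _ in range(30):
    loss = model(**enc, labels=enc["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
model.eval()

# A sharp rise here means the model has started to 'forget' its pre-training.
print("old-text perplexity after fine-tuning:", perplexity(model, old_text))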
However, Google and other search engines can continually crawl the internet and index newly created pages in order to serve the latest data.
Large language models miss out on such an advantage that search engines have.
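Updating a search index is an incremental operation: a newly crawled page can simply be added to the existing index without rebuilding anything else. The following toy sketch of an in-memory inverted index (real search engines are vastly more sophisticated) illustrates the contrast with retraining a large model:

# A toy inverted index: adding a newly crawled document is cheap and local,
# unlike retraining a large language model from scratch.
from collections import defaultdict

index = defaultdict(set)  # term -> set of document ids

def add_document(doc_id, text):
    for term in text.lower().split():
        index[term].add(doc_id)

def search(term):
    return index[term.lower()]

add_document("page-1", "GPT-3 training data and its challenges")
add_document("page-2", "Latest covid-19 vaccination numbers by country")  # freshly crawled page
print(search("covid-19"))  # {'page-2'}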
In the last year, Covid-19 has taken center stage and has dominated the news and much of the textual content created on the internet. A lot of applications need to work with data centered around Covid-19 and vaccinations, so it is quite likely that we may want to use GPT-3 to get some information about Covid-19.
Demonstrating GPT-3's "lack" of knowledge
The following text generation prompt was fed to the davinci engine of GPT-3.
engine="davinci",
prompt="The most popular covid-19 vaccines are ",
GPT-3’s output was as follows:
<OpenAIObject text_completion id=<****> at 0x7f4f9d40ce30> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "erythromycin, tetracycline, and penicillin."
    }
  ],
  "created": <****>,
  "id": "<****>",
  "model": "davinci:<****>",
  "object": "text_completion"
}
As can be seen from the above response, GPT-3 is not aware of the Covid-19 vaccines at all: erythromycin, tetracycline and penicillin are antibiotics, not vaccines.
Next, we want GPT-3 to generate a SQL query that aggregates the total number of Covid-19 vaccinations by country.
engine="davinci",
prompt="Create a SQL request to find total covid-19 vaccination by country:\n\nSELECT",
GPT-3’s output was as follows:
<OpenAIObject text_completion id=<****> at 0x7f4f9d397a70> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": " country, sum(vaccination) AS total FROM `bigquery-public-data.samples.natality` WHERE vaccination = 'Covid-19' GROUP BY country ORDER BY total DESC"
    }
  ],
  "created": 1626617235,
  "id": "<****>",
  "model": "davinci:<****>",
  "object": "text_completion"
}
Although the above SQL query is syntactically valid, it is functionally incorrect: GPT-3 assumed there is a column called ‘vaccination’ in the `bigquery-public-data.samples.natality` public dataset, which actually contains US birth records and has nothing to do with vaccinations.
When we run the generated query in BigQuery, it fails with an error because the referenced columns do not exist in that table.
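A query along these lines would work against a suitable table, say a hypothetical `my_project.covid.vaccinations` table with `country` and `doses_administered` columns (these names are illustrative, not an actual public dataset), run here with the BigQuery Python client:

# Hypothetical corrected query; the project, dataset, table and column names
# are assumptions for illustration only.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT country, SUM(doses_administered) AS total
    FROM `my_project.covid.vaccinations`
    GROUP BY country
    ORDER BY total DESC
"""
for row in client.query(query).result():
    print(row.country, row.total)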
A simple solution
In Engati, customers can update their FAQs and documents in DocuSense at any time, at no additional cost, so that the content reflects the latest changes. Their users then get accurate answers to their queries from the most recent data, without running into catastrophic forgetting or being served incorrect information.