Researchers are starting to unravel one of the biggest mysteries behind the AI language models that power text and image generation tools like DALL-E and ChatGPT.
For a while now, machine learning experts and scientists have noticed something strange about large language models (LLMs) like OpenAI’s GPT-3 and Google’s LaMDA: they are inexplicably good at carrying out tasks that they haven’t been specifically trained to perform. It’s a perplexing phenomenon, and just one example of how difficult, if not impossible, it can be in most cases to explain in fine-grained detail how an AI model arrives at its outputs.
In a forthcoming study posted to the arXiv preprint server, researchers at the Massachusetts Institute of Technology, Stanford University, and Google explore this “apparently mysterious” phenomenon, which is called “in-context learning.” Normally, to accomplish a new task, most machine learning models need to be retrained on new data, a process that can require researchers to input thousands of data points to get the output they desire, which is a tedious and time-consuming endeavor.