What Can Transformers Learn In-context? A Case Study Of Simple Function Classes

Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant. arXiv 2022 – 48 citations

[Code] [Paper]
Compositional Generalization, Has Code, In Context Learning, Interdisciplinary Approaches, Model Architecture, Multimodal, Semantic Representation, Neural Machine Translation, Prompting, Training Techniques, Variational Autoencoders

In-context learning refers to the ability of a model to condition on a prompt sequence consisting of in-context examples (input-output pairs corresponding to some task) along with a new query input, and generate the corresponding output. Crucially, in-context learning happens only at inference time without any parameter updates to the model. While large language models such as GPT-3 exhibit some ability to perform in-context learning, it is unclear what the relationship is between tasks on which this succeeds and what is present in the training data. To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn “most” functions from this class? We show empirically that standard Transformers can be trained from scratch to perform in-context learning of linear functions – that is, the trained model is able to learn unseen linear functions from in-context examples with performance comparable to the optimal least squares estimator. In fact, in-context learning is possible even under two forms of distribution shift: (i) between the training data of the model and inference-time prompts, and (ii) between the in-context examples and the query input during inference. We also show that we can train Transformers to in-context learn more complex function classes – namely sparse linear functions, two-layer neural networks, and decision trees – with performance that matches or exceeds task-specific learning algorithms. Our code and models are available at https://github.com/dtsip/in-context-learning.
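To make the prompt format concrete, the sketch below is a hypothetical illustration (not code from the paper's repository) of the linear-function setting described above. It assumes isotropic Gaussian draws for the weight vector and the inputs, samples one in-context prompt of input-output pairs plus a query, and computes the least-squares baseline that the trained Transformer is compared against.

```python
import numpy as np

def sample_linear_prompt(d=20, k=40, rng=None):
    """Sample one in-context prompt for a random linear function.

    Assumes w ~ N(0, I_d) and x_i ~ N(0, I_d), with labels y_i = <w, x_i>,
    plus a fresh query input drawn from the same distribution.
    """
    rng = rng or np.random.default_rng()
    w = rng.standard_normal(d)            # unseen linear function
    xs = rng.standard_normal((k, d))      # in-context inputs x_1..x_k
    ys = xs @ w                           # in-context outputs y_i = <w, x_i>
    x_query = rng.standard_normal(d)      # query input
    return xs, ys, x_query, x_query @ w   # last item is the target output

def least_squares_prediction(xs, ys, x_query):
    """Baseline estimator: fit w_hat on the in-context examples by least
    squares and predict <w_hat, x_query>."""
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return x_query @ w_hat

if __name__ == "__main__":
    xs, ys, x_query, y_query = sample_linear_prompt(rng=np.random.default_rng(0))
    pred = least_squares_prediction(xs, ys, x_query)
    print(f"squared error of least-squares baseline: {(pred - y_query) ** 2:.3e}")
```

In the paper's setup, a Transformer is trained from scratch on many such prompts (serialized as x_1, y_1, ..., x_k, y_k, x_query), and its prediction for the query output is evaluated against this least-squares baseline.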

Similar Work