Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study
- URL: http://arxiv.org/abs/2503.15579v1
- Date: Wed, 19 Mar 2025 13:40:45 GMT
- Title: Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study
- Authors: Xingxuan Zhang, Haoran Wang, Jiansheng Li, Yuan Xue, Shikai Guan, Renzhe Xu, Hao Zou, Han Yu, Peng Cui
- Abstract summary: Large language models (LLMs) like GPT-4 and LLaMA-3 utilize the powerful in-context learning (ICL) capability of the Transformer architecture to learn on the fly from limited examples. We present a systematic investigation of transformers' generalization capability with ICL relative to training data coverage. We find that transformers lack inter-problem generalization with ICL, but excel in intra-task and intra-problem generalization.
- Score: 45.08382242972142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) like GPT-4 and LLaMA-3 utilize the powerful in-context learning (ICL) capability of the Transformer architecture to learn on the fly from limited examples. While ICL underpins many LLM applications, its full potential remains hindered by a limited understanding of its generalization boundaries and vulnerabilities. We present a systematic investigation of transformers' generalization capability with ICL relative to training data coverage, defining a task-centric framework along three dimensions: inter-problem, intra-problem, and intra-task generalization. Through extensive simulation and real-world experiments covering tasks such as function fitting, API calling, and translation, we find that transformers lack inter-problem generalization with ICL but excel in intra-task and intra-problem generalization. Including a greater variety of mixed tasks in the training data significantly enhances ICL generalization on unseen tasks and even on known simple tasks. This suggests designing training data to maximize the diversity of covered tasks and to combine different tasks whenever possible, rather than focusing solely on the target task at test time.
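The three generalization dimensions can be made concrete with a small data-construction sketch. The Python snippet below is only an illustrative reading of the framework under assumed definitions (intra-task: a task seen in training, evaluated with fresh in-context examples; intra-problem: unseen task instances within a seen problem; inter-problem: an entirely unseen problem family); the task families, parameter ranges, and helper names are hypothetical and not taken from the paper's code.

```python
import numpy as np

# Illustrative task families; the paper's actual task suite (function fitting,
# API calling, translation) and its exact splits are not reproduced here.
TASK_FAMILIES = {
    "linear":    lambda w: (lambda x: w * x),
    "quadratic": lambda w: (lambda x: w * x ** 2),
    "sine":      lambda w: (lambda x: np.sin(w * x)),
}

def make_icl_episode(f, n_demos=8, n_queries=2, rng=None):
    """Build one ICL episode: (x, f(x)) demonstrations plus held-out queries."""
    if rng is None:
        rng = np.random.default_rng()
    xs = rng.uniform(-1.0, 1.0, size=n_demos + n_queries)
    ys = f(xs)
    demos = list(zip(xs[:n_demos], ys[:n_demos]))
    queries = list(zip(xs[n_demos:], ys[n_demos:]))
    return demos, queries

rng = np.random.default_rng(0)

# Training coverage: linear functions with a fixed grid of weights.
train_weights = np.linspace(-1.0, 1.0, 50)
train_set = [make_icl_episode(TASK_FAMILIES["linear"](w), rng=rng)
             for w in train_weights for _ in range(20)]

# Intra-task: a weight seen in training, with fresh in-context examples.
eval_intra_task = [make_icl_episode(TASK_FAMILIES["linear"](train_weights[0]), rng=rng)
                   for _ in range(100)]

# Intra-problem: the same problem (linear fitting), but unseen weights.
eval_intra_problem = [make_icl_episode(TASK_FAMILIES["linear"](rng.uniform(2.0, 3.0)), rng=rng)
                      for _ in range(100)]

# Inter-problem: an entirely unseen problem family (e.g., sine fitting).
eval_inter_problem = [make_icl_episode(TASK_FAMILIES["sine"](rng.uniform(-1.0, 1.0)), rng=rng)
                      for _ in range(100)]
```

Evaluating a model trained only on `train_set` against the three evaluation sets would probe each generalization dimension separately.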
Related papers
- When can in-context learning generalize out of task distribution? [10.962094053749095]
In-context learning (ICL) is a capability of pretrained transformers that allows models to generalize to unseen tasks after seeing only a few examples. We empirically investigate the conditions on the pretraining distribution necessary for ICL to emerge and generalize out of distribution. We find that as task diversity increases, transformers undergo a transition from a specialized solution, which exhibits ICL only within the pretraining task distribution, to a solution that generalizes out of distribution to the entire task space.
arXiv Detail & Related papers (2025-06-05T20:30:50Z) - Hierarchical Context Transformer for Multi-level Semantic Scene Understanding [37.35498412336018]
We propose to represent the task set as multi-level semantic scene understanding (MSSU).
To this end, we propose a novel hierarchical context transformer (HCT) network.
Experiments on our cataract dataset and a publicly available endoscopic PSI-AVA dataset demonstrate the outstanding performance of our method.
arXiv Detail & Related papers (2025-02-21T03:36:16Z) - Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z) - VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks [100.3234156027118]
We present VLABench, an open-source benchmark for evaluating universal LCM task learning.
VLABench provides 100 carefully designed categories of tasks, with strong randomization within each task category and a total of 2000+ objects.
The benchmark assesses multiple competencies, including understanding of mesh and texture, spatial relationships, semantic instructions, physical laws, knowledge transfer, and reasoning.
arXiv Detail & Related papers (2024-12-24T06:03:42Z) - Differential learning kinetics govern the transition from memorization to generalization during in-context learning [0.5555497750998242]
Transformers exhibit in-context learning (ICL): the ability to use novel information presented in the context without additional weight updates. Recent work shows that ICL emerges when models are trained on a sufficiently diverse set of tasks. We show that the sub-circuits that memorize and generalize can be viewed as largely independent.
arXiv Detail & Related papers (2024-11-27T22:12:29Z) - In-Context Learning with Representations: Contextual Generalization of Trained Transformers [66.78052387054593]
In-context learning (ICL) refers to the capability of pretrained large language models to learn a new task from a few examples during inference.
This paper investigates the training dynamics of transformers by gradient descent through the lens of non-linear regression tasks.
arXiv Detail & Related papers (2024-08-19T16:47:46Z) - Benchmarking General-Purpose In-Context Learning [19.40952728849431]
In-context learning (ICL) empowers generative models to address new tasks effectively and efficiently on the fly.
In this paper, we study extending ICL to address a broader range of tasks with an extended learning horizon and higher improvement potential.
We introduce two benchmarks specifically crafted to train and evaluate general-purpose ICL (GPICL) functionalities.
arXiv Detail & Related papers (2024-05-27T14:50:42Z) - Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks [27.59758964060561]
Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities.
Continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent.
We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language.
arXiv Detail & Related papers (2024-01-27T03:03:30Z) - Towards Task Sampler Learning for Meta-Learning [37.02030832662183]
Meta-learning aims to learn general knowledge from diverse training tasks constructed from limited data, and then transfer it to new tasks.
It is commonly believed that increasing task diversity will enhance the generalization ability of meta-learning models.
This paper challenges this view through empirical and theoretical analysis.
arXiv Detail & Related papers (2023-07-18T01:53:18Z) - Meta-Learning with Fewer Tasks through Task Interpolation [67.03769747726666]
Current meta-learning algorithms require a large number of meta-training tasks, which may not be accessible in real-world scenarios.
By meta-learning with task interpolation (MLTI), our approach effectively generates additional tasks by randomly sampling a pair of tasks and interpolating the corresponding features and labels (a minimal sketch of this interpolation step appears after this list).
Empirically, in our experiments on eight datasets from diverse domains, we find that the proposed general MLTI framework is compatible with representative meta-learning algorithms and consistently outperforms other state-of-the-art strategies.
arXiv Detail & Related papers (2021-06-04T20:15:34Z) - Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study [75.42182503265056]
Multi-Task Learning has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm.
We deal with heterogeneous MTL, simultaneously addressing detection, classification, and regression problems.
We build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks.
arXiv Detail & Related papers (2021-05-08T22:26:52Z)
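The cross-task interpolation step described in the MLTI entry above can be illustrated with a minimal mixup-style sketch. The actual method also interpolates hidden representations and distinguishes label-sharing from non-label-sharing tasks; the simplified feature-level version below, with hypothetical helper names, is only meant to convey the idea.

```python
import numpy as np

def mlti_interpolate(task_a, task_b, alpha=0.5, rng=None):
    """Mixup-style cross-task interpolation in the spirit of MLTI (simplified):
    draw a mixing coefficient and convexly combine the features and one-hot
    labels of two sampled tasks to synthesize an additional meta-training task."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)           # mixing coefficient in (0, 1)
    xa, ya = task_a                        # (features, one-hot labels) of task A
    xb, yb = task_b
    x_new = lam * xa + (1.0 - lam) * xb    # interpolated features
    y_new = lam * ya + (1.0 - lam) * yb    # interpolated (soft) labels
    return x_new, y_new

# Toy usage: two hypothetical 5-shot tasks with 4-dim features and 3 classes.
rng = np.random.default_rng(0)
def toy_task():
    x = rng.normal(size=(5, 4))
    y = np.eye(3)[rng.integers(0, 3, size=5)]
    return x, y

extra_task = mlti_interpolate(toy_task(), toy_task(), rng=rng)
```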