Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations
- URL: http://arxiv.org/abs/2506.09048v1
- Date: Tue, 10 Jun 2025 17:59:31 GMT
- Title: Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations
- Authors: Yuxin Dong, Jiachen Jiang, Zhihui Zhu, Xia Ning
- Abstract summary: This work proposes the Linear Combination Conjecture, positing that task vectors act as single in-context demonstrations formed through linear combinations of the original ones. Through loss landscape analysis, we show that task vectors naturally emerge in linear transformers trained on triplet-formatted prompts. We further predict the failure of task vectors in representing high-rank mappings and confirm this on practical LLMs.
- Score: 19.539276425108987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Task vectors offer a compelling mechanism for accelerating inference in in-context learning (ICL) by distilling task-specific information into a single, reusable representation. Despite their empirical success, the underlying principles governing their emergence and functionality remain unclear. This work proposes the Linear Combination Conjecture, positing that task vectors act as single in-context demonstrations formed through linear combinations of the original ones. We provide both theoretical and empirical support for this conjecture. First, we show through loss landscape analysis that task vectors naturally emerge in linear transformers trained on triplet-formatted prompts. Next, we predict the failure of task vectors in representing high-rank mappings and confirm this on practical LLMs. Our findings are further validated through saliency analyses and parameter visualization, suggesting that task vectors can be enhanced by injecting multiple of them into few-shot prompts. Together, our results advance the understanding of task vectors and shed light on the mechanisms underlying ICL in transformer-based models.
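The conjecture can be made concrete in a softmax-free (linear) attention layer, where the query position's attention update is, by construction, a linear combination of the demonstrations' value vectors, so the same update can be produced by a single injected vector formed from that combination. The snippet below is a minimal NumPy sketch under that assumption; all names, dimensions, and projections are illustrative and not the paper's exact construction.

```python
# Minimal sketch of the Linear Combination Conjecture in a single
# softmax-free (linear) attention head. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, n_demos = 8, 5                                   # hidden size, number of demonstrations

# Random query/key/value projections of one linear attention head.
W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

demos = rng.standard_normal((n_demos, d))           # hidden states of the demonstrations
query = rng.standard_normal(d)                      # hidden state at the query position

# Linear attention: the query attends to the demonstrations with unnormalised
# scores, so its update is a linear combination of their value vectors.
scores = (W_q @ query) @ (demos @ W_k.T).T          # shape (n_demos,)
update_full = scores @ (demos @ W_v.T)              # weighted sum over all demonstrations

# Collapse the demonstrations into one synthetic "task vector" demonstration:
# the same linear combination applied to the demonstration hidden states.
task_vector = scores @ demos                        # a single in-context demonstration
update_tv = task_vector @ W_v.T                     # value of the injected vector

print(np.allclose(update_full, update_tv))          # True: one injected vector suffices
```

Intuitively, once such a vector is extracted it is fixed, so a single injected demonstration carries only one direction of the demonstration subspace per layer, which is consistent with the predicted failure on high-rank mappings.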
Related papers
- Adaptive Task Vectors for Large Language Models [14.108866468832623]
Adaptive Task Vectors (ATV) is a simple and effective framework that dynamically generates task vectors conditioned on each input query. ATV demonstrates strong performance and generalization capabilities, even for unseen tasks.
arXiv Detail & Related papers (2025-06-03T22:12:28Z) - When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers [64.1656365676171]
Task arithmetic refers to editing the pre-trained model by adding a weighted sum of task vectors (a minimal sketch appears after this list). This paper theoretically proves the effectiveness of task addition in simultaneously learning a set of irrelevant or aligned tasks. It also proves how to properly select coefficients for task arithmetic to achieve negation on out-of-domain tasks.
arXiv Detail & Related papers (2025-04-15T08:04:39Z) - Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z) - Task Singular Vectors: Reducing Task Interference in Model Merging [19.4876941464776]
Task Arithmetic has emerged as a simple yet effective method to merge models without additional training. We study task vectors at the layer level, focusing on task layer matrices and their singular value decomposition. We introduce TSV-Merge (TSV-M), a novel model merging approach that combines compression with interference reduction.
arXiv Detail & Related papers (2024-11-26T22:53:06Z) - Mitigating Copy Bias in In-Context Learning through Neuron Pruning [74.91243772654519]
Large language models (LLMs) have demonstrated impressive few-shot in-context learning abilities.
They are sometimes prone to a 'copying bias', where they copy answers from provided examples instead of learning the underlying patterns.
We propose a novel and simple method to mitigate such copying bias.
arXiv Detail & Related papers (2024-10-02T07:18:16Z) - Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data [3.376269351435396]
We develop a formal perspective on probing using structural causal models (SCM)
We extend a recent study of LMs in the context of a synthetic grid-world navigation task.
Our techniques provide robust empirical evidence for the ability of LMs to induce the latent concepts underlying text.
arXiv Detail & Related papers (2024-07-18T17:59:27Z) - Knowledge Composition using Task Vectors with Learned Anisotropic Scaling [51.4661186662329]
We introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level.
We show that such linear combinations explicitly exploit the low intrinsic dimensionality of pre-trained models, with only a few coefficients being the learnable parameters.
We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives.
arXiv Detail & Related papers (2024-07-03T07:54:08Z) - Distributed Rule Vectors is A Key Mechanism in Large Language Models' In-Context Learning [3.1775609005777024]
Large Language Models (LLMs) have demonstrated remarkable abilities, one of the most important being In-Context Learning (ICL)
Previous work hypothesized that the network creates a "task vector" in specific positions during ICL.
We discover that such "task vectors" do not exist in tasks where the rule has to be defined through multiple demonstrations.
arXiv Detail & Related papers (2024-06-23T04:29:13Z) - Finding Visual Task Vectors [74.67336516908776]
Visual Prompting is a technique for teaching models to perform a visual task via in-context examples, without any additional training.
We analyze the activations of MAE-VQGAN, a recent Visual Prompting model, and find task vectors, activations that encode task-specific information.
arXiv Detail & Related papers (2024-04-08T17:59:46Z) - Low-Rank Multitask Learning based on Tensorized SVMs and LSSVMs [65.42104819071444]
Multitask learning (MTL) leverages task-relatedness to enhance performance.
We employ high-order tensors, with each mode corresponding to a task index, to naturally represent tasks referenced by multiple indices.
We propose a general framework of low-rank MTL methods with tensorized support vector machines (SVMs) and least-squares support vector machines (LSSVMs).
arXiv Detail & Related papers (2023-08-30T14:28:26Z)
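For reference, the weight-space task arithmetic used by several of the model-editing and merging papers above can be sketched in a few lines of plain state-dict arithmetic. This is a hedged illustration, not any paper's released code; the toy model, the "fine-tuned" copies, and the coefficients below are all hypothetical.

```python
# Minimal sketch of weight-space task arithmetic: edit a pre-trained model
# by adding a weighted sum of task vectors (fine-tuned minus pre-trained weights).
import copy
import torch
import torch.nn as nn

def task_vector(finetuned: nn.Module, pretrained: nn.Module) -> dict:
    """Task vector: fine-tuned weights minus pre-trained weights, per parameter."""
    pt = pretrained.state_dict()
    return {k: v - pt[k] for k, v in finetuned.state_dict().items()}

def apply_task_arithmetic(pretrained: nn.Module, vectors, coeffs) -> nn.Module:
    """Return a copy of the pre-trained model edited with a weighted sum of task vectors."""
    edited = copy.deepcopy(pretrained)
    state = edited.state_dict()
    for vec, c in zip(vectors, coeffs):
        for k in state:
            state[k] = state[k] + c * vec[k]
    edited.load_state_dict(state)
    return edited

# Toy usage: two perturbed copies of a tiny model stand in for task-specific fine-tunes.
base = nn.Linear(4, 2)
ft_a, ft_b = copy.deepcopy(base), copy.deepcopy(base)
with torch.no_grad():
    for p in ft_a.parameters():
        p.add_(0.1)      # pretend task-A fine-tuning
    for p in ft_b.parameters():
        p.sub_(0.05)     # pretend task-B fine-tuning

merged = apply_task_arithmetic(
    base,
    [task_vector(ft_a, base), task_vector(ft_b, base)],
    coeffs=[0.5, 0.5],   # illustrative weights; papers above study how to choose them
)
```

Approaches such as aTLAS and TSV-Merge refine this basic recipe by learning the combination coefficients per parameter block or by operating on singular components of the task layer matrices.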
This list is automatically generated from the titles and abstracts of the papers in this site.