Adaptive Task Vectors for Large Language Models
- URL: http://arxiv.org/abs/2506.03426v1
- Date: Tue, 03 Jun 2025 22:12:28 GMT
- Title: Adaptive Task Vectors for Large Language Models
- Authors: Joonseong Kang, Soojeong Lee, Subeen Park, Sumin Park, Taero Kim, Jihee Kim, Ryunyi Lee, Kyungwoo Song
- Abstract summary: Adaptive Task Vectors (ATV) is a simple and effective framework that dynamically generates task vectors conditioned on each input query. ATV demonstrates strong performance and generalization capabilities, even for unseen tasks.
- Score: 14.108866468832623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-Context Learning (ICL) enables Large Language Models (LLMs) to perform tasks without parameter updates by conditioning on a few demonstrations provided in the prompt. Despite its success, ICL suffers from several limitations, including sensitivity to demonstration order, context length constraints, and computational inefficiency. To address these challenges, task vector-based approaches compress task information into a single vector. However, these methods typically construct task vectors from fixed sets of demonstrations and reuse them across input queries, without conditioning on the specific input. This limitation can lead models to struggle with effective adaptation when the input query is not well aligned with the underlying demonstrations, consequently degrading their generalization performance on unseen tasks. To overcome this limitation, we propose Adaptive Task Vectors (ATV), a simple and effective framework that dynamically generates task vectors conditioned on each input query. ATV employs a small language model to generate task vectors, which are then transformed to match the target LLM's architecture and applied to guide its output generation. In contrast to ICL and previous vector-based approaches, which rely on fixed demonstration sets and their corresponding vectors, ATV dynamically generates task vectors tailored to each specific input query and task. Consequently, ATV demonstrates strong performance and generalization capabilities, even for unseen tasks. Furthermore, we provide a theoretical analysis indicating that ATV is expressively equivalent to LoRA under equal rank budgets and more expressive than Prefix-Tuning, thereby offering formal support for its representational advantage.
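The abstract describes the ATV pipeline only at a high level, so the following is a minimal PyTorch-style sketch of the general idea rather than the authors' implementation: a small language model encodes the input query, a projection maps that encoding to the target LLM's hidden size, and the resulting vector is added to the LLM's hidden states during generation. The module names, shapes, pooling, and injection point (adding to a layer's output states via a forward hook) are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class AdaptiveTaskVectorSketch(nn.Module):
    """Illustrative sketch of an ATV-style pipeline (not the authors' code).

    A small encoder produces a query-conditioned vector, which is projected
    to the target LLM's hidden size and added to a chosen layer's hidden
    states via a forward hook. Shapes and the injection layer are assumed.
    """

    def __init__(self, small_lm: nn.Module, small_dim: int,
                 target_dim: int, num_target_layers: int):
        super().__init__()
        self.small_lm = small_lm  # small language model that encodes the query
        # One projection per target-LLM layer; a single shared projection also works.
        self.proj = nn.ModuleList(
            [nn.Linear(small_dim, target_dim) for _ in range(num_target_layers)]
        )

    def task_vectors(self, query_ids: torch.Tensor) -> list[torch.Tensor]:
        # Encode the input query with the small LM and mean-pool over tokens.
        h = self.small_lm(query_ids)           # (batch, seq, small_dim); assumed output shape
        pooled = h.mean(dim=1)                 # (batch, small_dim)
        # Produce one additive task vector per target-LLM layer.
        return [p(pooled) for p in self.proj]  # each (batch, target_dim)


def inject(target_layer: nn.Module, vector: torch.Tensor):
    """Register a hook that adds the task vector to the layer's output hidden states."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + vector.unsqueeze(1)  # broadcast over the sequence dimension
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return target_layer.register_forward_hook(hook)
```

The abstract does not specify the training procedure; a natural reading is that only the small language model and the projections are trained while the target LLM stays frozen, so the generated vectors steer the frozen model toward the demonstrated task.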
Related papers
- Leveraging In-Context Learning for Language Model Agents [51.2996117207114]
In-context learning (ICL) with dynamically selected demonstrations combines the flexibility of prompting large language models (LLMs) with the ability to leverage training data to improve performance. We show that set-selection of trajectories of similar tasks as demonstrations significantly improves performance, reliability, robustness, and efficiency of LLM agents. We find that demonstrations obtained from larger models (in the annotation phase) also improve smaller models, and that ICL agents can even rival costlier trained agents.
arXiv Detail & Related papers (2025-06-16T05:37:49Z)
- Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations [19.539276425108987]
This work proposes the Linear Combination Conjecture, positing that task vectors act as single in-context demonstrations formed through linear combinations of the original ones. We show that task vectors naturally emerge in linear transformers trained on triplet-formatted prompts through loss landscape analysis. We predict the failure of task vectors in representing high-rank mappings and confirm this on practical LLMs.
arXiv Detail & Related papers (2025-06-10T17:59:31Z)
- Beyond Demonstrations: Dynamic Vector Construction from Latent Representations [11.916165865594365]
In-Context derived Vector (ICV) methods extract task-relevant representations from large language models (LLMs) and reinject them during inference. DyVec provides a lightweight and data-efficient solution for inference-time task adaptation.
arXiv Detail & Related papers (2025-05-23T12:13:50Z)
- Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
- Multi-Task Model Merging via Adaptive Weight Disentanglement [69.7292615212444]
We introduce an Adaptive Weight Disentanglement method for model merging. We successfully extract redundant vectors, and after their subtraction, the task vectors retain robust performance.
arXiv Detail & Related papers (2024-11-27T20:08:55Z)
- Vision-Language Models Create Cross-Modal Task Representations [58.19152818504624]
We find that vision-language models (VLMs) can align conceptually equivalent inputs into a shared task vector. We measure this alignment via cross-modal transfer on a range of tasks and model architectures. We show that task vectors can be transferred from a base language model to its fine-tuned vision-language counterpart.
arXiv Detail & Related papers (2024-10-29T17:59:45Z)
- Distributed Rule Vectors is A Key Mechanism in Large Language Models' In-Context Learning [3.1775609005777024]
Large Language Models (LLMs) have demonstrated remarkable abilities, one of the most important being In-Context Learning (ICL).
Previous work hypothesized that the network creates a "task vector" in specific positions during ICL.
We discover that such "task vectors" do not exist in tasks where the rule has to be defined through multiple demonstrations.
arXiv Detail & Related papers (2024-06-23T04:29:13Z)
- Task Indicating Transformer for Task-conditional Dense Predictions [16.92067246179703]
We introduce a novel task-conditional framework called Task Indicating Transformer (TIT) to tackle this challenge.
Our approach designs a Mix Task Adapter module within the transformer block, which incorporates a Task Indicating Matrix through matrix decomposition.
We also propose a Task Gate Decoder module that harnesses a Task Indicating Vector and gating mechanism to facilitate adaptive multi-scale feature refinement.
arXiv Detail & Related papers (2024-03-01T07:06:57Z)
- Towards Unified Token Learning for Vision-Language Tracking [65.96561538356315]
We present a vision-language (VL) tracking pipeline, termed MMTrack, which casts VL tracking as a token generation task.
Our proposed framework serializes language description and bounding box into a sequence of discrete tokens.
In this new design paradigm, all token queries are required to perceive the desired target and directly predict spatial coordinates of the target.
arXiv Detail & Related papers (2023-08-27T13:17:34Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)