Provable Benefits of Task-Specific Prompts for In-context Learning
- URL: http://arxiv.org/abs/2503.02102v2
- Date: Wed, 05 Mar 2025 16:18:33 GMT
- Title: Provable Benefits of Task-Specific Prompts for In-context Learning
- Authors: Xiangyu Chang, Yingcong Li, Muti Kara, Samet Oymak, Amit K. Roy-Chowdhury
- Abstract summary: In this work, we consider a novel setting where the global task distribution can be partitioned into a union of conditional task distributions. We then examine the use of task-specific prompts and prediction heads for learning the prior information associated with the conditional task distribution using a one-layer attention model.
- Score: 44.768199865867494
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The in-context learning capabilities of modern language models have motivated a deeper mathematical understanding of sequence models. A line of recent work has shown that linear attention models can emulate projected gradient descent iterations to implicitly learn the task vector from the data provided in the context window. In this work, we consider a novel setting where the global task distribution can be partitioned into a union of conditional task distributions. We then examine the use of task-specific prompts and prediction heads for learning the prior information associated with the conditional task distribution using a one-layer attention model. Our results on the loss landscape show that task-specific prompts facilitate a covariance-mean decoupling, where prompt-tuning explains the conditional mean of the distribution while the variance is learned through in-context learning. Incorporating a task-specific head further aids this process by entirely decoupling estimation of the mean and variance components. This covariance-mean perspective similarly explains how jointly training prompt and attention weights can provably help over fine-tuning after pretraining.
Related papers
- Statistical Deficiency for Task Inclusion Estimation [24.755448493709604]
Tasks are central in machine learning, as they are the most natural objects to assess the capabilities of current models.
This study proposes a theoretically grounded setup to define the notion of task and to compute the inclusion between two tasks from a statistical-deficiency point of view.
arXiv Detail & Related papers (2025-03-07T15:00:28Z) - Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z) - On the Loss of Context-awareness in General Instruction Fine-tuning [101.03941308894191]
We investigate the loss of context awareness after supervised fine-tuning. We find that the performance decline is associated with a bias toward different roles learned during conversational instruction fine-tuning. We propose a metric to identify context-dependent examples from general instruction fine-tuning datasets.
arXiv Detail & Related papers (2024-11-05T00:16:01Z) - In-context Learning in Presence of Spurious Correlations [8.055478206164105]
We study the possibility of training an in-context learner for classification tasks involving spurious features.
We find that the conventional approach of training in-context learners is susceptible to spurious features.
We propose a novel technique to train such a learner for a given classification task.
arXiv Detail & Related papers (2024-10-04T04:26:36Z) - Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
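The instance-level step described above can be sketched as follows; the difficulty scores here are a hypothetical placeholder (e.g., a reference model's per-instance loss), not Data-CUBE's actual scoring function.

```python
import numpy as np

rng = np.random.default_rng(1)
n_instances, batch_size = 100, 10

# Stand-in per-instance difficulty scores; an assumption for illustration.
difficulty = rng.random(n_instances)

# Sort instances from easy to difficult, then chunk into mini-batches so
# earlier batches contain only easier examples than later ones.
order = np.argsort(difficulty)
batches = [order[i:i + batch_size] for i in range(0, n_instances, batch_size)]
```

By construction, every example in an earlier mini-batch is no harder than any example in a later one, which is the easy-to-difficult ordering the summary describes.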
arXiv Detail & Related papers (2024-01-07T18:12:20Z) - Continual Learning with Distributed Optimization: Does CoCoA Forget? [0.0]
We focus on the continual learning problem where the tasks arrive sequentially.
The aim is to perform well on the newly arrived task without performance degradation on the previously seen tasks.
We consider the well-established distributed learning algorithm CoCoA.
arXiv Detail & Related papers (2022-11-30T13:49:43Z) - Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation to be shared across multiple tasks.
In this work, we instead infer a conditioning function, mapping the tasks' side information into a representation tailored to the task at hand.
We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z) - OCEAN: Online Task Inference for Compositional Tasks with Context Adaptation [150.1979017130774]
We propose a variational inference framework to perform online task inference for compositional tasks.
Our framework supports flexible latent distributions based on prior knowledge of the task structure and can be trained in an unsupervised manner.
arXiv Detail & Related papers (2020-08-17T04:50:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.