In-context Learning in Presence of Spurious Correlations
- URL: http://arxiv.org/abs/2410.03140v1
- Date: Fri, 4 Oct 2024 04:26:36 GMT
- Title: In-context Learning in Presence of Spurious Correlations
- Authors: Hrayr Harutyunyan, Rafayel Darbinyan, Samvel Karapetyan, Hrant Khachatrian,
- Abstract summary: We study the possibility of training an in-context learner for classification tasks involving spurious features.
We find that the conventional approach of training in-context learners is susceptible to spurious features.
We propose a novel technique to train such a learner for a given classification task.
- Score: 8.055478206164105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models exhibit a remarkable capacity for in-context learning, where they learn to solve tasks given a few examples. Recent work has shown that transformers can be trained to perform simple regression tasks in-context. This work explores the possibility of training an in-context learner for classification tasks involving spurious features. We find that the conventional approach of training in-context learners is susceptible to spurious features. Moreover, when the meta-training dataset includes instances of only one task, the conventional approach leads to task memorization and fails to produce a model that leverages context for predictions. Based on these observations, we propose a novel technique to train such a learner for a given classification task. Remarkably, this in-context learner matches and sometimes outperforms strong methods like ERM and GroupDRO. However, unlike these algorithms, it does not generalize well to other tasks. We show that it is possible to obtain an in-context learner that generalizes to unseen tasks by training on a diverse dataset of synthetic in-context learning instances.
Related papers
- Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning.
We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.
We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z) - Task Vectors in In-Context Learning: Emergence, Formation, and Benefit [17.72043522825441]
We investigate the formation of task vectors in a controlled setting using models trained from scratch on synthetic datasets.
Our findings confirm that task vectors naturally emerge under certain conditions, but the tasks may be relatively weakly and/or non-locally encoded within the model.
To promote strong task vectors encoded at a prescribed location within the model, we propose an auxiliary training mechanism based on a task vector prompting loss.
arXiv Detail & Related papers (2025-01-16T01:54:23Z) - The Learnability of In-Context Learning [16.182561312622315]
We propose a first-of-its-kind PAC based framework for in-context learnability.
Our framework includes an initial pretraining phase, which fits a function to the pretraining distribution.
We show that in-context learning is more about identifying the task than about learning it.
arXiv Detail & Related papers (2023-03-14T13:28:39Z) - What Can Transformers Learn In-Context? A Case Study of Simple Function
Classes [67.06980111346245]
In-context learning refers to the ability of a model to condition on a prompt sequence consisting of in-context examples.
We show that standard Transformers can be trained from scratch to perform in-context learning of linear functions.
We also show that we can train Transformers to in-context learn more complex function classes with performance that matches or exceeds task-specific learning algorithms.
arXiv Detail & Related papers (2022-08-01T18:01:40Z) - Meta-Learning to Compositionally Generalize [34.656819307701156]
We implement a meta-learning augmented version of supervised learning.
We construct pairs of tasks for meta-learning by sub-sampling existing training data.
Experimental results on the COGS and SCAN datasets show that our similarity-driven meta-learning can improve generalization performance.
arXiv Detail & Related papers (2021-06-08T11:21:48Z) - Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation to be shared across multiple tasks.
In this work we overcome this issue by inferring a conditioning function, mapping the tasks' side information into a representation tailored to the task at hand.
We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z) - Curriculum Learning: A Survey [65.31516318260759]
Curriculum learning strategies have been successfully employed in all areas of machine learning.
We construct a taxonomy of curriculum learning approaches by hand, considering various classification criteria.
We build a hierarchical tree of curriculum learning methods using an agglomerative clustering algorithm.
arXiv Detail & Related papers (2021-01-25T20:08:32Z) - Adaptive Task Sampling for Meta-Learning [79.61146834134459]
Key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z) - Meta-Reinforcement Learning Robust to Distributional Shift via Model
Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.