Related papers: In-context Learning in Presence of Spurious Correlations

URL: http://arxiv.org/abs/2410.03140v1
Date: Fri, 4 Oct 2024 04:26:36 GMT
Title: In-context Learning in Presence of Spurious Correlations
Authors: Hrayr Harutyunyan, Rafayel Darbinyan, Samvel Karapetyan, Hrant Khachatrian
Abstract summary: We study the possibility of training an in-context learner for classification tasks involving spurious features. We find that the conventional approach of training in-context learners is susceptible to spurious features. We propose a novel technique to train such a learner for a given classification task.
Score: 8.055478206164105
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models exhibit a remarkable capacity for in-context learning, where they learn to solve tasks given a few examples. Recent work has shown that transformers can be trained to perform simple regression tasks in-context. This work explores the possibility of training an in-context learner for classification tasks involving spurious features. We find that the conventional approach of training in-context learners is susceptible to spurious features. Moreover, when the meta-training dataset includes instances of only one task, the conventional approach leads to task memorization and fails to produce a model that leverages context for predictions. Based on these observations, we propose a novel technique to train such a learner for a given classification task. Remarkably, this in-context learner matches and sometimes outperforms strong methods like ERM and GroupDRO. However, unlike these algorithms, it does not generalize well to other tasks. We show that it is possible to obtain an in-context learner that generalizes to unseen tasks by training on a diverse dataset of synthetic in-context learning instances.

Related papers

Is Visual in-Context Learning for Compositional Medical Tasks within Reach? [68.56630652862293]
In this paper, we explore the potential of visual in-context learning to enable a single model to handle multiple tasks. We introduce a novel method for training in-context learners using a synthetic compositional task generation engine.
arXiv Detail & Related papers (2025-07-01T15:32:23Z)
Distinct Computations Emerge From Compositional Curricula in In-Context Learning [14.14936155458072]
We investigate how presenting a compositional subtask curriculum in context may alter the computations a transformer learns. We show that models trained with a subtask curriculum can perform zero-shot inference on unseen compositional tasks. We find that the models employ diverse strategies modulated by the specific curriculum design.
arXiv Detail & Related papers (2025-06-16T08:49:42Z)
Provable Benefits of Task-Specific Prompts for In-context Learning [44.768199865867494]
In this work, we consider a novel setting where the global task distribution can be partitioned into a union of conditional task distributions. We then examine the use of task-specific prompts and prediction heads for learning the prior information associated with the conditional task distribution using a one-layer attention model.
arXiv Detail & Related papers (2025-03-03T22:37:03Z)
Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
Task Vectors in In-Context Learning: Emergence, Formation, and Benefit [17.72043522825441]
We investigate the formation of task vectors in a controlled setting using models trained from scratch on synthetic datasets. Our findings confirm that task vectors naturally emerge under certain conditions, but the tasks may be relatively weakly and/or non-locally encoded within the model. To promote strong task vectors encoded at a prescribed location within the model, we propose an auxiliary training mechanism based on a task vector prompting loss.
arXiv Detail & Related papers (2025-01-16T01:54:23Z)
EXnet: Efficient In-context Learning for Data-less Text classification [0.0]
We present EXnet, a model specifically designed to perform in-context learning without limitations on the number of examples. We argue that in-context learning is an effective method to increase task accuracy, and providing examples facilitates cross-task generalization. With extensive experiments, we show that even our smallest model (15M parameters) generalizes to several unseen classification tasks and domains.
arXiv Detail & Related papers (2023-05-24T01:40:57Z)
RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning. We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches heuristic and learnable baselines.
arXiv Detail & Related papers (2023-05-23T20:15:56Z)
The Learnability of In-Context Learning [16.182561312622315]
We propose a first-of-its-kind PAC based framework for in-context learnability. Our framework includes an initial pretraining phase, which fits a function to the pretraining distribution. We show that in-context learning is more about identifying the task than about learning it.
arXiv Detail & Related papers (2023-03-14T13:28:39Z)
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes [67.06980111346245]
In-context learning refers to the ability of a model to condition on a prompt sequence consisting of in-context examples. We show that standard Transformers can be trained from scratch to perform in-context learning of linear functions. We also show that we can train Transformers to in-context learn more complex function classes with performance that matches or exceeds task-specific learning algorithms.
arXiv Detail & Related papers (2022-08-01T18:01:40Z)
Meta-Learning to Compositionally Generalize [34.656819307701156]
We implement a meta-learning augmented version of supervised learning. We construct pairs of tasks for meta-learning by sub-sampling existing training data. Experimental results on the COGS and SCAN datasets show that our similarity-driven meta-learning can improve generalization performance.
arXiv Detail & Related papers (2021-06-08T11:21:48Z)
Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation to be shared across multiple tasks. In this work we overcome this issue by inferring a conditioning function, mapping the tasks' side information into a representation tailored to the task at hand. We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z)
Curriculum Learning: A Survey [65.31516318260759]
Curriculum learning strategies have been successfully employed in all areas of machine learning. We construct a taxonomy of curriculum learning approaches by hand, considering various classification criteria. We build a hierarchical tree of curriculum learning methods using an agglomerative clustering algorithm.
arXiv Detail & Related papers (2021-01-25T20:08:32Z)
Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time. We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time. Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.