Rethinking Invariance in In-context Learning
- URL: http://arxiv.org/abs/2505.04994v1
- Date: Thu, 08 May 2025 06:59:14 GMT
- Title: Rethinking Invariance in In-context Learning
- Authors: Lizhe Fang, Yifei Wang, Khashayar Gatmiry, Lei Fang, Yisen Wang
- Abstract summary: In-Context Learning (ICL) has emerged as a pivotal capability of auto-regressive large language models. It is hindered by a notable sensitivity to the ordering of context examples regardless of their mutual independence. In this work, we identify two crucial elements in the design of an invariant ICL algorithm.
- Score: 31.27174483063626
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In-Context Learning (ICL) has emerged as a pivotal capability of auto-regressive large language models, yet it is hindered by a notable sensitivity to the ordering of context examples regardless of their mutual independence. To address this issue, recent studies have introduced several variant algorithms of ICL that achieve permutation invariance. However, many of these do not exhibit comparable performance with the standard auto-regressive ICL algorithm. In this work, we identify two crucial elements in the design of an invariant ICL algorithm: information non-leakage and context interdependence, which are not simultaneously achieved by any of the existing methods. These investigations lead us to the proposed Invariant ICL (InvICL), a methodology designed to achieve invariance in ICL while ensuring the two properties. Empirically, our findings reveal that InvICL surpasses previous models, both invariant and non-invariant, in most benchmark datasets, showcasing superior generalization capabilities across varying input lengths. Code is available at https://github.com/PKU-ML/InvICL.
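As a rough point of reference for the order-sensitivity problem described above, the sketch below symmetrizes a hypothetical ICL predictor by brute force, averaging its outputs over every ordering of the context. This is not InvICL itself (whose design centers on information non-leakage and context interdependence); `predict_fn` and the toy predictor are assumptions introduced only for illustration.

```python
# Minimal sketch (not InvICL): brute-force symmetrization of an
# order-sensitive ICL predictor by averaging over all context orderings.
# `predict_fn` is a hypothetical callable standing in for one forward
# pass of an autoregressive LM; it returns class scores.
from itertools import permutations
import numpy as np

def symmetrized_icl(predict_fn, context_examples, query):
    """Average predictions over every permutation of the context.

    Exact symmetrization costs k! forward passes, which is precisely the
    overhead that dedicated invariant ICL designs aim to avoid.
    """
    scores = [predict_fn(list(order), query)
              for order in permutations(context_examples)]
    return np.mean(np.stack(scores), axis=0)

if __name__ == "__main__":
    def toy_predict(context, query):
        # Toy order-sensitive predictor: over-weights the last demonstration.
        return np.array([float(context[-1][1] == c) for c in (0, 1)])

    ctx = [("x1", 0), ("x2", 1), ("x3", 0)]
    print(symmetrized_icl(toy_predict, ctx, "xq"))  # order-invariant output
```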
Related papers
- Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both one-to-one comparison and one-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z) - Implicit In-context Learning [37.0562059811099]
We introduce Implicit In-context Learning (I2CL), an innovative paradigm that reduces the inference cost of ICL to that of zero-shot learning with minimal information loss. I2CL achieves few-shot level performance at zero-shot inference cost, and it exhibits robustness against variations in demonstration examples.
arXiv Detail & Related papers (2024-05-23T14:57:52Z) - ParaICL: Towards Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing. Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples. We propose a novel method named parallel in-context learning (ParaICL).
arXiv Detail & Related papers (2024-03-31T05:56:15Z) - Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning [27.729189318779603]
Batch-ICL is an effective, efficient, and order-agnostic inference algorithm for in-context learning.
We show that Batch-ICL consistently outperforms standard ICL under most permutations of the in-context examples.
We also develop a novel variant of Batch-ICL featuring multiple "epochs" of meta-optimization.
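A minimal sketch of the order-agnostic idea behind Batch-ICL, under the assumption that each demonstration can be processed in an independent 1-shot pass and the per-pass outputs aggregated. The actual method aggregates internal meta-optimization quantities and adds multiple "epochs", which this sketch omits; `one_shot_logits` is a hypothetical stand-in for a single 1-shot forward pass.

```python
# Simplified sketch of the Batch-ICL idea (not the authors' exact code):
# run N separate 1-shot passes, aggregate their contributions, then make
# one order-agnostic prediction for the query.
import numpy as np

def batch_style_icl(one_shot_logits, demonstrations, query):
    """Order-agnostic aggregation over independent 1-shot passes.

    Because each demonstration is processed in isolation, permuting the
    demonstration list cannot change the averaged result.
    """
    per_example = [one_shot_logits(demo, query) for demo in demonstrations]
    return np.mean(np.stack(per_example), axis=0)
```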
arXiv Detail & Related papers (2024-01-12T09:31:17Z) - Positional Information Matters for Invariant In-Context Learning: A Case
Study of Simple Function Classes [39.08988313527199]
In-context learning (ICL) refers to the ability of a model to condition on a few in-context demonstrations to generate the answer for a new query input.
Despite the impressive ICL ability of LLMs, ICL in LLMs is sensitive to input demonstrations and limited to short context lengths.
arXiv Detail & Related papers (2023-11-30T02:26:55Z) - Transformers as Statisticians: Provable In-Context Learning with
In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
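As a concrete example of "a standard machine learning algorithm run in context", the reference computation below fits ridge regression on the (x, y) demonstration pairs and applies it to a query. This is the kind of target algorithm such a transformer is claimed to be able to emulate, not transformer code, and ridge regression is used here only as an assumed illustrative instance.

```python
# Reference in-context algorithm (illustrative): ridge regression fitted on
# the demonstration pairs, then evaluated on the query input.
import numpy as np

def ridge_in_context(X_ctx, y_ctx, x_query, lam=1e-2):
    d = X_ctx.shape[1]
    w = np.linalg.solve(X_ctx.T @ X_ctx + lam * np.eye(d), X_ctx.T @ y_ctx)
    return x_query @ w
```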
arXiv Detail & Related papers (2023-06-07T17:59:31Z) - What and How does In-Context Learning Learn? Bayesian Model Averaging,
Parameterization, and Generalization [111.55277952086155]
We study In-Context Learning (ICL) by addressing several open questions.
We show that, without updating the neural network parameters, ICL implicitly implements the Bayesian model averaging algorithm.
We prove that the error of the pretrained model is bounded by a sum of an approximation error and a generalization error.
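A hedged reading of the Bayesian model averaging claim, in illustrative notation that need not match the paper's exact formulation: the ICL prediction marginalizes over a latent task variable z, with candidate predictors weighted by their posterior given the demonstrations.

```latex
% Illustrative Bayesian model averaging view of ICL (notation assumed):
% candidate predictors indexed by a latent task variable z are weighted
% by their posterior given the in-context demonstrations D.
\[
  p\bigl(y \mid x_{\text{query}}, \mathcal{D}\bigr)
  \;=\; \sum_{z} p\bigl(y \mid x_{\text{query}}, z\bigr)\, p\bigl(z \mid \mathcal{D}\bigr)
\]
```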
arXiv Detail & Related papers (2023-05-30T21:23:47Z) - Exploring Complementary Strengths of Invariant and Equivariant
Representations for Few-Shot Learning [96.75889543560497]
In many real-world problems, collecting a large number of labeled samples is infeasible.
Few-shot learning is the dominant approach to address this issue, where the objective is to quickly adapt to novel categories in presence of a limited number of samples.
We propose a novel training mechanism that simultaneously enforces equivariance and invariance to a general set of geometric transformations.
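A hedged sketch of one way to encourage both properties jointly (not the paper's implementation): an invariance term pulls features of transformed views toward the untransformed anchor, while an auxiliary head must still identify which transform was applied, so equivariant information survives. `encoder`, `transform_classifier`, and the transform set are assumed placeholders.

```python
# Illustrative joint invariance + equivariance objective (assumed design,
# not the paper's code). `transform_classifier` outputs len(transforms) logits.
import torch
import torch.nn.functional as F

def inv_eq_losses(encoder, transform_classifier, x, transforms):
    feats = [encoder(t(x)) for t in transforms]      # one view per transform
    anchor = encoder(x)
    # Invariance: features of transformed views should match the anchor.
    inv_loss = sum(F.mse_loss(f, anchor) for f in feats) / len(feats)
    # Equivariance: recover which transform produced each view.
    logits = torch.stack([transform_classifier(f) for f in feats])  # (T, B, T)
    targets = torch.arange(len(transforms)).unsqueeze(1).expand(-1, x.size(0))
    eq_loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
    return inv_loss, eq_loss
```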
arXiv Detail & Related papers (2021-03-01T21:14:33Z) - Learning Invariant Representations using Inverse Contrastive Loss [34.93395633215398]
We introduce a class of losses for learning representations that are invariant to some extraneous variable of interest.
We show that if the extraneous variable is binary, then optimizing this inverse contrastive loss is equivalent to optimizing a regularized MMD divergence.
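To make the regularized-MMD connection concrete, here is an illustrative RBF-kernel MMD penalty between the representations of the two groups of a binary extraneous variable. This is a generic sketch rather than the paper's inverse contrastive loss, and the kernel bandwidth is an assumption.

```python
# Illustrative squared MMD penalty with an RBF kernel (generic sketch).
import numpy as np

def rbf_mmd2(z0, z1, sigma=1.0):
    """Squared MMD between two sets of representations (rows are samples)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(z0, z0).mean() + k(z1, z1).mean() - 2 * k(z0, z1).mean()

# Usage: add `lam * rbf_mmd2(features[s == 0], features[s == 1])` to the task
# loss to discourage the representation from encoding the binary variable s.
```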
arXiv Detail & Related papers (2021-02-16T18:29:28Z) - Joint Contrastive Learning with Infinite Possibilities [114.45811348666898]
This paper explores useful modifications of the recent development in contrastive learning via novel probabilistic modeling.
We derive a particular form of contrastive loss named Joint Contrastive Learning (JCL).
arXiv Detail & Related papers (2020-09-30T16:24:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information shown (including all generated summaries) and is not responsible for any consequences of its use.