In-Context Learning Learns Label Relationships but Is Not Conventional Learning
- URL: http://arxiv.org/abs/2307.12375v4
- Date: Wed, 13 Mar 2024 15:00:20 GMT
- Title: In-Context Learning Learns Label Relationships but Is Not Conventional Learning
- Authors: Jannik Kossen, Yarin Gal, Tom Rainforth
- Abstract summary: There is currently no consensus about how the in-context learning (ICL) ability of Large Language Models (LLMs) works.
We provide novel insights into how ICL leverages label information, revealing both capabilities and limitations.
Our experiments show that ICL predictions almost always depend on in-context labels and that ICL can learn truly novel tasks in-context.
- Score: 60.891931501449726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The predictions of Large Language Models (LLMs) on downstream tasks often
improve significantly when including examples of the input–label relationship
in the context. However, there is currently no consensus about how this
in-context learning (ICL) ability of LLMs works. For example, while Xie et al.
(2021) liken ICL to a general-purpose learning algorithm, Min et al. (2022)
argue ICL does not even learn label relationships from in-context examples. In
this paper, we provide novel insights into how ICL leverages label information,
revealing both capabilities and limitations. To ensure we obtain a
comprehensive picture of ICL behavior, we study probabilistic aspects of ICL
predictions and thoroughly examine the dynamics of ICL as more examples are
provided. Our experiments show that ICL predictions almost always depend on
in-context labels and that ICL can learn truly novel tasks in-context. However,
we also find that ICL struggles to fully overcome prediction preferences
acquired from pre-training data and, further, that ICL does not consider all
in-context information equally.
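The finding that ICL predictions depend on in-context labels can be probed by scoring the same query under demonstrations with gold labels and with flipped labels, then comparing the resulting label probabilities. The sketch below is a minimal illustration of such a probe, not the authors' actual setup: the model (gpt2), the toy sentiment demonstrations, the prompt template, and the label_logprob helper are all assumptions made for illustration.

```python
# Minimal sketch (not the paper's code): probe whether ICL predictions
# depend on in-context labels by comparing gold vs. flipped demonstrations.
# Model choice, examples, and prompt format are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # assumption: any causal LM works for this sketch
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

LABELS = [" positive", " negative"]  # leading space so labels tokenize as continuations

def label_logprob(prompt: str, label: str) -> float:
    """Sum of log-probabilities of the label tokens given the prompt."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    label_ids = tok(label, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, label_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits
    # Log-probs for predicting each next token; score only the label positions.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    n = label_ids.shape[1]
    target = ids[0, -n:]
    return logprobs[-n:].gather(1, target.unsqueeze(1)).sum().item()

def build_prompt(demos, query):
    lines = [f"Review: {x}\nSentiment:{y}" for x, y in demos]
    return "\n\n".join(lines + [f"Review: {query}\nSentiment:"])

demos_gold = [("Great film, loved it.", " positive"),
              ("Terrible and boring.", " negative")]
demos_flipped = [(x, LABELS[1 - LABELS.index(y)]) for x, y in demos_gold]
query = "An absolute delight from start to finish."

for name, demos in [("gold", demos_gold), ("flipped", demos_flipped)]:
    prompt = build_prompt(demos, query)
    scores = {lab.strip(): label_logprob(prompt, lab) for lab in LABELS}
    print(name, scores)
```

If the label distribution shifts when the demonstration labels are flipped, the prediction is using the in-context label information rather than only preferences acquired during pre-training, which is the kind of dependence the paper reports.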
Related papers
- Multimodal Contrastive In-Context Learning [0.9120312014267044]
This paper introduces a novel multimodal contrastive in-context learning framework to enhance our understanding of gradient-free in-context learning (ICL) in Large Language Models (LLMs).
First, we present a contrastive learning-based interpretation of ICL in real-world settings, identifying the distance between key-value representations as the differentiator in ICL.
Second, we develop an analytical framework to address biases in multimodal input formatting for real-world datasets.
Third, we propose an on-the-fly approach for ICL that demonstrates effectiveness in detecting hateful memes.
arXiv Detail & Related papers (2024-08-23T10:10:01Z)
- ICLEval: Evaluating In-Context Learning Ability of Large Language Models [68.7494310749199]
In-Context Learning (ICL) is a critical capability of Large Language Models (LLMs) as it empowers them to comprehend and reason across interconnected inputs.
Existing evaluation frameworks primarily focus on language abilities and knowledge, often overlooking the assessment of ICL ability.
We introduce the ICLEval benchmark to evaluate the ICL abilities of LLMs, which encompasses two key sub-abilities: exact copying and rule learning.
arXiv Detail & Related papers (2024-06-21T08:06:10Z)
- Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning [99.05401042153214]
In-context learning (ICL) is potentially attributed to two major abilities: task recognition (TR) and task learning (TL).
We take the first step by examining the pre-training dynamics of the emergence of ICL.
We propose a simple yet effective method to better integrate these two abilities for ICL at inference time.
arXiv Detail & Related papers (2024-06-20T06:37:47Z)
- In-Context Learning with Long-Context Models: An In-Depth Exploration [96.1389740719691]
We show that, for many datasets with large label spaces, performance continues to increase with hundreds or thousands of demonstrations.
We show that although long-context ICL can be surprisingly effective, most of this gain comes from attending back to similar examples.
arXiv Detail & Related papers (2024-04-30T21:06:52Z)
- In-Context Exemplars as Clues to Retrieving from Large Associative Memory [1.2952137350423816]
In-context learning (ICL) enables large language models (LLMs) to learn patterns from in-context exemplars without training.
How to choose exemplars remains unclear due to the lack of understanding of how in-context learning works.
Our study sheds new light on the mechanism of ICL by connecting it to memory retrieval.
arXiv Detail & Related papers (2023-11-06T20:13:29Z)
- Investigating the Learning Behaviour of In-context Learning: A Comparison with Supervised Learning [67.25698169440818]
Large language models (LLMs) have shown remarkable capacity for in-context learning (ICL).
We train the same LLMs with the same demonstration examples via ICL and supervised learning (SL), respectively, and investigate their performance under label perturbations.
First, we find that gold labels have significant impacts on the downstream in-context performance, especially for large language models.
Second, when comparing with SL, we show empirically that ICL is less sensitive to label perturbations than SL, and ICL gradually attains comparable performance to SL as the model size increases.
arXiv Detail & Related papers (2023-07-28T09:03:19Z)
- A Survey on In-context Learning [77.78614055956365]
In-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP).
We first present a formal definition of ICL and clarify its correlation to related studies.
We then organize and discuss advanced techniques, including training strategies, prompt designing strategies, and related analysis.
arXiv Detail & Related papers (2022-12-31T15:57:09Z)
- Weakly Supervised Continual Learning [17.90483695137098]
This work explores Weakly Supervised Continual Learning (WSCL).
We show that not only do our proposals exhibit higher flexibility when supervised information is scarce, but also that fewer than 25% of labels can be enough to reach or even outperform SOTA methods trained under full supervision.
arXiv Detail & Related papers (2021-08-14T14:38:20Z)