In-Context Learning with Long-Context Models: An In-Depth Exploration
- URL: http://arxiv.org/abs/2405.00200v1
- Date: Tue, 30 Apr 2024 21:06:52 GMT
- Title: In-Context Learning with Long-Context Models: An In-Depth Exploration
- Authors: Amanda Bertsch, Maor Ivgi, Uri Alon, Jonathan Berant, Matthew R. Gormley, Graham Neubig
- Abstract summary: We show that, for many datasets with large label spaces, performance continues to increase with hundreds or thousands of demonstrations.
We show that although long-context ICL can be surprisingly effective, most of this gain comes from attending back to similar examples.
- Score: 96.1389740719691
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As model context lengths continue to increase, the number of demonstrations that can be provided in-context approaches the size of entire training datasets. We study the behavior of in-context learning (ICL) at this extreme scale on multiple datasets and models. We show that, for many datasets with large label spaces, performance continues to increase with hundreds or thousands of demonstrations. We contrast this with example retrieval and finetuning: example retrieval shows excellent performance at low context lengths but has diminished gains with more demonstrations; finetuning is more data hungry than ICL but can sometimes exceed long-context ICL performance with additional data. We use this ICL setting as a testbed to study several properties of both in-context learning and long-context models. We show that long-context ICL is less sensitive to random input shuffling than short-context ICL, that grouping of same-label examples can negatively impact performance, and that the performance boosts we see do not arise from cumulative gain from encoding many examples together. We conclude that although long-context ICL can be surprisingly effective, most of this gain comes from attending back to similar examples rather than task learning.
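To make the setting concrete, below is a minimal, dependency-free sketch (not the authors' code) of the two prompt-construction strategies the abstract contrasts: randomly sampled many-shot ICL versus retrieval of the demonstrations most similar to the query. The dataset fields ("text", "label"), the bag-of-words embedding, and all function names are illustrative assumptions.

```python
# Sketch only: contrasts random many-shot prompts with retrieval-based prompts.
# A real setup would use a sentence encoder and a task-specific prompt template.
import random
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in embedding: a bag-of-words count vector (keeps the sketch dependency-free).
    return Counter(text.lower().split())


def similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    overlap = sum(a[w] * b[w] for w in a)
    norm = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return overlap / norm if norm else 0.0


def format_demo(example: dict) -> str:
    return f"Input: {example['text']}\nLabel: {example['label']}\n"


def random_manyshot_prompt(train: list, query: str, k: int, seed: int = 0) -> str:
    # Long-context ICL: sample as many demonstrations as the context window allows.
    demos = random.Random(seed).sample(train, k)
    return "".join(format_demo(d) for d in demos) + f"Input: {query}\nLabel:"


def retrieval_prompt(train: list, query: str, k: int) -> str:
    # Example retrieval: pick the k training examples most similar to the query.
    q = embed(query)
    demos = sorted(train, key=lambda d: similarity(embed(d["text"]), q), reverse=True)[:k]
    return "".join(format_demo(d) for d in demos) + f"Input: {query}\nLabel:"


if __name__ == "__main__":
    train = [
        {"text": "the movie was wonderful", "label": "positive"},
        {"text": "a dull and tedious film", "label": "negative"},
        {"text": "absolutely loved the acting", "label": "positive"},
        {"text": "the plot made no sense", "label": "negative"},
    ]
    query = "a wonderful and moving film"
    print(random_manyshot_prompt(train, query, k=3))
    print(retrieval_prompt(train, query, k=2))
```

In the abstract's terms, scaling k in the random variant corresponds to long-context ICL, while the retrieval variant captures the "attending back to similar examples" behavior the authors identify as the main source of the gains.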
Related papers
- What Matters for In-Context Learning: A Balancing Act of Look-up and In-Weight Learning [42.8453045943264]
We show that conceptual repetitions in the data sequences are crucial for ICL.
We also show that the emergence of ICL depends on balancing the in-weight learning objective with the in-context solving ability.
arXiv Detail & Related papers (2025-01-09T09:45:05Z)
- Revisiting In-Context Learning with Long Context Language Models [26.141121450077637]
In-Context Learning (ICL) is a technique by which language models make predictions based on examples provided in their input context.
The advent of Long Context Language Models (LCLMs) has significantly increased the number of examples that can be included in context.
We revisit these approaches in the context of LCLMs through extensive experiments on 18 datasets spanning 4 tasks.
arXiv Detail & Related papers (2024-12-22T08:55:19Z)
- Many-Shot In-Context Learning [58.395589302800566]
Large language models (LLMs) excel at few-shot in-context learning (ICL).
We observe significant performance gains across a wide variety of generative and discriminative tasks.
Unlike few-shot learning, many-shot learning is effective at overriding pretraining biases.
arXiv Detail & Related papers (2024-04-17T02:49:26Z)
- ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL).
arXiv Detail & Related papers (2024-03-31T05:56:15Z)
- Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning [32.29118942982609]
Large Language Models (LLMs) have recently gained In-Context Learning (ICL) ability as they scale up.
This paper investigates how to determine approximately optimal weights for demonstration examples and how to apply them during ICL.
Experimental results on 8 text classification tasks show that our approach outperforms conventional ICL by a large margin.
arXiv Detail & Related papers (2023-10-12T13:15:11Z)
- Dynamic Demonstrations Controller for In-Context Learning [48.455265597575675]
In-context learning (ICL) is a new paradigm for natural language processing (NLP).
It is commonly believed that the number of demonstrations is positively correlated with model performance.
We propose a Dynamic Demonstrations Controller (D$^2$Controller), which can improve ICL performance by adjusting the number of demonstrations.
arXiv Detail & Related papers (2023-09-30T14:04:22Z)
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
- In-Context Learning Learns Label Relationships but Is Not Conventional Learning [60.891931501449726]
There is currently no consensus about how the in-context learning (ICL) ability of Large Language Models (LLMs) works.
We provide novel insights into how ICL leverages label information, revealing both capabilities and limitations.
Our experiments show that ICL predictions almost always depend on in-context labels and that ICL can learn truly novel tasks in-context.
arXiv Detail & Related papers (2023-07-23T16:54:41Z)
- Understanding In-Context Learning via Supportive Pretraining Data [55.648777340129364]
In-context learning (ICL) improves language models' performance on a variety of NLP tasks simply by providing a handful of demonstration examples at inference time.
It is not well understood why the ICL ability emerges, as the model has never been specifically trained on such demonstrations.
Our work takes a first step towards understanding ICL via analyzing instance-level pretraining data.
arXiv Detail & Related papers (2023-06-26T22:14:04Z)