In-Context Learning with Long-Context Models: An In-Depth Exploration
- URL: http://arxiv.org/abs/2405.00200v1
- Date: Tue, 30 Apr 2024 21:06:52 GMT
- Title: In-Context Learning with Long-Context Models: An In-Depth Exploration
- Authors: Amanda Bertsch, Maor Ivgi, Uri Alon, Jonathan Berant, Matthew R. Gormley, Graham Neubig
- Abstract summary: We show that, for many datasets with large label spaces, performance continues to increase with hundreds or thousands of demonstrations.
We show that although long-context ICL can be surprisingly effective, most of this gain comes from attending back to similar examples.
- Score: 96.1389740719691
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As model context lengths continue to increase, the number of demonstrations that can be provided in-context approaches the size of entire training datasets. We study the behavior of in-context learning (ICL) at this extreme scale on multiple datasets and models. We show that, for many datasets with large label spaces, performance continues to increase with hundreds or thousands of demonstrations. We contrast this with example retrieval and finetuning: example retrieval shows excellent performance at low context lengths but has diminished gains with more demonstrations; finetuning is more data hungry than ICL but can sometimes exceed long-context ICL performance with additional data. We use this ICL setting as a testbed to study several properties of both in-context learning and long-context models. We show that long-context ICL is less sensitive to random input shuffling than short-context ICL, that grouping of same-label examples can negatively impact performance, and that the performance boosts we see do not arise from cumulative gain from encoding many examples together. We conclude that although long-context ICL can be surprisingly effective, most of this gain comes from attending back to similar examples rather than task learning.
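The core contrast in the abstract (packing the context with many demonstrations versus retrieving only the most similar ones per test example) can be illustrated with a short sketch. This is not the authors' code: the prompt template, the whitespace-based token budget, and the bag-of-words retriever are simplifying assumptions standing in for a real chat template, tokenizer, and embedding-based retriever.

```python
import random
from collections import Counter
from math import sqrt


def format_demo(text: str, label: str) -> str:
    # Assumed demonstration template; the actual format is dataset-specific.
    return f"Input: {text}\nLabel: {label}\n\n"


def build_many_shot_prompt(train_set, test_text, token_budget=32_000, seed=0):
    """Long-context ICL: pack randomly ordered demonstrations until the budget is hit."""
    demos = list(train_set)
    # The abstract reports long-context ICL is relatively insensitive to random shuffling.
    random.Random(seed).shuffle(demos)
    prompt, used = "", 0
    for text, label in demos:
        block = format_demo(text, label)
        cost = len(block.split())  # crude whitespace proxy for a real tokenizer
        if used + cost > token_budget:
            break
        prompt += block
        used += cost
    return prompt + f"Input: {test_text}\nLabel:"


def _bow(text: str) -> Counter:
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_retrieval_prompt(train_set, test_text, k=50):
    """Retrieval baseline: keep only the k demonstrations most similar to the test input."""
    query = _bow(test_text)
    ranked = sorted(train_set, key=lambda ex: _cosine(_bow(ex[0]), query), reverse=True)
    prompt = "".join(format_demo(text, label) for text, label in ranked[:k])
    return prompt + f"Input: {test_text}\nLabel:"
```

Either prompt would then be sent to a long-context model for completion. Per the abstract, the retrieval variant excels at short context lengths but shows diminishing gains as more demonstrations are added, while randomly packed long-context ICL keeps improving with hundreds or thousands of demonstrations.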
Related papers
- Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning [22.341935761925892]
Fine-tuning and in-context learning (ICL) are two prevalent methods for imbuing large language models with task-specific knowledge.
This paper presents a counterintuitive finding: For tasks with implicit patterns, ICL captures these patterns significantly better than fine-tuning.
arXiv Detail & Related papers (2024-10-07T02:12:22Z)
- Many-Shot In-Context Learning [58.395589302800566]
Large language models (LLMs) excel at few-shot in-context learning (ICL).
We observe significant performance gains across a wide variety of generative and discriminative tasks.
Unlike few-shot learning, many-shot learning is effective at overriding pretraining biases.
arXiv Detail & Related papers (2024-04-17T02:49:26Z)
- ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL)
arXiv Detail & Related papers (2024-03-31T05:56:15Z)
- Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning [32.29118942982609]
Large Language Models (LLMs) have recently gained the ability to perform In-Context Learning (ICL) as they scale up.
This paper investigates how to determine approximately optimal weights for demonstration examples and how to apply them during ICL.
Experimental results on 8 text classification tasks show that our approach outperforms conventional ICL by a large margin.
arXiv Detail & Related papers (2023-10-12T13:15:11Z)
- Dynamic Demonstrations Controller for In-Context Learning [51.3439660534631]
In-Context Learning (ICL) is a new paradigm for natural language processing (NLP), where a large language model observes a small number of demonstrations and a test instance as its input.
Previous studies have revealed that ICL is sensitive to the selection and the ordering of demonstrations.
We propose a Dynamic Demonstrations Controller (D$^2$Controller), which can improve ICL performance by adjusting the number of demonstrations.
arXiv Detail & Related papers (2023-09-30T14:04:22Z)
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
- In-Context Learning Learns Label Relationships but Is Not Conventional Learning [60.891931501449726]
There is currently no consensus about how the in-context learning (ICL) ability of Large Language Models works.
We provide novel insights into how ICL leverages label information, revealing both capabilities and limitations.
Our experiments show that ICL predictions almost always depend on in-context labels and that ICL can learn truly novel tasks in-context.
arXiv Detail & Related papers (2023-07-23T16:54:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.