Many-Shot In-Context Learning
- URL: http://arxiv.org/abs/2404.11018v3
- Date: Thu, 17 Oct 2024 17:45:09 GMT
- Title: Many-Shot In-Context Learning
- Authors: Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Luis Rosias, Stephanie Chan, Biao Zhang, Ankesh Anand, Zaheer Abbas, Azade Nova, John D. Co-Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, Hugo Larochelle,
- Abstract summary: Large language models (LLMs) excel at few-shot in-context learning (ICL).
We observe significant performance gains across a wide variety of generative and discriminative tasks.
Unlike few-shot learning, many-shot learning is effective at overriding pretraining biases.
- Score: 58.395589302800566
- Abstract: Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated examples. To mitigate this limitation, we explore two new settings: Reinforced and Unsupervised ICL. Reinforced ICL uses model-generated chain-of-thought rationales in place of human examples. Unsupervised ICL removes rationales from the prompt altogether, and prompts the model only with domain-specific questions. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. Finally, we demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases, can learn high-dimensional functions with numerical inputs, and performs comparably to fine-tuning. We also find that inference cost increases linearly in the many-shot regime, and frontier LLMs benefit from many-shot ICL to varying degrees. Our analysis also reveals the limitations of next-token prediction loss as an indicator of downstream ICL performance.
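The three prompting regimes in the abstract differ only in how the in-context block is assembled: many-shot ICL packs hundreds of human-written examples, Reinforced ICL swaps in model-generated chain-of-thought rationales kept only when their final answer is correct, and Unsupervised ICL keeps just the domain-specific questions. A minimal sketch of that prompt construction follows; the field names, the answer-matching filter, and the sample_rationale helper are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch of many-shot, Reinforced, and Unsupervised ICL prompt construction.
# Field names and the answer-matching filter are illustrative assumptions.

def many_shot_prompt(examples, query):
    """Standard many-shot ICL: hundreds or thousands of human-written examples."""
    shots = "\n\n".join(
        f"Q: {ex['question']}\nA: {ex['rationale']}\nAnswer: {ex['answer']}"
        for ex in examples
    )
    return f"{shots}\n\nQ: {query}\nA:"

def reinforced_icl_prompt(problems, sample_rationale, query, shots=500):
    """Reinforced ICL: replace human rationales with model-generated ones,
    keeping only rationales whose final answer matches the known ground truth."""
    kept = []
    for p in problems:
        rationale, final = sample_rationale(p["question"])  # model call (assumed helper)
        if final == p["answer"]:
            kept.append(f"Q: {p['question']}\nA: {rationale}\nAnswer: {final}")
        if len(kept) >= shots:
            break
    return "\n\n".join(kept) + f"\n\nQ: {query}\nA:"

def unsupervised_icl_prompt(problems, query, shots=500):
    """Unsupervised ICL: the prompt contains only domain-specific questions, no answers."""
    qs = "\n\n".join(f"Q: {p['question']}" for p in problems[:shots])
    return f"{qs}\n\nQ: {query}\nA:"
```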
Related papers
- MIR-Bench: Benchmarking LLM's Long-Context Intelligence via Many-Shot In-Context Inductive Reasoning [21.056519816264505]
We propose MIR-Bench, the first many-shot in-context inductive reasoning benchmark.
We study many novel problems for inductive reasoning and many-shot ICL, including robustness against erroneous shots.
arXiv Detail & Related papers (2025-02-14T06:05:12Z)
- From Few to Many: Self-Improving Many-Shot Reasoners Through Iterative Optimization and Generation [18.988069926846357]
Many-shot in-context learning (ICL) can lead to performance benefits, but it is unclear what aspects dominate the benefits and whether simply scaling to more examples is the most effective way of improving ICL.
We propose BRIDGE, an algorithm that alternates between an optimize step, which uses Bayesian optimization to discover influential sets of examples, and a generate step, which reuses this set to automatically expand the examples' reasoning paths back to the many-shot regime.
On Gemini, Claude, and Mistral LLMs of different sizes, we show that BRIDGE leads to significant improvements across a diverse set of tasks.
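Taken at face value, BRIDGE alternates an optimize step that searches for an influential subset of examples with a generate step that expands that subset back to many shots. The sketch below captures only that loop shape; random search stands in for the paper's Bayesian optimization, and eval_score / expand_with_model are assumed helpers.

```python
import random

def bridge_sketch(examples, eval_score, expand_with_model, rounds=3,
                  subset_size=8, many_shot_target=200):
    """Hedged sketch of a BRIDGE-style loop (optimize -> generate); random search
    stands in for the paper's Bayesian optimization over example subsets."""
    pool = list(examples)
    best_subset, best_score = None, float("-inf")
    for _ in range(rounds):
        # "Optimize" step: search for an influential subset of examples.
        for _ in range(20):                          # proposal budget (assumed)
            cand = random.sample(pool, subset_size)
            s = eval_score(cand)                     # validation score (assumed helper)
            if s > best_score:
                best_subset, best_score = cand, s
        # "Generate" step: reuse the influential subset to regenerate reasoning
        # paths and expand back to the many-shot regime.
        pool = expand_with_model(best_subset, target=many_shot_target)  # model call (assumed)
    return pool, best_subset, best_score
```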
arXiv Detail & Related papers (2025-02-01T06:23:24Z)
- More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives [50.772462704559345]
We introduce DrICL, a novel optimization method that enhances model performance through Differentiated Learning and advantage-based Reweighting objectives.
Globally, DrICL utilizes differentiated learning to optimize the NLL objective, ensuring that many-shot performance surpasses zero-shot levels.
We develop the Many-Shot ICL Benchmark (ICL-50), a large-scale benchmark of 50 tasks covering shot numbers from 1 to 350 within sequences of up to 8,000 tokens, for fine-tuning purposes.
arXiv Detail & Related papers (2025-01-07T14:57:08Z)
- Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning [19.16587730306472]
In-Context Learning (ICL) emerges as a key feature for Large Language Models (LLMs)
We propose Logit Arithmetic Reweighting Approach (LARA), a novel framework that enhances ICL by using logit-based ensembling of multiple demonstrations.
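A minimal sketch of the logit-ensembling step: next-token logits computed under each demonstration subgroup are combined as a weighted sum before the softmax. How LARA actually finds the weights (a non-gradient search in the paper) is not shown here; the weights below are placeholders.

```python
import numpy as np

def lara_combine(group_logits, weights):
    """Hedged sketch of logit-arithmetic reweighting: per-group next-token logits
    (one set per demonstration subgroup) are mixed by a weighted sum, then softmaxed."""
    group_logits = np.asarray(group_logits)          # shape (num_groups, vocab)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # normalize mixture weights
    mixed = (w[:, None] * group_logits).sum(axis=0)  # weighted logit sum
    probs = np.exp(mixed - mixed.max())
    return probs / probs.sum()                       # next-token distribution

# Usage: logits from 3 demonstration subgroups over a toy 5-token vocabulary.
logits = [[2.0, 0.1, -1.0, 0.3, 0.0],
          [1.5, 0.2, -0.5, 0.1, 0.4],
          [1.8, 0.0, -0.8, 0.5, 0.2]]
print(lara_combine(logits, weights=[0.5, 0.3, 0.2]))
```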
arXiv Detail & Related papers (2024-10-14T01:34:16Z)
- Implicit In-context Learning [37.0562059811099]
In-context Learning (ICL) empowers large language models to adapt to unseen tasks during inference by prefixing a few demonstration examples prior to test queries.
We introduce Implicit In-context Learning (I2CL), an innovative paradigm that addresses the challenges associated with traditional ICL by absorbing demonstration examples within the activation space.
I2CL achieves few-shot performance with zero-shot cost and exhibits robustness against the variation of demonstration examples.
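One literal reading of "absorbing demonstration examples within the activation space" is to distill the demonstrations into per-layer context vectors and add them to the residual stream at inference, leaving the prompt itself zero-shot. The sketch below shows that idea with GPT-2 and forward hooks; the mean-pooling, the uniform layer-wise injection, and the scale factor are assumptions rather than I2CL's calibrated procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def context_vectors(demo_text):
    """Mean residual-stream activation of the demonstrations at every layer."""
    ids = tok(demo_text, return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids, output_hidden_states=True).hidden_states  # n_layers + 1 tensors
    return [h.mean(dim=1) for h in hs[1:]]           # one (1, d) vector per block

def generate_with_injection(query, vectors, scale=0.1, max_new_tokens=20):
    """Inject the context vectors into every block's output while generating."""
    hooks = []
    def make_hook(v):
        def hook(_module, _inputs, out):
            return (out[0] + scale * v,) + out[1:]   # shift hidden states by the context vector
        return hook
    for block, v in zip(model.transformer.h, vectors):
        hooks.append(block.register_forward_hook(make_hook(v)))
    try:
        ids = tok(query, return_tensors="pt")
        out = model.generate(**ids, max_new_tokens=max_new_tokens,
                             pad_token_id=tok.eos_token_id)
        return tok.decode(out[0], skip_special_tokens=True)
    finally:
        for h in hooks:
            h.remove()
```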
arXiv Detail & Related papers (2024-05-23T14:57:52Z)
- In-Context Learning with Long-Context Models: An In-Depth Exploration [96.1389740719691]
We show that, for many datasets with large label spaces, performance continues to increase with hundreds or thousands of demonstrations.
We show that although long-context ICL can be surprisingly effective, most of this gain comes from attending back to similar examples.
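If most of the long-context gain comes from attending back to similar examples, a much cheaper baseline is to retrieve only the nearest demonstrations for each query instead of packing the full set into context. A small sketch using TF-IDF similarity (the embedding choice and the value of k are assumptions) follows.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_demos(query, demo_pool, k=32):
    """Pick the k demonstrations most similar to the query: a cheap stand-in for
    letting a long-context model attend back to similar examples."""
    texts = [d["question"] for d in demo_pool]
    vec = TfidfVectorizer().fit(texts + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(texts))[0]
    top = sims.argsort()[::-1][:k]
    return [demo_pool[i] for i in top]
```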
arXiv Detail & Related papers (2024-04-30T21:06:52Z)
- ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
The performance of few-shot in-context learning (ICL) depends heavily on the choice of demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL).
arXiv Detail & Related papers (2024-03-31T05:56:15Z)
- In-Context Learning Functions with Varying Number of Minima [3.3268674937926224]
We propose a new task of approximating functions with varying numbers of minima.
We find that increasing the number of minima degrades ICL performance.
At the same time, our evaluation shows that ICL outperforms a 2-layer Neural Network (2NN) model.
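A toy way to reproduce this setup is to construct target functions with a controlled number of local minima, sample (x, y) pairs, and format them as a numeric ICL regression prompt. The Gaussian-bump construction below is an assumption for illustration, not the paper's exact function family.

```python
import numpy as np

def make_function(num_minima, rng):
    """Toy construction (an assumption): a sum of downward Gaussian bumps with
    well-separated centers, giving roughly `num_minima` local minima."""
    centers = rng.uniform(-5, 5, size=num_minima)
    return lambda x: -sum(np.exp(-(x - c) ** 2) for c in centers)

def regression_prompt(f, num_shots, rng, query_x):
    """Format numeric (x, y) pairs as an in-context regression prompt."""
    xs = rng.uniform(-5, 5, size=num_shots)
    lines = [f"x={x:.3f} -> y={f(x):.3f}" for x in xs]
    return "\n".join(lines) + f"\nx={query_x:.3f} -> y="

rng = np.random.default_rng(0)
f = make_function(num_minima=3, rng=rng)
print(regression_prompt(f, num_shots=8, rng=rng, query_x=1.25))
```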
arXiv Detail & Related papers (2023-11-21T11:33:03Z)
- Structured Prompting: Scaling In-Context Learning to 1,000 Examples [78.41281805608081]
We introduce structured prompting that breaks the length limit and scales in-context learning to thousands of examples.
Specifically, demonstration examples are separately encoded with well-designed position embeddings, and then they are jointly attended by the test example using a rescaled attention mechanism.
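A toy reading of the combination step: each demonstration group is encoded independently, and at test time the query attends over all groups jointly, with demonstration scores downweighted by the number of groups so that adding groups does not swamp the test context. The NumPy sketch below illustrates that reading only; the 1/M rescaling and the random arrays standing in for independently encoded groups are assumptions, not the paper's verified mechanism.

```python
import numpy as np

def rescaled_attention(query, group_keys, group_values, test_keys, test_values):
    """Joint attention over independently encoded demonstration groups, with
    demonstration logits downweighted by the number of groups (assumed 1/M rescaling)."""
    d = query.shape[-1]
    m = len(group_keys)
    exp_demo = [np.exp(query @ k.T / np.sqrt(d)) / m for k in group_keys]
    exp_test = np.exp(query @ test_keys.T / np.sqrt(d))
    denom = sum(e.sum() for e in exp_demo) + exp_test.sum()
    out = sum((e / denom) @ v for e, v in zip(exp_demo, group_values))
    return out + (exp_test / denom) @ test_values

# Toy usage with random vectors standing in for independently encoded groups.
rng = np.random.default_rng(0)
q = rng.normal(size=(16,))
gk = [rng.normal(size=(4, 16)) for _ in range(3)]
gv = [rng.normal(size=(4, 16)) for _ in range(3)]
tk, tv = rng.normal(size=(2, 16)), rng.normal(size=(2, 16))
print(rescaled_attention(q, gk, gv, tk, tv).shape)
```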
arXiv Detail & Related papers (2022-12-13T16:31:21Z)
- Contrastive Learning with Adversarial Examples [79.39156814887133]
Contrastive learning (CL) is a popular technique for self-supervised learning (SSL) of visual representations.
This paper introduces a new family of adversarial examples for contrastive learning and uses them to define a new adversarial training algorithm for SSL, denoted CLAE.
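The core recipe, as described, is to generate adversarial views that increase the contrastive loss and then train on them. A hedged PyTorch sketch of one training step follows; the single-step FGSM attack, the epsilon value, and the equal weighting of clean and adversarial losses are assumptions, not CLAE's exact algorithm.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """Standard NT-Xent (SimCLR-style) contrastive loss."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)
    sim = z @ z.t() / tau
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))       # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def adversarial_contrastive_step(encoder, x1, x2, optimizer, eps=0.03, tau=0.5):
    """Hedged sketch of the CLAE idea: perturb one augmented view in the direction
    that increases the contrastive loss (FGSM-style), then train on both pairs."""
    x2_adv = x2.clone().requires_grad_(True)
    attack_loss = nt_xent(encoder(x1), encoder(x2_adv), tau)
    grad = torch.autograd.grad(attack_loss, x2_adv)[0]
    x2_adv = (x2 + eps * grad.sign()).detach()       # adversarial view

    optimizer.zero_grad()
    total = nt_xent(encoder(x1), encoder(x2), tau) + nt_xent(encoder(x1), encoder(x2_adv), tau)
    total.backward()
    optimizer.step()
    return total.item()
```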
arXiv Detail & Related papers (2020-10-22T20:45:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.