Towards Compute-Optimal Many-Shot In-Context Learning
- URL: http://arxiv.org/abs/2507.16217v1
- Date: Tue, 22 Jul 2025 04:21:03 GMT
- Title: Towards Compute-Optimal Many-Shot In-Context Learning
- Authors: Shahriar Golchin, Yanfei Chen, Rujun Han, Manan Gandhi, Tianli Yu, Swaroop Mishra, Mihai Surdeanu, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister
- Abstract summary: We propose two strategies for demonstration selection in many-shot ICL. The first method combines a small number of demonstrations, selected based on similarity to each test sample, with a disproportionately larger set of random demonstrations that are cached. The second strategy improves on the first by replacing the random demonstrations with those selected using centroids derived from test sample representations via k-means clustering.
- Score: 63.815463719071055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long-context large language models (LLMs) are able to process inputs containing up to several million tokens. In the scope of in-context learning (ICL), this translates into using hundreds or thousands of demonstrations in the input prompt, enabling many-shot ICL. In practice, a fixed set of demonstrations is often selected at random in many-shot settings due to (1) high inference costs, (2) the benefits of caching and reusing computations, and (3) the similar performance offered by this strategy compared to others when scaled. In this work, we propose two straightforward strategies for demonstration selection in many-shot ICL that improve performance with minimal computational overhead. Our first method combines a small number of demonstrations, selected based on their similarity to each test sample, with a disproportionately larger set of random demonstrations that are cached. The second strategy improves on the first by replacing random demonstrations with those selected using centroids derived from test sample representations via k-means clustering. Our experiments with Gemini Pro and Flash across several datasets indicate that our strategies consistently outperform random selection and surpass or match the most performant selection approach while supporting caching and reducing inference cost by up to an order of magnitude. We also show that adjusting the proportion of demonstrations selected based on different criteria can balance performance and inference cost in many-shot ICL.
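The two strategies lend themselves to a short sketch. The following is a minimal illustration under stated assumptions, not the paper's implementation: random vectors stand in for text embeddings, the pool/test sizes and the k_sim/k_cached split are arbitrary, and cosine similarity plus scikit-learn k-means are used for retrieval and clustering.

```python
# Illustrative sketch of the two demonstration-selection strategies (not the authors' code).
# Random vectors stand in for embeddings; in practice a text-embedding model would supply them.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_pool, n_test, dim = 1000, 200, 64
pool_emb = rng.normal(size=(n_pool, dim))   # candidate demonstration embeddings
test_emb = rng.normal(size=(n_test, dim))   # test-sample embeddings

def top_k_cosine(query, matrix, k):
    """Return indices of the k rows of `matrix` most cosine-similar to `query`."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return np.argsort(m @ q)[::-1][:k]

k_sim, k_cached = 4, 100  # few per-sample shots vs. large shared (cacheable) set

# Strategy 1: a large random demonstration set, fixed and cached across all
# test samples, plus a few demonstrations retrieved per test sample.
cached_random = rng.choice(n_pool, size=k_cached, replace=False)

# Strategy 2: replace the random cached set with demonstrations closest to
# k-means centroids of the test-sample representations.
centroids = KMeans(n_clusters=k_cached, n_init=10, random_state=0).fit(test_emb).cluster_centers_
cached_centroid = np.array([top_k_cosine(c, pool_emb, 1)[0] for c in centroids])

for i in range(3):  # assemble demonstration indices for a few test samples
    per_sample = top_k_cosine(test_emb[i], pool_emb, k_sim)
    shots_v1 = np.concatenate([cached_random, per_sample])
    shots_v2 = np.concatenate([cached_centroid, per_sample])
    print(f"test {i}: {shots_v1.size} shots (strategy 1), {shots_v2.size} shots (strategy 2)")
```

Because the cached portion of the prompt is identical for every test sample, its computation can be reused, and only the handful of similarity-selected demonstrations varies per query; per the abstract, this is what lets the strategies match or surpass the strongest selection approach while cutting inference cost by up to an order of magnitude.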
Related papers
- Large Language Models are Demonstration Pre-Selectors for Themselves [57.101804269100185]
In-context learning (ICL) with large language models (LLMs) delivers strong few-shot performance by choosing few-shot demonstrations from the entire training data. FEw yet Essential Demonstration prE-selectoR (FEEDER) is a novel pre-selection framework that identifies a representative subset of demonstrations; it can reduce training data size by over 20% while maintaining performance.
arXiv Detail & Related papers (2025-06-06T12:29:03Z)
- Selecting Demonstrations for Many-Shot In-Context Learning via Gradient Matching [24.4195026869735]
In-Context Learning (ICL) empowers Large Language Models (LLMs) for rapid task adaptation without Fine-Tuning (FT). While many-shot ICL shows promising performance through scaled demonstrations, the selection method for many-shot demonstrations remains limited to random selection in existing work. We introduce a novel gradient matching approach that selects demonstrations by aligning fine-tuning gradients between the entire training set of the target task and the selected examples, so as to approximate, with the selected examples alone, the learning effect of the entire training set.
arXiv Detail & Related papers (2025-06-05T02:57:05Z)
- ParaICL: Towards Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing. Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples. We propose a novel method named parallel in-context learning (ParaICL).
arXiv Detail & Related papers (2024-03-31T05:56:15Z)
- $Se^2$: Sequential Example Selection for In-Context Learning [83.17038582333716]
The in-context learning (ICL) capability of large language models (LLMs) needs to be activated by demonstration examples.
Prior work has extensively explored the selection of examples for ICL, predominantly following the "select then organize" paradigm.
In this paper, we formulate the problem as a $Se$quential $Se$lection problem and introduce $Se^2$, a sequential-aware method.
arXiv Detail & Related papers (2024-02-21T15:35:04Z)
- In-Context Learning with Iterative Demonstration Selection [32.62104857810135]
Large language models (LLMs) have demonstrated strong few-shot learning ability via in-context learning (ICL). The performance of ICL has been shown to be highly sensitive to the selection of few-shot demonstrations. We propose Iterative Demonstration Selection (IDS) to leverage the merits of both dimensions.
arXiv Detail & Related papers (2023-10-15T16:40:19Z)
- In-Context Demonstration Selection with Cross Entropy Difference [95.21947716378641]
Large language models (LLMs) can use in-context demonstrations to improve performance on zero-shot tasks.
We present a cross-entropy difference (CED) method for selecting in-context demonstrations.
arXiv Detail & Related papers (2023-05-24T05:04:00Z)
- Active Learning Principles for In-Context Learning with Large Language Models [65.09970281795769]
This paper investigates how Active Learning algorithms can serve as effective demonstration selection methods for in-context learning.
We show that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
arXiv Detail & Related papers (2023-05-23T17:16:04Z)