Scaling Laws for Many-Shot In-Context Learning with Self-Generated Annotations
- URL: http://arxiv.org/abs/2503.03062v2
- Date: Tue, 20 May 2025 22:36:09 GMT
- Title: Scaling Laws for Many-Shot In-Context Learning with Self-Generated Annotations
- Authors: Zhengyao Gu, Henry Peng Zou, Yankai Chen, Aiwei Liu, Weizhi Zhang, Philip S. Yu
- Abstract summary: We study in-context learning with self-generated examples using a framework analogous to traditional semi-supervised learning. Within this framework, we propose a simple baseline that outperforms ground-truth ICL in zero-shot, few-shot, and many-shot settings. We introduce IterPSD, an iterative annotation approach that integrates iterative refinement and curriculum pseudo-labeling techniques.
- Score: 37.62305582749307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The high cost of obtaining high-quality annotated data for in-context learning (ICL) has motivated the development of methods that use self-generated annotations in place of ground-truth labels. While these approaches have shown promising results in few-shot settings, they generally do not scale to many-shot scenarios. In this work, we study ICL with self-generated examples using a framework analogous to traditional semi-supervised learning, consisting of annotation generation, demonstration selection, and in-context inference. Within this framework, we propose a simple baseline that outperforms ground-truth ICL in zero-shot, few-shot, and many-shot settings. Notably, we observe a scaling law with this baseline, where optimal performance is achieved with more than 1,000 demonstrations. To fully exploit the many-shot capabilities of semi-supervised ICL, we introduce IterPSD, an iterative annotation approach that integrates iterative refinement and curriculum pseudo-labeling techniques from semi-supervised learning, yielding up to 6.8% additional gains on classification tasks.
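To make the framework concrete, here is a minimal sketch of the three stages (annotation generation, demonstration selection, in-context inference) together with an IterPSD-style annotation loop. The LLM interface, the confidence-based filter, and the chunked curriculum schedule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of semi-supervised ICL with an IterPSD-style curriculum loop.
# The LLM interface, confidence filter, and chunk schedule are assumptions.
from typing import Callable, List, Tuple

Example = Tuple[str, str]                        # (input text, label)
LLM = Callable[[List[Example], str], Tuple[str, float]]
# An LLM call takes demonstrations and a query, returns (label, confidence).

def annotate(llm: LLM, demos: List[Example], inputs: List[str]):
    """Stage 1: self-generate annotations for unlabeled inputs."""
    return [(x, *llm(demos, x)) for x in inputs]

def select(pseudo, keep_frac: float = 0.8) -> List[Example]:
    """Stage 2: keep the most confident pseudo-labeled examples."""
    pseudo = sorted(pseudo, key=lambda t: t[2], reverse=True)
    k = max(1, int(keep_frac * len(pseudo)))
    return [(x, y) for x, y, _ in pseudo[:k]]

def iterpsd(llm: LLM, unlabeled: List[str], chunk: int = 100) -> List[Example]:
    """Iterative pseudo-labeling: annotate one chunk at a time, growing the
    demonstration pool so that later (harder) chunks see more context."""
    demos: List[Example] = []                    # zero-shot start
    remaining = list(unlabeled)
    while remaining:
        batch, remaining = remaining[:chunk], remaining[chunk:]
        demos += select(annotate(llm, demos, batch))
    return demos

# Stage 3: many-shot inference with the self-generated pool, e.g.
#   pool = iterpsd(llm, corpus)
#   label, _ = llm(pool, test_input)
```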
Related papers
- Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention [45.20728476185864]
Many-shot in-context learning has recently shown promise as an alternative to finetuning. This shifts the computational burden from training time to inference time. We present Dynamic Block-Sparse Attention, a training-free framework for retrieval-based many-shot in-context learning.
arXiv Detail & Related papers (2025-03-11T17:30:58Z)
- Enhancing Input-Label Mapping in In-Context Learning with Contrastive Decoding [71.01099784480597]
Large language models (LLMs) excel at a range of tasks through in-context learning (ICL). We introduce In-Context Contrastive Decoding (ICCD), a novel method that emphasizes input-label mapping.
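As a rough illustration of the contrastive-decoding idea, one can score candidate outputs by the gap between logits computed under correct demonstrations and logits computed under label-perturbed demonstrations; the dict interface and the weight alpha below are assumptions, not ICCD's exact formulation.

```python
# Rough sketch of contrastive decoding over two demonstration contexts:
# logits_pos comes from correct input-label demos, logits_neg from demos
# with perturbed labels. Alpha and the dict interface are assumptions.
from typing import Dict

def contrastive_scores(logits_pos: Dict[str, float],
                       logits_neg: Dict[str, float],
                       alpha: float = 1.0) -> Dict[str, float]:
    """Upweight outputs whose evidence comes from correct input-label pairs."""
    return {y: logits_pos[y] - alpha * logits_neg.get(y, 0.0) for y in logits_pos}

# scores = contrastive_scores(pos, neg); prediction = max(scores, key=scores.get)
```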
arXiv Detail & Related papers (2025-02-19T14:04:46Z)
- SelfPrompt: Confidence-Aware Semi-Supervised Tuning for Robust Vision-Language Model Adaptation [23.4909421082857]
SelfPrompt is a novel prompt-tuning approach for vision-language models (VLMs) in a semi-supervised learning setup. We introduce a cluster-guided pseudo-labelling method that improves pseudo-label accuracy. We also present a confidence-aware semi-supervised learning module that maximizes the utilization of unlabelled data.
arXiv Detail & Related papers (2025-01-24T00:31:01Z)
- PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection [56.916656013563355]
In-context learning (ICL) enables Large Language Models to perform tasks using few demonstrations. We propose PICLe, a framework for in-context learning with noisy, pseudo-annotated demonstrations. We evaluate PICLe on five biomedical NED datasets and show that, with zero human annotation, PICLe outperforms ICL in low-resource settings.
arXiv Detail & Related papers (2024-12-16T16:09:35Z)
- Class Balance Matters to Active Class-Incremental Learning [61.11786214164405]
We aim to start from a pool of large-scale unlabeled data and then annotate the most informative samples for incremental learning. We propose a Class-Balanced Selection (CBS) strategy to achieve both class balance and informativeness in the chosen samples. CBS can be plugged into CIL methods that are based on pretrained models with prompt-tuning techniques.
arXiv Detail & Related papers (2024-12-09T16:37:27Z)
- Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation [21.20806568508201]
We show how to leverage class text information to mitigate distribution drifts encountered by vision-language models (VLMs) during test-time inference. We propose to generate pseudo-labels for the test-time samples by exploiting generic class text embeddings as fixed centroids of a label assignment problem. Experiments on multiple popular test-time adaptation benchmarks of diverse complexity empirically show the superiority of CLIP-OT.
arXiv Detail & Related papers (2024-11-26T00:15:37Z)
- Mixtures of In-Context Learners [18.920361190065556]
We propose a novel approach that treats subsets of demonstrations as experts and learns a weighting function to merge their output distributions.
In our experiments, we show performance improvements on 5 out of 7 classification datasets compared to a set of strong baselines.
MoICL is more robust to out-of-domain (up to +11%), imbalanced (up to +49%), and noisy (up to +38%) demonstrations, and can filter such examples out of datasets.
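A minimal sketch of the mixture idea: each "expert" is an ICL call conditioned on one demonstration subset, and learned scalar weights merge the experts' output distributions. The function names and the softmax weighting are assumptions, not MoICL's exact parameterization.

```python
# Minimal sketch of a mixture of in-context learners: softmax-weighted
# merge of per-expert label distributions. Names are assumptions.
import math
from typing import Dict, List

def merge_experts(dists: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """Softmax-weighted mixture of per-expert label distributions."""
    exp_w = [math.exp(w) for w in weights]
    z = sum(exp_w)
    merged: Dict[str, float] = {}
    for dist, w in zip(dists, exp_w):
        for label, p in dist.items():
            merged[label] = merged.get(label, 0.0) + (w / z) * p
    return merged

# Example: two experts disagree; the weight vector arbitrates.
# merge_experts([{"pos": 0.9, "neg": 0.1}, {"pos": 0.2, "neg": 0.8}], [1.0, 0.0])
```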
arXiv Detail & Related papers (2024-11-05T06:02:41Z)
- DemoShapley: Valuation of Demonstrations for In-Context Learning [20.26604061802236]
Large language models (LLMs) leveraging in-context learning (ICL) have set new benchmarks in few-shot learning across various tasks without needing task-specific fine-tuning.
We introduce DemoShapley, which is inspired by the Data Shapley valuation theorem.
Our findings reveal that DemoShapley not only enhances model performance in terms of accuracy and fairness but also generalizes to queries from domains distinct from those of the in-context demonstrations.
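For intuition, here is an illustrative Monte Carlo estimate of per-demonstration value in the spirit of Data Shapley: average the marginal utility gain of each demonstration over random permutations of the pool. The utility function (e.g., dev-set accuracy of an ICL call) is an assumed callable, not the paper's code.

```python
# Illustrative Monte Carlo Shapley-style valuation of demonstrations.
# `utility` maps a demonstration list to a score and is assumed.
import random
from typing import Callable, List, Sequence

def shapley_values(demos: Sequence, utility: Callable[[list], float],
                   n_perms: int = 200) -> List[float]:
    values = [0.0] * len(demos)
    for _ in range(n_perms):
        order = random.sample(range(len(demos)), len(demos))
        prefix: list = []
        prev = utility(prefix)
        for i in order:
            prefix = prefix + [demos[i]]
            cur = utility(prefix)
            values[i] += (cur - prev) / n_perms   # marginal contribution of demo i
            prev = cur
    return values
```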
arXiv Detail & Related papers (2024-10-10T01:35:03Z)
- ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation [24.743048965822297]
This paper introduces a novel semi-supervised LiDAR semantic segmentation framework called ItTakesTwo (IT2). IT2 is designed to ensure consistent predictions from peer LiDAR representations, thereby improving the perturbation effectiveness in consistency learning. Results on public benchmarks show that our approach achieves remarkable improvements over the previous state-of-the-art (SOTA) methods in the field.
arXiv Detail & Related papers (2024-07-09T18:26:53Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- Revisiting Demonstration Selection Strategies in In-Context Learning [66.11652803887284]
Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL).
In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
We propose a data- and model-dependent demonstration selection method, TopK + ConE, based on the assumption that the performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples.
arXiv Detail & Related papers (2024-01-22T16:25:27Z)
- Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models [16.16724411695959]
This work pushes the performance boundary of zero-shot NER with powerful large language models (LLMs).
We propose a training-free self-improving framework, which utilizes an unlabeled corpus to stimulate the self-learning ability of LLMs.
Experiments on four benchmarks show substantial performance improvements achieved by our framework.
arXiv Detail & Related papers (2023-11-15T12:47:52Z)
- Learning under Label Proportions for Text Classification [13.29710879730948]
We present one of the preliminary NLP works under the challenging setup of Learning from Label Proportions (LLP). The data is provided in an aggregated form called bags, with only the proportion of samples in each class available as ground truth.
arXiv Detail & Related papers (2023-10-18T04:39:25Z)
- Dynamic Demonstrations Controller for In-Context Learning [48.455265597575675]
In-context learning (ICL) is a new paradigm for natural language processing (NLP). It is commonly believed that the number of demonstrations is positively correlated with model performance. We propose a Dynamic Demonstrations Controller (D²Controller), which can improve ICL performance by adjusting the number of demonstrations.
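As a toy illustration of the knob being tuned (the demonstration count), one could simply pick k by validation accuracy; the real D²Controller is more involved, and the evaluation callable below is an assumption.

```python
# Toy illustration of choosing the number of in-context demonstrations by
# validation score. Not the D²Controller algorithm; eval_at_k is assumed.
from typing import Callable, Sequence

def pick_k(eval_at_k: Callable[[int], float], candidates: Sequence[int]) -> int:
    """Return the demonstration count with the best validation score."""
    return max(candidates, key=eval_at_k)

# best_k = pick_k(lambda k: icl_accuracy(model, demos[:k], dev_set), [0, 4, 8, 16, 32])
```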
arXiv Detail & Related papers (2023-09-30T14:04:22Z)
- Self-ICL: Zero-Shot In-Context Learning with Self-Generated Demonstrations [38.4166247280112]
Self-ICL is a framework which bootstraps LMs' intrinsic capabilities to perform zero-shot ICL.
Self-ICL outperforms zero-shot baselines on both average accuracy and head-to-head comparison.
arXiv Detail & Related papers (2023-05-24T11:22:34Z)
- Coverage-based Example Selection for In-Context Learning [27.215972147196805]
We show that BERTScore-Recall (BSR) selects better examples that demonstrate more of the salient aspects of the test input.
On 15 datasets spanning 6 tasks and with 7 diverse LLMs, we show that (1) BSR is the superior metric for in-context example selection across the board, and (2) for compositional tasks, Set-BSR outperforms independent ranking by up to 17 points on average.
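A hypothetical sketch of the set-level selection idea: greedily choose examples that jointly cover the salient tokens of the test input. Plain token-level recall stands in for BERTScore-Recall here; this is not the paper's code.

```python
# Greedy coverage-based example selection, in the spirit of Set-BSR.
# Token sets stand in for BERTScore-Recall matching; names are assumptions.
from typing import List, Set

def greedy_cover(test_tokens: Set[str], candidates: List[Set[str]], k: int) -> List[int]:
    """Return indices of k examples maximizing marginal coverage of test_tokens."""
    chosen: List[int] = []
    covered: Set[str] = set()
    for _ in range(min(k, len(candidates))):
        def gain(i: int) -> int:
            return -1 if i in chosen else len((candidates[i] & test_tokens) - covered)
        best = max(range(len(candidates)), key=gain)
        chosen.append(best)
        covered |= candidates[best] & test_tokens
    return chosen
```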
arXiv Detail & Related papers (2023-05-24T08:58:28Z)
- Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs).
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
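Schematically, the latent-variable view treats the demonstrations as evidence about a task-level variable that governs prediction; a sketch in illustrative notation, not the paper's exact formulation:

```latex
% Illustrative latent-variable view of ICL: demonstrations D inform a task
% variable \theta, which in turn governs the prediction for a new input x.
P(y \mid x, D) = \int P(y \mid x, \theta)\, P(\theta \mid D)\, d\theta
```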
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
- Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations [97.41375480696972]
We introduce Z-ICL, a new zero-shot method that closes the gap by constructing pseudo-demonstrations for a given test input.
Evaluation on nine classification datasets shows that Z-ICL outperforms previous zero-shot methods by a significant margin.
arXiv Detail & Related papers (2022-12-19T21:34:26Z)
- Learning New Tasks from a Few Examples with Soft-Label Prototypes [18.363177410917597]
We propose a novel few-shot learning approach based on soft-label prototypes (SLPs).
We focus on learning previously unseen NLP tasks from very few examples (4, 8, 16) per class.
We experimentally demonstrate that our approach achieves superior performance on the majority of tested tasks in this data-lean setting.
arXiv Detail & Related papers (2022-10-31T16:06:48Z)
- Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator [22.532627423361177]
Self-generated in-context learning (SG-ICL) generates demonstrations for in-context learning from the PLM itself.
We show SG-ICL significantly outperforms zero-shot learning and is generally worth approximately 0.6 gold training samples.
arXiv Detail & Related papers (2022-06-16T10:52:13Z)
- UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large-scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision levels.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)