More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives
- URL: http://arxiv.org/abs/2501.04070v2
- Date: Thu, 09 Jan 2025 02:20:13 GMT
- Title: More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives
- Authors: Xiaoqing Zhang, Ang Lv, Yuhan Liu, Flood Sung, Wei Liu, Shuo Shang, Xiuying Chen, Rui Yan
- Abstract summary: We introduce DrICL, a novel optimization method that enhances model performance through Differentiated Learning and advantage-based Reweighting objectives.
Globally, DrICL utilizes differentiated learning to optimize the NLL objective, ensuring that many-shot performance surpasses zero-shot levels.
We develop the Many-Shot ICL Benchmark (ICL-50), a large-scale benchmark of 50 tasks that cover shot numbers from 1 to 350 within sequences of up to 8,000 tokens, for fine-tuning purposes.
- Score: 50.772462704559345
- Abstract: Large language models (LLMs) excel at few-shot in-context learning (ICL) without requiring parameter updates. However, as the number of ICL demonstrations increases from a few to many, performance tends to plateau and eventually decline. We identify two primary causes for this trend: the suboptimal negative log-likelihood (NLL) optimization objective and the incremental data noise. To address these issues, we introduce DrICL, a novel optimization method that enhances model performance through Differentiated Learning and advantage-based Reweighting objectives. Globally, DrICL utilizes differentiated learning to optimize the NLL objective, ensuring that many-shot performance surpasses zero-shot levels. Locally, it dynamically adjusts the weighting of many-shot demonstrations by leveraging cumulative advantages inspired by reinforcement learning, thereby improving generalization. This approach allows the model to handle varying numbers of shots effectively, mitigating the impact of noisy data. Recognizing the lack of multi-task datasets with diverse many-shot distributions, we develop the Many-Shot ICL Benchmark (ICL-50), a large-scale benchmark of 50 tasks that cover shot numbers from 1 to 350 within sequences of up to 8,000 tokens, for fine-tuning purposes. ICL-50 facilitates the evaluation of many-shot ICL strategies across seven prominent NLP tasks and 50 distinct datasets. Experimental results demonstrate that LLMs enhanced with DrICL achieve significant improvements in many-shot setups across various tasks, including both in-domain and out-of-domain scenarios. We release the code and benchmark dataset, hoping to facilitate further research in many-shot ICL.
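The abstract describes two mechanisms: a global differentiated-learning term that keeps many-shot NLL below the zero-shot baseline, and a local advantage-based reweighting of per-demonstration losses. Below is a minimal PyTorch sketch of that combination; the hinge form, running-mean baseline, and softmax weighting are illustrative assumptions, not the authors' released implementation.

```python
import torch

def dricl_style_loss(nll_many: torch.Tensor, nll_zero: torch.Tensor) -> torch.Tensor:
    """Illustrative composite of the two DrICL objectives (exact forms assumed).

    nll_many: per-shot NLL of the answer given 1..k demonstrations, shape (k,).
    nll_zero: scalar NLL of the same answer with no demonstrations.
    """
    k = nll_many.shape[0]
    # Differentiated learning (global): penalize shot counts whose NLL is worse
    # than the zero-shot baseline, so many-shot performance should beat zero-shot.
    differentiated = torch.relu(nll_many - nll_zero).mean()
    # Advantage-based reweighting (local): improvement over a running cumulative
    # baseline acts as an advantage; helpful shot counts receive larger weights.
    baseline = torch.cumsum(nll_many, dim=0) / torch.arange(1, k + 1)
    advantage = (baseline - nll_many).detach()
    weights = torch.softmax(advantage, dim=0)
    reweighted = (weights * nll_many).sum()
    return differentiated + reweighted
```

In fine-tuning on ICL-50, `nll_many` would come from evaluating the same target at increasing demonstration counts, so noisy demonstrations that raise the loss receive smaller weights.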
Related papers
- Large Language Models are Few-shot Multivariate Time Series Classifiers [23.045734479292356]
Large Language Models (LLMs) have been extensively applied in time series analysis.
Yet their utility in few-shot classification, a crucial training scenario, remains underexplored.
We aim to leverage the extensive pre-trained knowledge in LLMs to overcome the data scarcity problem.
arXiv Detail & Related papers (2025-01-30T03:59:59Z)
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
Multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks.
Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales.
We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z)
- Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning [19.16587730306472]
In-Context Learning (ICL) emerges as a key feature of Large Language Models (LLMs).
We propose Logit Arithmetic Reweighting Approach (LARA), a novel framework that enhances ICL through logit-based ensembling of multiple demonstrations; a minimal sketch of the idea follows.
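The sketch below assumes a Hugging Face causal LM; uniform weights stand in for the ensemble weights that LARA itself optimizes (see the paper for the exact procedure).

```python
import torch

@torch.no_grad()
def ensemble_next_token_logits(model, tokenizer, demo_groups, query, weights=None):
    """Combine next-token logits from prompts built on subsets of demonstrations.

    `demo_groups` is a list of demonstration lists; uniform averaging is an
    illustrative assumption, not LARA's actual weight-fitting rule.
    """
    per_group = []
    for group in demo_groups:
        prompt = "\n".join(group) + "\n" + query
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        per_group.append(model(ids).logits[0, -1])  # logits at the last position
    stacked = torch.stack(per_group)                # (num_groups, vocab_size)
    if weights is None:
        weights = torch.full((len(per_group),), 1.0 / len(per_group))
    return (weights.unsqueeze(-1) * stacked).sum(dim=0)
```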
arXiv Detail & Related papers (2024-10-14T01:34:16Z)
- Instruction Tuning Vs. In-Context Learning: Revisiting Large Language Models in Few-Shot Computational Social Science [0.1499944454332829]
We evaluate the classification performance of large language models (LLMs) using in-context learning (ICL) and instruction tuning (IT).
ICL offers a rapid alternative for task adaptation by learning from examples without explicit gradient updates.
Our research highlights the significant advantages of ICL in handling computational social science (CSS) tasks in few-shot settings.
arXiv Detail & Related papers (2024-09-23T02:43:08Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Many-Shot In-Context Learning in Multimodal Foundation Models [4.772535803521769]
Large language models are effective at few-shot in-context learning (ICL).
Recent advancements in multimodal foundation models have enabled unprecedentedly long context windows.
We benchmark GPT-4o and Gemini 1.5 Pro across 14 datasets spanning multiple domains.
arXiv Detail & Related papers (2024-05-16T04:02:43Z)
- Many-Shot In-Context Learning [58.395589302800566]
Large language models (LLMs) excel at few-shot in-context learning (ICL).
We observe significant performance gains across a wide variety of generative and discriminative tasks.
Unlike few-shot learning, many-shot learning is effective at overriding pretraining biases.
arXiv Detail & Related papers (2024-04-17T02:49:26Z)
- Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation [51.99752147380505]
This paper presents a benchmark self-evolving framework to dynamically evaluate Large Language Models (LLMs).
We utilize a multi-agent system to manipulate the context or question of original instances, reframing new evolving instances with high confidence.
Our framework widens performance discrepancies both between different models and within the same model across various tasks.
arXiv Detail & Related papers (2024-02-18T03:40:06Z)
- Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection [35.924633625147365]
Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL).
In this work, we investigate an active learning approach for ICL, where there is a limited budget for annotating examples.
We propose a model-adaptive optimization-free algorithm, termed AdaICL, which identifies examples that the model is uncertain about; a toy sketch of the selection idea follows.
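The sketch below is a budgeted, uncertainty-driven selection in the spirit of that idea, assuming a Hugging Face causal LM; AdaICL's actual algorithm goes beyond this plain entropy ranking.

```python
import torch

@torch.no_grad()
def select_uncertain(model, tokenizer, pool, budget):
    """Pick the `budget` candidates with the highest next-token entropy.

    Entropy of the model's next-token distribution serves as a simple
    uncertainty proxy; this is an illustrative stand-in for AdaICL's
    model-adaptive selection, not the paper's algorithm.
    """
    scores = []
    for text in pool:
        ids = tokenizer(text, return_tensors="pt").input_ids
        probs = model(ids).logits[0, -1].softmax(dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
        scores.append(entropy.item())
    ranked = sorted(range(len(pool)), key=scores.__getitem__, reverse=True)
    return [pool[i] for i in ranked[:budget]]
```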
arXiv Detail & Related papers (2023-10-30T22:03:55Z)
- RA-DIT: Retrieval-Augmented Dual Instruction Tuning [90.98423540361946]
Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores.
Existing approaches require either expensive retrieval-specific modifications to LM pre-training or post-hoc integration of the data store, which leads to suboptimal performance.
We introduce Retrieval-Augmented Dual Instruction Tuning (RA-DIT), a lightweight fine-tuning methodology that provides a third option.
arXiv Detail & Related papers (2023-10-02T17:16:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.