Unraveling Indirect In-Context Learning Using Influence Functions
- URL: http://arxiv.org/abs/2501.01473v1
- Date: Wed, 01 Jan 2025 12:37:12 GMT
- Title: Unraveling Indirect In-Context Learning Using Influence Functions
- Authors: Hadi Askari, Shivanshu Gupta, Terry Tong, Fei Wang, Anshuman Chhabra, Muhao Chen,
- Abstract summary: We explore demonstration selection strategies tailored for two distinct real-world scenarios: Mixture of Tasks and Noisy Demonstrations.
For the Mixture of Tasks setting, demonstrations are drawn from 28 diverse tasks, including MMLU, BigBench, StrategyQA, and CommonsenseQA.
In the Noisy Demonstrations setting, we examine scenarios where demonstrations might be mislabeled.
- Score: 31.36463513108773
- License:
- Abstract: This work introduces a novel paradigm for generalized In-Context Learning (ICL), termed Indirect In-Context Learning. In Indirect ICL, we explore demonstration selection strategies tailored for two distinct real-world scenarios: Mixture of Tasks and Noisy Demonstrations. We systematically evaluate the effectiveness of Influence Functions (IFs) as a selection tool for these settings, highlighting the potential for IFs to better capture the informativeness of examples within the demonstration pool. For the Mixture of Tasks setting, demonstrations are drawn from 28 diverse tasks, including MMLU, BigBench, StrategyQA, and CommonsenseQA. We demonstrate that combining BertScore-Recall (BSR) with an IF surrogate model can significantly improve performance, leading to average absolute accuracy gains of 0.37\% and 1.45\% for 3-shot and 5-shot setups when compared to traditional ICL metrics. In the Noisy Demonstrations setting, we examine scenarios where demonstrations might be mislabeled. Our experiments show that reweighting traditional ICL selectors (BSR and Cosine Similarity) with IF-based selectors boosts accuracy by an average of 2.90\% for Cosine Similarity and 2.94\% for BSR on noisy GLUE benchmarks. In sum, we propose a robust framework for demonstration selection that generalizes beyond traditional ICL, offering valuable insights into the role of IFs for Indirect ICL.
Related papers
- What Makes In-context Learning Effective for Mathematical Reasoning: A Theoretical Analysis [81.15503859645149]
In this paper, we aim to theoretically analyze the impact of in-context demonstrations on large language models' reasoning performance.
We propose a straightforward, generalizable, and low-complexity demonstration selection method named LMS3.
arXiv Detail & Related papers (2024-12-11T11:38:11Z) - Mixtures of In-Context Learners [18.920361190065556]
We propose a novel approach to treat subsets of demonstrations as experts and learn a weighting function to merge their output distributions.
In our experiments, we show performance improvements on 5 out of 7 classification datasets compared to a set of strong baselines.
MoICL is more robust to out-of-domain (up to +11%), imbalanced (up to +49%), or noisy demonstrations (up to +38%) or can filter these out from datasets.
arXiv Detail & Related papers (2024-11-05T06:02:41Z) - DemoShapley: Valuation of Demonstrations for In-Context Learning [20.26604061802236]
Large language models (LLMs) leveraging in-context learning (ICL) have set new benchmarks in few-shot learning across various tasks without needing task-specific fine-tuning.
We introduce DemoShapley which is inspired by the Data Shapley valuation theorem.
Our findings reveal that DemoShapley not only enhances model performance in terms of accuracy and fairness but also generalizes queries from domains distinct from those of the in-context demonstrations.
arXiv Detail & Related papers (2024-10-10T01:35:03Z) - ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL)
arXiv Detail & Related papers (2024-03-31T05:56:15Z) - In-Context Learning Demonstration Selection via Influence Analysis [11.504012974208466]
Large Language Models (LLMs) have showcased their In-Context Learning (ICL) capabilities.
Despite its advantages, the effectiveness of ICL heavily depends on the choice of demonstrations.
We propose a demonstration selection method named InfICL, which utilizes influence functions to analyze impacts of training samples.
arXiv Detail & Related papers (2024-02-19T00:39:31Z) - Revisiting Demonstration Selection Strategies in In-Context Learning [66.11652803887284]
Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL)
In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
We propose a data- and model-dependent demonstration selection method, textbfTopK + ConE, based on the assumption that textitthe performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples.
arXiv Detail & Related papers (2024-01-22T16:25:27Z) - Hint-enhanced In-Context Learning wakes Large Language Models up for knowledge-intensive tasks [54.153914606302486]
In-context learning (ICL) ability has emerged with the increasing scale of large language models (LLMs)
We propose a new paradigm called Hint-enhanced In-Context Learning (HICL) to explore the power of ICL in open-domain question answering.
arXiv Detail & Related papers (2023-11-03T14:39:20Z) - In-Context Demonstration Selection with Cross Entropy Difference [95.21947716378641]
Large language models (LLMs) can use in-context demonstrations to improve performance on zero-shot tasks.
We present a cross-entropy difference (CED) method for selecting in-context demonstrations.
arXiv Detail & Related papers (2023-05-24T05:04:00Z) - Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs)
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.