Dynamic Demonstrations Controller for In-Context Learning
- URL: http://arxiv.org/abs/2310.00385v1
- Date: Sat, 30 Sep 2023 14:04:22 GMT
- Title: Dynamic Demonstrations Controller for In-Context Learning
- Authors: Fei Zhao, Taotian Pang, Zhen Wu, Zheng Ma, Shujian Huang, Xinyu Dai
- Abstract summary: In-Context Learning (ICL) is a new paradigm for natural language processing (NLP), where a large language model observes a small number of demonstrations and a test instance as its input.
Previous studies have revealed that ICL is sensitive to the selection and the ordering of demonstrations.
We propose a Dynamic Demonstrations Controller (D$^2$Controller), which improves ICL performance by dynamically adjusting the number of demonstrations.
- Score: 51.3439660534631
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-Context Learning (ICL) is a new paradigm for natural language processing
(NLP), where a large language model (LLM) observes a small number of
demonstrations and a test instance as its input, and directly makes predictions
without updating model parameters. Previous studies have revealed that ICL is
sensitive to the selection and the ordering of demonstrations. However, few
studies have examined how the number of demonstrations affects ICL performance
within the limited input length of an LLM, because it is commonly believed that
the number of demonstrations is positively correlated with model performance.
In this paper, we find that this conclusion does not always hold true.
Through pilot experiments, we discover that increasing the number of
demonstrations does not necessarily lead to improved performance. Building upon
this insight, we propose a Dynamic Demonstrations Controller (D$^2$Controller),
which can improve the ICL performance by adjusting the number of demonstrations
dynamically. The experimental results show that D$^2$Controller yields a 5.4%
relative improvement on eight different sizes of LLMs across ten datasets.
Moreover, we also extend our method to previous ICL models and achieve
competitive results.
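The paper's concrete selection criterion is more elaborate, but the core dynamic-k loop can be sketched as follows. This is a minimal sketch only; `llm_predict`, `pool`, and `candidate_ks` are hypothetical names, not the authors' API.

```python
# Minimal sketch of the dynamic-k idea (not the authors' exact algorithm):
# evaluate each candidate demonstration count on a small labelled set and
# keep the best one. `llm_predict` is a hypothetical LLM wrapper.
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input text, label)

def choose_k(
    llm_predict: Callable[[List[Example], str], str],
    pool: List[Example],        # available demonstrations
    eval_set: List[Example],    # small labelled evaluation set
    candidate_ks: List[int],    # e.g. [1, 2, 4, 8]
) -> int:
    """Return the demonstration count with the highest evaluation accuracy."""
    best_k, best_acc = candidate_ks[0], -1.0
    for k in candidate_ks:
        demos = pool[:k]  # naive choice; any selection strategy fits here
        acc = sum(llm_predict(demos, x) == y for x, y in eval_set) / len(eval_set)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k
```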
Related papers
- Mixtures of In-Context Learners [18.920361190065556]
We propose a novel approach that treats subsets of demonstrations as experts and learns a weighting function to merge their output distributions.
In our experiments, we show performance improvements on 5 out of 7 classification datasets compared to a set of strong baselines.
MoICL is more robust to out-of-domain (up to +11%), imbalanced (up to +49%), and noisy demonstrations (up to +38%), and can filter such demonstrations out of a dataset.
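A toy numpy sketch of the mixture step, merging per-expert output distributions with softmax weights; the names are illustrative, not the authors' code:

```python
# Hypothetical sketch of the mixture idea: run the LLM once per demonstration
# subset ("expert"), then combine the per-class output distributions with
# learned weights. `expert_probs` stands in for real model outputs.
import numpy as np

def mixture_predict(expert_probs: np.ndarray, weights: np.ndarray) -> int:
    """expert_probs: (n_experts, n_classes), one distribution per expert.
    weights: unnormalised expert scores (e.g. learned scalar parameters)."""
    w = np.exp(weights - weights.max())
    w /= w.sum()                                     # softmax over experts
    mixed = (w[:, None] * expert_probs).sum(axis=0)  # weighted mixture
    return int(mixed.argmax())                       # predicted class

# Toy usage: two experts over three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.6, 0.3]])
print(mixture_predict(probs, weights=np.array([0.0, 1.0])))
```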
arXiv Detail & Related papers (2024-11-05T06:02:41Z)
- DemoShapley: Valuation of Demonstrations for In-Context Learning [20.26604061802236]
Large language models (LLMs) leveraging in-context learning (ICL) have set new benchmarks in few-shot learning across various tasks without needing task-specific fine-tuning.
We introduce DemoShapley, which is inspired by the Data Shapley valuation framework.
Our findings reveal that DemoShapley not only enhances model performance in terms of accuracy and fairness but also generalizes to queries from domains distinct from those of the in-context demonstrations.
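A Monte Carlo sketch of Data-Shapley-style demonstration valuation; the paper's actual (truncated, optimised) estimator will differ, and `icl_accuracy` is a hypothetical helper:

```python
# Estimate each demonstration's Shapley-style value as its average marginal
# contribution to ICL accuracy over random permutations of the demo pool.
import random
from typing import Callable, List

def demo_shapley(
    icl_accuracy: Callable[[List[int]], float],  # accuracy for these demo ids
    n_demos: int,                                # demos indexed 0..n_demos-1
    n_permutations: int = 50,
) -> List[float]:
    values = [0.0] * n_demos
    for _ in range(n_permutations):
        order = list(range(n_demos))
        random.shuffle(order)
        prefix: List[int] = []
        prev = icl_accuracy(prefix)              # accuracy with no demos
        for d in order:
            prefix.append(d)
            cur = icl_accuracy(prefix)
            values[d] += (cur - prev) / n_permutations  # marginal gain of d
            prev = cur
    return values
```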
arXiv Detail & Related papers (2024-10-10T01:35:03Z)
- DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning [75.68193159293425]
In-context learning (ICL) allows transformer-based language models to learn a specific task with a few "task demonstrations" without updating their parameters.
We propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL.
We experimentally demonstrate the wide applicability of DETAIL by showing that attribution scores obtained on white-box models transfer to black-box models, improving model performance.
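DETAIL's influence-function machinery is model-internal; as a plainly simpler stand-in that conveys the attribution goal (and is not the paper's method), a leave-one-out score can be sketched with a hypothetical `icl_loss` helper:

```python
# Leave-one-out attribution: a demonstration is valuable if removing it
# raises the loss on the test inputs. A deliberately simpler cousin of
# DETAIL's influence-function scores; `icl_loss` is a hypothetical helper.
from typing import Callable, List

def loo_attribution(
    icl_loss: Callable[[List[int]], float],  # loss using these demo indices
    n_demos: int,
) -> List[float]:
    full = list(range(n_demos))
    base = icl_loss(full)
    # score_i > 0 means demo i was helping (loss rises without it)
    return [icl_loss([j for j in full if j != i]) - base for i in range(n_demos)]
```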
arXiv Detail & Related papers (2024-05-22T15:52:52Z)
- Revisiting Demonstration Selection Strategies in In-Context Learning [66.11652803887284]
Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL).
In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
We propose a data- and model-dependent demonstration selection method, TopK + ConE, based on the assumption that the performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples.
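A sketch of the TopK half only (the ConE re-ranking step, which queries the model itself, is omitted here), with embeddings assumed to be precomputed:

```python
# Select the k demonstrations whose embeddings are most cosine-similar to
# the test input's embedding. Embeddings are assumed given (e.g. from any
# sentence encoder); this covers only the TopK stage of TopK + ConE.
import numpy as np

def topk_demos(test_emb: np.ndarray, demo_embs: np.ndarray, k: int) -> np.ndarray:
    """test_emb: (d,), demo_embs: (n, d). Returns indices of the top-k demos."""
    demo_norm = demo_embs / np.linalg.norm(demo_embs, axis=1, keepdims=True)
    test_norm = test_emb / np.linalg.norm(test_emb)
    sims = demo_norm @ test_norm          # cosine similarity to each demo
    return np.argsort(-sims)[:k]          # indices of the k most similar
```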
arXiv Detail & Related papers (2024-01-22T16:25:27Z)
- In-context Learning with Retrieved Demonstrations for Language Models: A Survey [23.24271704145876]
Few-shot in-context learners (ICL) are adept at adapting to new tasks with just a few demonstrations in the input context.
Instead of using a fixed set of demonstrations, one recent development is to retrieve demonstrations tailored to each input query.
We discuss and compare different design choices for retrieval models, retrieval training procedures, and inference algorithms.
arXiv Detail & Related papers (2024-01-21T23:34:42Z)
- Improving Input-label Mapping with Demonstration Replay for In-context Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models.
We propose a novel ICL method, Repeated Demonstration with Sliding Causal Attention (RdSca).
We show that our method significantly improves the input-label mapping in ICL demonstrations.
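The exact RdSca mask is specific to the paper; under that caveat, the sketch below only illustrates how a customised causal attention mask can be built:

```python
# Rough illustration of a customised causal attention mask: standard causal
# masking, with each demonstration block restricted to attend only within
# itself. The actual RdSca mask design differs in its details.
import numpy as np

def blockwise_causal_mask(demo_lens: list, query_len: int) -> np.ndarray:
    """True = attention allowed; rows are query positions, cols are keys."""
    total = sum(demo_lens) + query_len
    mask = np.tril(np.ones((total, total), dtype=bool))  # causal baseline
    start = 0
    for ln in demo_lens:
        mask[start:start + ln, :start] = False  # demo can't see earlier demos
        start += ln
    return mask  # the query (last query_len rows) keeps the full causal view
```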
arXiv Detail & Related papers (2023-10-30T14:29:41Z)
- Dr.ICL: Demonstration-Retrieved In-context Learning [29.142262267850704]
In-context learning (ICL), teaching a large language model to perform a task with few-shot demonstrations, has emerged as a strong paradigm for using LLMs.
Recent research suggests that retrieving semantically similar demonstrations to the input from a pool of available demonstrations results in better performance.
This work expands the applicability of retrieval-based ICL approaches by demonstrating that even simple word-overlap similarity measures such as BM25 outperform randomly selected demonstrations.
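A sketch of this word-overlap retrieval using the rank_bm25 package (pip install rank-bm25); the demonstration pool here is illustrative:

```python
# Score the demonstration pool by BM25 word overlap with the test query
# and keep the top-k matches as the in-context demonstrations.
from rank_bm25 import BM25Okapi

pool = [
    "the movie was fantastic -> positive",
    "the plot dragged on forever -> negative",
    "great acting and a sharp script -> positive",
]
bm25 = BM25Okapi([doc.split() for doc in pool])

query = "a fantastic script"
scores = bm25.get_scores(query.split())            # one score per pool item
top_k = sorted(range(len(pool)), key=lambda i: -scores[i])[:2]
demos = [pool[i] for i in top_k]                   # demonstrations to prepend
print(demos)
```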
arXiv Detail & Related papers (2023-05-23T14:55:25Z)
- Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs).
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
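Schematically, this latent-variable reading factors the prediction as follows (notation ours, not the paper's):

```latex
% Demonstrations D act as evidence about a latent task variable \theta,
% and the prediction for a test input x marginalises over it.
P(y \mid x, D) = \int P(y \mid x, \theta)\, P(\theta \mid D)\, \mathrm{d}\theta
```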
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
- Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator [22.532627423361177]
Self-generated in-context learning (SG-ICL) generates demonstrations for in-context learning from the PLM itself.
We show SG-ICL significantly outperforms zero-shot learning and is generally worth approximately 0.6 gold training samples.
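A hypothetical sketch of the SG-ICL loop, with `llm_generate` standing in for a text-completion call (not a real API):

```python
# Ask the model itself to generate pseudo-demonstrations for the task,
# then prepend them to the test input to form the final prompt.
from typing import Callable, List

def sg_icl_prompt(
    llm_generate: Callable[[str], str],   # hypothetical: prompt -> completion
    task_instruction: str,
    test_input: str,
    n_demos: int = 4,
) -> str:
    demos: List[str] = []
    for _ in range(n_demos):
        # Condition generation on the instruction (the paper also conditions
        # on the test input and a candidate label) to get one pseudo-demo.
        demos.append(llm_generate(f"{task_instruction}\nExample:"))
    return "\n\n".join(demos + [f"{task_instruction}\n{test_input}"])
```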
arXiv Detail & Related papers (2022-06-16T10:52:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.