Exploring Demonstration Ensembling for In-context Learning
- URL: http://arxiv.org/abs/2308.08780v2
- Date: Mon, 21 Aug 2023 01:25:30 GMT
- Title: Exploring Demonstration Ensembling for In-context Learning
- Authors: Muhammad Khalifa, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Lu
Wang
- Abstract summary: In-context learning (ICL) operates by showing language models (LMs) examples of input-output pairs for a given task.
The standard approach for ICL is to prompt the LMd demonstrations followed by the test input.
In this work, we explore Demonstration Ensembling (DENSE) as an alternative to simple concatenation.
- Score: 75.35436025709049
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) operates by showing language models (LMs) examples
of input-output pairs for a given task, i.e., demonstrations. The standard
approach for ICL is to prompt the LM with concatenated demonstrations followed
by the test input. This approach suffers from some issues. First, concatenation
offers almost no control over the contribution of each demo to the model
prediction. This can be sub-optimal when some demonstrations are irrelevant to
the test example. Second, due to the input length limit of some transformer
models, it might be infeasible to fit many examples into the context,
especially when dealing with long-input tasks. In this work, we explore
Demonstration Ensembling (DENSE) as an alternative to simple concatenation.
DENSE predicts outputs using subsets (i.e., buckets) of the demonstrations and
then combines the output probabilities resulting from each subset to produce
the final prediction. We study different ensembling methods using GPT-j and
experiment on 12 language tasks. Our experiments show weighted max ensembling
to outperform vanilla concatenation by as large as 2.4 average points. Code
available at https://github.com/mukhal/icl-ensembling.
Related papers
- ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL)
arXiv Detail & Related papers (2024-03-31T05:56:15Z) - MEND: Meta dEmonstratioN Distillation for Efficient and Effective
In-Context Learning [9.271196993624944]
Large Language models (LLMs) make predictions for a given test input together with a few input-output pairs (demonstrations)
Existing solutions attempt to distill lengthy demonstrations into compact vectors.
We present Meta dEmonstratioN Distillation (MEND), where a language model learns to distill any lengthy demonstrations into vectors without retraining for a new downstream task.
arXiv Detail & Related papers (2024-03-11T17:03:04Z) - Not All Demonstration Examples are Equally Beneficial: Reweighting
Demonstration Examples for In-Context Learning [32.29118942982609]
Large Language Models (LLMs) have recently gained the In-Context Learning (ICL) ability with the models scaling up.
This paper investigates how to determine approximately optimal weights for demonstration examples and how to apply them during ICL.
Experimental results on 8 text classification tasks show that our approach outperforms conventional ICL by a large margin.
arXiv Detail & Related papers (2023-10-12T13:15:11Z) - Dr.ICL: Demonstration-Retrieved In-context Learning [29.142262267850704]
In-context learning (ICL) teaching a large language model to perform a task with few-shot demonstrations has emerged as a strong paradigm for using LLMs.
Recent research suggests that retrieving semantically similar demonstrations to the input from a pool of available demonstrations results in better performance.
This work expands the applicability of retrieval-based ICL approaches by demonstrating that even simple word-overlap similarity measures such as BM25 outperform randomly selected demonstrations.
arXiv Detail & Related papers (2023-05-23T14:55:25Z) - Unified Demonstration Retriever for In-Context Learning [56.06473069923567]
Unified Demonstration Retriever (textbfUDR) is a single model to retrieve demonstrations for a wide range of tasks.
We propose a multi-task list-wise ranking training framework, with an iterative mining strategy to find high-quality candidates.
Experiments on 30+ tasks across 13 task families and multiple data domains show that UDR significantly outperforms baselines.
arXiv Detail & Related papers (2023-05-07T16:07:11Z) - ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for
Document Information Extraction [56.790794611002106]
Large language models (LLMs) have demonstrated remarkable results in various natural language processing (NLP) tasks with in-context learning.
We propose a simple but effective in-context learning framework called ICL-D3IE.
Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
arXiv Detail & Related papers (2023-03-09T06:24:50Z) - Robustness of Demonstration-based Learning Under Limited Data Scenario [54.912936555876826]
Demonstration-based learning has shown great potential in stimulating pretrained language models' ability under limited data scenario.
Why such demonstrations are beneficial for the learning process remains unclear since there is no explicit alignment between the demonstrations and the predictions.
In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones to take a deep dive of the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z) - Self-Generated In-Context Learning: Leveraging Auto-regressive Language
Models as a Demonstration Generator [22.532627423361177]
Self-generated in-context learning (SG-ICL) generates demonstrations for in-context learning from PLM itself.
We show SG-ICL significantly outperforms zero-shot learning and is generally worth approximately 0.6 gold training samples.
arXiv Detail & Related papers (2022-06-16T10:52:13Z) - Contrastive Demonstration Tuning for Pre-trained Language Models [59.90340768724675]
Demonstration examples are crucial for an excellent final performance of prompt-tuning.
The proposed approach can be: (i) Plugged into any previous prompt-tuning approaches; (ii) Extended to widespread classification tasks with a large number of categories.
Experimental results on 16 datasets illustrate that our method integrated with previous approaches LM-BFF and P-tuning can yield better performance.
arXiv Detail & Related papers (2022-04-09T05:30:48Z) - Robust Maximum Entropy Behavior Cloning [15.713997170792842]
Imitation learning (IL) algorithms use expert demonstrations to learn a specific task.
Most of the existing approaches assume that all expert demonstrations are reliable and trustworthy, but what if there exist some adversarial demonstrations among the given data-set?
We propose a novel general frame-work to directly generate a policy from demonstrations that autonomously detect the adversarial demonstrations and exclude them from the data set.
arXiv Detail & Related papers (2021-01-04T22:08:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.