Related papers: Exploring Demonstration Ensembling for In-context Learning

Exploring Demonstration Ensembling for In-context Learning

URL: http://arxiv.org/abs/2308.08780v2
Date: Mon, 21 Aug 2023 01:25:30 GMT
Title: Exploring Demonstration Ensembling for In-context Learning
Authors: Muhammad Khalifa, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Lu Wang
Abstract summary: In-context learning (ICL) operates by showing language models (LMs) examples of input-output pairs for a given task. The standard approach for ICL is to prompt the LMd demonstrations followed by the test input. In this work, we explore Demonstration Ensembling (DENSE) as an alternative to simple concatenation.
Score: 75.35436025709049
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In-context learning (ICL) operates by showing language models (LMs) examples of input-output pairs for a given task, i.e., demonstrations. The standard approach for ICL is to prompt the LM with concatenated demonstrations followed by the test input. This approach suffers from some issues. First, concatenation offers almost no control over the contribution of each demo to the model prediction. This can be sub-optimal when some demonstrations are irrelevant to the test example. Second, due to the input length limit of some transformer models, it might be infeasible to fit many examples into the context, especially when dealing with long-input tasks. In this work, we explore Demonstration Ensembling (DENSE) as an alternative to simple concatenation. DENSE predicts outputs using subsets (i.e., buckets) of the demonstrations and then combines the output probabilities resulting from each subset to produce the final prediction. We study different ensembling methods using GPT-j and experiment on 12 language tasks. Our experiments show weighted max ensembling to outperform vanilla concatenation by as large as 2.4 average points. Code available at https://github.com/mukhal/icl-ensembling.

Related papers

PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection [56.916656013563355]
In-context learning (ICL) enables Large Language Models to perform tasks using few demonstrations. We propose PICLe, a framework for in-context learning with noisy, pseudo-annotated demonstrations. We evaluate PICLe on five biomedical NED datasets and show that, with zero human annotation, PICLe outperforms ICL in low-resource settings.
arXiv Detail & Related papers (2024-12-16T16:09:35Z)
Aligning Language Models with Demonstrated Feedback [58.834937450242975]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors. We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
arXiv Detail & Related papers (2024-06-02T23:13:56Z)
ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing. Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples. We propose a novel method named parallel in-context learning (ParaICL)
arXiv Detail & Related papers (2024-03-31T05:56:15Z)
MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning [9.271196993624944]
Large Language models (LLMs) make predictions for a given test input together with a few input-output pairs (demonstrations) Existing solutions attempt to distill lengthy demonstrations into compact vectors. We present Meta dEmonstratioN Distillation (MEND), where a language model learns to distill any lengthy demonstrations into vectors without retraining for a new downstream task.
arXiv Detail & Related papers (2024-03-11T17:03:04Z)
Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning [32.29118942982609]
Large Language Models (LLMs) have recently gained the In-Context Learning (ICL) ability with the models scaling up. This paper investigates how to determine approximately optimal weights for demonstration examples and how to apply them during ICL. Experimental results on 8 text classification tasks show that our approach outperforms conventional ICL by a large margin.
arXiv Detail & Related papers (2023-10-12T13:15:11Z)
Dr.ICL: Demonstration-Retrieved In-context Learning [29.142262267850704]
In-context learning (ICL) teaching a large language model to perform a task with few-shot demonstrations has emerged as a strong paradigm for using LLMs. Recent research suggests that retrieving semantically similar demonstrations to the input from a pool of available demonstrations results in better performance. This work expands the applicability of retrieval-based ICL approaches by demonstrating that even simple word-overlap similarity measures such as BM25 outperform randomly selected demonstrations.
arXiv Detail & Related papers (2023-05-23T14:55:25Z)
Unified Demonstration Retriever for In-Context Learning [56.06473069923567]
Unified Demonstration Retriever (textbfUDR) is a single model to retrieve demonstrations for a wide range of tasks. We propose a multi-task list-wise ranking training framework, with an iterative mining strategy to find high-quality candidates. Experiments on 30+ tasks across 13 task families and multiple data domains show that UDR significantly outperforms baselines.
arXiv Detail & Related papers (2023-05-07T16:07:11Z)
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction [56.790794611002106]
Large language models (LLMs) have demonstrated remarkable results in various natural language processing (NLP) tasks with in-context learning. We propose a simple but effective in-context learning framework called ICL-D3IE. Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
arXiv Detail & Related papers (2023-03-09T06:24:50Z)
Robustness of Demonstration-based Learning Under Limited Data Scenario [54.912936555876826]
Demonstration-based learning has shown great potential in stimulating pretrained language models' ability under limited data scenario. Why such demonstrations are beneficial for the learning process remains unclear since there is no explicit alignment between the demonstrations and the predictions. In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones to take a deep dive of the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z)
Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator [22.532627423361177]
Self-generated in-context learning (SG-ICL) generates demonstrations for in-context learning from PLM itself. We show SG-ICL significantly outperforms zero-shot learning and is generally worth approximately 0.6 gold training samples.
arXiv Detail & Related papers (2022-06-16T10:52:13Z)
Contrastive Demonstration Tuning for Pre-trained Language Models [59.90340768724675]
Demonstration examples are crucial for an excellent final performance of prompt-tuning. The proposed approach can be: (i) Plugged into any previous prompt-tuning approaches; (ii) Extended to widespread classification tasks with a large number of categories. Experimental results on 16 datasets illustrate that our method integrated with previous approaches LM-BFF and P-tuning can yield better performance.
arXiv Detail & Related papers (2022-04-09T05:30:48Z)
Robust Maximum Entropy Behavior Cloning [15.713997170792842]
Imitation learning (IL) algorithms use expert demonstrations to learn a specific task. Most of the existing approaches assume that all expert demonstrations are reliable and trustworthy, but what if there exist some adversarial demonstrations among the given data-set? We propose a novel general frame-work to directly generate a policy from demonstrations that autonomously detect the adversarial demonstrations and exclude them from the data set.
arXiv Detail & Related papers (2021-01-04T22:08:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.