Active Learning Principles for In-Context Learning with Large Language
Models
- URL: http://arxiv.org/abs/2305.14264v2
- Date: Wed, 22 Nov 2023 10:22:18 GMT
- Title: Active Learning Principles for In-Context Learning with Large Language
Models
- Authors: Katerina Margatina and Timo Schick and Nikolaos Aletras and Jane
Dwivedi-Yu
- Abstract summary: This paper investigates how Active Learning algorithms can serve as effective demonstration selection methods for in-context learning.
We show that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
- Score: 65.09970281795769
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The remarkable advancements in large language models (LLMs) have
significantly enhanced the performance in few-shot learning settings. By using
only a small number of labeled examples, referred to as demonstrations, LLMs
can effectively grasp the task at hand through in-context learning. However,
the process of selecting appropriate demonstrations has received limited
attention in prior work. This paper addresses the issue of identifying the most
informative demonstrations for few-shot learning by approaching it as a
pool-based Active Learning (AL) problem over a single iteration. Our objective
is to investigate how AL algorithms can serve as effective demonstration
selection methods for in-context learning. We compare various standard AL
algorithms based on uncertainty, diversity, and similarity, and consistently
observe that the latter outperforms all other methods, including random
sampling. Notably, uncertainty sampling, despite its success in conventional
supervised learning scenarios, performs poorly in this context. Our extensive
experimentation involving a diverse range of GPT and OPT models across 24
classification and multi-choice tasks, coupled with thorough analysis,
unambiguously demonstrates that in-context example selection through AL
prioritizes high-quality examples that exhibit low uncertainty and bear
similarity to the test examples.
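The winning strategy is simple enough to sketch: embed the candidate pool and the test input, then take the nearest pool examples as demonstrations. Below is a minimal sketch of this single-iteration, pool-based selection, assuming a sentence-transformers encoder; the model name, k, and the toy sentiment pool are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of the similarity-based selection the paper finds most effective:
# single-iteration, pool-based choice of the k pool examples nearest to the
# test input. Model name and k are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def select_demonstrations(pool, test_input, k=4):
    """Return the k pool examples most similar to the test input."""
    pool_emb = encoder.encode([ex["text"] for ex in pool], normalize_embeddings=True)
    test_emb = encoder.encode([test_input], normalize_embeddings=True)[0]
    scores = pool_emb @ test_emb  # cosine similarity on normalized vectors
    return [pool[i] for i in np.argsort(-scores)[:k]]

pool = [
    {"text": "The movie was wonderful.", "label": "positive"},
    {"text": "A dull, lifeless film.", "label": "negative"},
    {"text": "Great acting and a sharp script.", "label": "positive"},
    {"text": "I walked out halfway through.", "label": "negative"},
]
demos = select_demonstrations(pool, "An absolute delight to watch.", k=2)
prompt = "\n".join(f"Review: {d['text']}\nSentiment: {d['label']}" for d in demos)
print(prompt + "\nReview: An absolute delight to watch.\nSentiment:")
```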
Related papers
- Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning [18.58278188791548]
In-context learning can help Large Language Models (LLMs) adapt to new tasks without additional training.
Despite the many proposed demonstration selection algorithms, their efficiency and effectiveness remain unclear.
This lack of clarity makes it difficult to apply these algorithms in real-world scenarios.
arXiv Detail & Related papers (2024-10-30T15:11:58Z)
- Experimental Design for Active Transductive Inference in Large Language Models [18.2671641610825]
We use active learning for adaptive prompt design and call it Active In-context Prompt Design (AIPD).
We design the LLM prompt by adaptively choosing few-shot examples from a training set to optimize performance on a test set.
We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen.
arXiv Detail & Related papers (2024-04-12T23:27:46Z)
- ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL); a sketch of the core idea follows.
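The summary does not spell out the mechanism, but the parallel-batching idea can be sketched: split the demonstrations into batches, score each candidate answer under every batch, and aggregate. Uniform averaging is a simplifying assumption here, and `score_answer` is a hypothetical stand-in for an LLM log-likelihood call.

```python
# Hedged sketch of the parallel-batching idea behind ParaICL: split the
# demonstrations into batches, score each candidate answer under every
# batch, and aggregate. Uniform averaging is a simplifying assumption;
# `score_answer` is a hypothetical stand-in for an LLM log-likelihood call.
from itertools import islice

def batched(seq, size):
    it = iter(seq)
    while chunk := list(islice(it, size)):
        yield chunk

def score_answer(demo_batch, question, answer):
    # Placeholder: a real implementation would return the LLM's
    # log-probability of `answer` given the batch prompt and question.
    return -float(len(answer))

def paraicl_predict(demos, question, candidates, batch_size=4):
    batches = list(batched(demos, batch_size))
    avg = {c: sum(score_answer(b, question, c) for b in batches) / len(batches)
           for c in candidates}
    return max(avg, key=avg.get)
```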
arXiv Detail & Related papers (2024-03-31T05:56:15Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
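One way to realize "correct and incorrect sample constructions" is a prompt that shows both kinds of extraction explicitly; the template below is an illustrative assumption, not the paper's exact format.

```python
# Hedged sketch of contrastive demonstration construction in the spirit of
# c-ICL: show correct extractions alongside a labeled-incorrect one. The
# template is an illustrative assumption, not the paper's exact format.
def build_contrastive_prompt(correct, incorrect, test_sentence):
    parts = []
    for sent, triple in correct:
        parts.append(f"Sentence: {sent}\nCorrect extraction: {triple}")
    for sent, triple in incorrect:
        parts.append(f"Sentence: {sent}\nIncorrect extraction: {triple}")
    parts.append(f"Sentence: {test_sentence}\nCorrect extraction:")
    return "\n\n".join(parts)

print(build_contrastive_prompt(
    correct=[("Marie Curie was born in Warsaw.", "(Marie Curie, born_in, Warsaw)")],
    incorrect=[("Marie Curie was born in Warsaw.", "(Warsaw, born_in, Marie Curie)")],
    test_sentence="Alan Turing was born in London.",
))
```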
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- Revisiting Demonstration Selection Strategies in In-Context Learning [66.11652803887284]
Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL).
In this work, we first revisit the factors contributing to the variance in ICL performance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
We propose a data- and model-dependent demonstration selection method, TopK + ConE, based on the assumption that the performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples.
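Read operationally, this suggests a two-stage pipeline: shortlist by similarity (TopK), then rerank by how much each candidate reduces the model's uncertainty about the test input. A sketch under that reading, with `lm_entropy` as a hypothetical stand-in for a real language-model call and word overlap as a toy similarity:

```python
# Hedged sketch of a two-stage pipeline in the spirit of TopK + ConE:
# shortlist by similarity, then rerank by uncertainty reduction.
# `lm_entropy` is a hypothetical stand-in for a real LM call; word overlap
# is a toy similarity used only to keep the sketch self-contained.

def lm_entropy(context, test_input):
    # Placeholder: a real implementation would return the LM's entropy
    # (or loss) on `test_input` with `context` prepended to the prompt.
    return float(abs(len(context) - len(test_input)))

def topk_cone_select(candidates, test_input, k=8, m=4):
    def overlap(c):  # stage 1 (TopK): toy lexical similarity
        return len(set(c.split()) & set(test_input.split()))
    shortlist = sorted(candidates, key=overlap, reverse=True)[:k]
    base = lm_entropy("", test_input)
    # Stage 2 (ConE-style rerank): prefer candidates whose inclusion most
    # reduces the model's uncertainty about the test input.
    gains = {c: base - lm_entropy(c, test_input) for c in shortlist}
    return sorted(shortlist, key=gains.get, reverse=True)[:m]
```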
arXiv Detail & Related papers (2024-01-22T16:25:27Z)
- RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning.
We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches heuristic and learnable baselines.
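The sequential framing can be sketched as a greedy loop in which each pick is conditioned on the examples already chosen; `score_next` below is a hypothetical stand-in for RetICL's learned, RL-trained scorer, not the paper's actual model.

```python
# Hedged sketch of sequential selection in the spirit of RetICL: pick
# demonstrations one at a time, conditioning each choice on those already
# chosen. `score_next` is a hypothetical stand-in for RetICL's learned,
# RL-trained scorer.

def score_next(selected, candidate, test_input):
    # Placeholder: a real implementation would score `candidate` with a
    # trained model given the test input and the examples chosen so far.
    return len(set(candidate.split()) & set(test_input.split())) - len(selected)

def sequential_select(pool, test_input, k=4):
    selected, remaining = [], list(pool)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda c: score_next(selected, c, test_input))
        selected.append(best)
        remaining.remove(best)
    return selected
```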
arXiv Detail & Related papers (2023-05-23T20:15:56Z)
- Compositional Exemplars for In-context Learning [21.961094715261133]
Large pretrained language models (LMs) have shown impressive In-Context Learning (ICL) ability.
We propose CEIL (Compositional Exemplars for In-context Learning) to model the interaction between the given input and in-context examples.
We validate CEIL on 12 classification and generation datasets from 7 distinct NLP tasks, including sentiment analysis, paraphrase detection, natural language inference, commonsense reasoning, open-domain question answering, code generation, and semantic parsing.
arXiv Detail & Related papers (2023-02-11T14:02:08Z)
- Learning New Tasks from a Few Examples with Soft-Label Prototypes [18.363177410917597]
We propose a novel few-shot learning approach based on soft-label prototypes (SLPs).
We focus on learning previously unseen NLP tasks from very few examples (4, 8, 16) per class.
We experimentally demonstrate that our approach achieves superior performance on the majority of tested tasks in this data-lean setting.
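A soft-label prototype pairs a point in feature space with a soft distribution over classes; a test point can then be labeled by mixing the labels of nearby prototypes. A minimal sketch, with the features and inverse-distance weighting as illustrative assumptions rather than the paper's construction:

```python
# Hedged sketch of classification with soft-label prototypes: each
# prototype pairs a point in feature space with a soft class distribution;
# a test point is labeled by a distance-weighted mix of nearby prototypes'
# labels. Features and inverse-distance weighting are illustrative assumptions.
import numpy as np

def predict_soft_label(prototypes, x, n_classes):
    """prototypes: list of (vector, soft_label) pairs; x: feature vector."""
    mix = np.zeros(n_classes)
    for vec, soft in prototypes:
        weight = 1.0 / (np.linalg.norm(x - vec) + 1e-8)  # closer counts more
        mix += weight * np.asarray(soft)
    mix /= mix.sum()
    return int(np.argmax(mix)), mix

protos = [(np.array([0.0, 0.0]), [0.9, 0.1]),
          (np.array([1.0, 1.0]), [0.2, 0.8])]
label, dist = predict_soft_label(protos, np.array([0.8, 0.9]), n_classes=2)
print(label, dist)
```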
arXiv Detail & Related papers (2022-10-31T16:06:48Z)
- True Few-Shot Learning with Language Models [78.42578316883271]
We evaluate the few-shot ability of LMs when held-out examples are unavailable.
Our findings suggest that prior work significantly overestimated the true few-shot ability of LMs.
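The evaluation question here, selecting prompts when no held-out examples are available, is commonly approached by cross-validating over the few labeled examples themselves. A leave-one-out sketch, with `predict_with_prompt` as a hypothetical stand-in for an LM call:

```python
# Hedged sketch of "true few-shot" prompt selection: choose among candidate
# prompts using only the few labeled examples, via leave-one-out
# cross-validation. `predict_with_prompt` is a hypothetical stand-in for
# formatting the demonstrations under a prompt and querying the LM.
def predict_with_prompt(prompt, demos, x):
    # Placeholder: a real implementation would format `demos` under `prompt`,
    # query the LM on input `x`, and return the predicted label.
    return demos[0][1] if demos else "unknown"

def loo_select_prompt(prompts, examples):
    """Pick the prompt with the best leave-one-out accuracy on `examples`."""
    def loo_accuracy(prompt):
        hits = sum(
            predict_with_prompt(prompt, examples[:i] + examples[i + 1:], x) == y
            for i, (x, y) in enumerate(examples)
        )
        return hits / len(examples)
    return max(prompts, key=loo_accuracy)
```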
arXiv Detail & Related papers (2021-05-24T17:55:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.