Skill-Based Few-Shot Selection for In-Context Learning
- URL: http://arxiv.org/abs/2305.14210v2
- Date: Tue, 10 Oct 2023 16:23:33 GMT
- Title: Skill-Based Few-Shot Selection for In-Context Learning
- Authors: Shengnan An, Bo Zhou, Zeqi Lin, Qiang Fu, Bei Chen, Nanning Zheng,
Weizhu Chen and Jian-Guang Lou
- Abstract summary: Skill-KNN is a skill-based few-shot selection method for in-context learning.
It does not require training or fine-tuning of any models, making it suitable for frequently expanding or changing example banks.
Experimental results across five cross-domain semantic parsing datasets and six backbone models show that Skill-KNN significantly outperforms existing methods.
- Score: 123.26522773708683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-context learning is the paradigm that adapts large language models to
downstream tasks by providing a few examples. Few-shot selection -- selecting
appropriate examples for each test instance separately -- is important for
in-context learning. In this paper, we propose Skill-KNN, a skill-based
few-shot selection method for in-context learning. The key advantages of
Skill-KNN include: (1) it addresses the problem that existing methods based on
pre-trained embeddings can be easily biased by surface natural language
features that are not important for the target task; (2) it does not require
training or fine-tuning of any models, making it suitable for frequently
expanding or changing example banks. The key insight is to optimize the inputs
fed into the embedding model, rather than tuning the model itself. Technically,
Skill-KNN generates skill-based descriptions for each test case and
candidate example through a pre-processing few-shot prompting step, thus
eliminating unimportant surface features. Experimental results across five
cross-domain semantic parsing datasets and six backbone models show that
Skill-KNN significantly outperforms existing methods.
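
The selection pipeline described in the abstract (rewrite each input as a skill-based description, embed the description, pick the nearest candidates) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `describe_skills` is a hypothetical keyword lookup standing in for the few-shot LLM prompting step, `embed` is a toy bag-of-words vector standing in for an off-the-shelf embedding model, and the example bank and skill phrases are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in for the few-shot prompting step. A real system would
# prompt an LLM to describe the skills a query needs; a keyword table fakes it.
SKILL_HINTS = {
    "how many": "count rows",
    "average": "aggregate average",
    "most": "sort descending and take first",
    "older than": "filter by numeric comparison",
}


def describe_skills(question: str) -> str:
    """Rewrite a question as a description of the skills it requires."""
    q = question.lower()
    skills = [skill for cue, skill in SKILL_HINTS.items() if cue in q]
    return "; ".join(skills) if skills else "select columns"


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: replaces a pre-trained model)."""
    return Counter(text.lower().replace(";", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(test_question: str, bank: list[str], k: int = 2) -> list[str]:
    """Return the k bank questions whose skill descriptions are most similar."""
    query_vec = embed(describe_skills(test_question))
    scored = [(cosine(query_vec, embed(describe_skills(q))), q) for q in bank]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in scored[:k]]


# Made-up example bank in the style of text-to-SQL questions.
BANK = [
    "How many singers are there?",
    "What is the average age of students?",
    "Which city has the most airports?",
    "List the names of employees older than 30.",
]

selected = select_examples("How many dogs are older than 5 years?", BANK, k=2)
print(selected)
```

Because similarity is computed over the skill descriptions rather than the raw questions, the test question about dogs is matched to examples that need counting and numeric filtering, even though they share almost no surface vocabulary with it. No model is trained at any point, so new bank entries only need to be described and embedded once.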
Related papers
- Irreducible Curriculum for Language Model Pretraining [46.895234111411426]
We propose irreducible curriculum as a curriculum learning algorithm for language model pretraining.
Our experiments on the RedPajama-1B dataset demonstrate a consistent improvement on validation perplexity across all 7 domains.
arXiv Detail & Related papers (2023-10-23T22:41:33Z)
- RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning.
We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches heuristic and learnable baselines.
arXiv Detail & Related papers (2023-05-23T20:15:56Z)
- Active Learning Principles for In-Context Learning with Large Language Models [65.09970281795769]
This paper investigates how Active Learning algorithms can serve as effective demonstration selection methods for in-context learning.
We show that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
arXiv Detail & Related papers (2023-05-23T17:16:04Z)
- Pre-Training to Learn in Context [138.0745138788142]
The ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- Compositional Exemplars for In-context Learning [21.961094715261133]
Large pretrained language models (LMs) have shown impressive In-Context Learning (ICL) ability.
We propose CEIL (Compositional Exemplars for In-context Learning) to model the interaction between the given input and in-context examples.
We validate CEIL on 12 classification and generation datasets from 7 distinct NLP tasks, including sentiment analysis, paraphrase detection, natural language inference, commonsense reasoning, open-domain question answering, code generation, and semantic parsing.
arXiv Detail & Related papers (2023-02-11T14:02:08Z)
- Frugal Reinforcement-based Active Learning [12.18340575383456]
We propose a novel active learning approach for label-efficient training.
The proposed method is iterative and aims at minimizing a constrained objective function that mixes diversity, representativity and uncertainty criteria.
We also introduce a novel weighting mechanism based on reinforcement learning, which adaptively balances these criteria at each training iteration.
arXiv Detail & Related papers (2022-12-09T14:17:45Z)
- Plex: Towards Reliability using Pretrained Large Model Extensions [69.13326436826227]
We develop ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively.
Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol.
We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples.
arXiv Detail & Related papers (2022-07-15T11:39:37Z)
- True Few-Shot Learning with Language Models [78.42578316883271]
We evaluate the few-shot ability of LMs when held-out examples are unavailable.
Our findings suggest that prior work significantly overestimated the true few-shot ability of LMs.
arXiv Detail & Related papers (2021-05-24T17:55:51Z)
- Making Pre-trained Language Models Better Few-shot Learners [11.90626040104822]
The recent GPT-3 model achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context.
Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient.
We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples.
arXiv Detail & Related papers (2020-12-31T17:21:26Z)