One size doesn't fit all: Predicting the Number of Examples for In-Context Learning
- URL: http://arxiv.org/abs/2403.06402v2
- Date: Fri, 18 Oct 2024 17:10:05 GMT
- Title: One size doesn't fit all: Predicting the Number of Examples for In-Context Learning
- Authors: Manish Chandra, Debasis Ganguly, Iadh Ounis,
- Abstract summary: In-context learning (ICL) refers to the process of adding a small number of localized examples (ones that are semantically similar to the input) to a training set of labelled data.
Our work alleviates the limitations of this 'one fits all' approach by dynamically predicting the number of examples for each data instance to be used in few-shot inference.
Our experiments on a number of text classification benchmarks show that AICL substantially outperforms standard ICL by up to 17%.
- Score: 16.712595387955574
- License:
- Abstract: In-context learning (ICL) refers to the process of adding a small number of localized examples (ones that are semantically similar to the input) from a training set of labelled data to an LLM's prompt with an objective to effectively control the generative process seeking to improve the downstream task performance. Existing ICL approaches use an identical number of examples (a pre-configured hyper-parameter) for each data instance. Our work alleviates the limitations of this 'one fits all' approach by dynamically predicting the number of examples for each data instance to be used in few-shot inference with LLMs. In particular, we employ a multi-label classifier, the parameters of which are fitted using a training set, where the label for each instance in the training set indicates if using a specific value of k (number of most similar examples from 0 up to a maximum value) leads to correct k-shot downstream predictions. Our experiments on a number of text classification benchmarks show that AICL substantially outperforms standard ICL by up to 17%.
Related papers
- Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods [69.36397993451742]
This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks.
We modify specific context tokens, considering the unique structure of input and output formats.
Inspired by adversarial attacks, we adjust the input based on the labels present in the context, focusing on minimizing, rather than maximizing, the loss.
arXiv Detail & Related papers (2024-10-22T17:45:47Z) - Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z) - Experimental Design for Active Transductive Inference in Large Language Models [18.2671641610825]
We use active learning for adaptive prompt design and call it Active In-context Prompt Design (AIPD)
We design the LLM prompt by adaptively choosing few-shot examples from a training set to optimize performance on a test set.
We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen.
arXiv Detail & Related papers (2024-04-12T23:27:46Z) - ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL)
arXiv Detail & Related papers (2024-03-31T05:56:15Z) - Not All Demonstration Examples are Equally Beneficial: Reweighting
Demonstration Examples for In-Context Learning [32.29118942982609]
Large Language Models (LLMs) have recently gained the In-Context Learning (ICL) ability with the models scaling up.
This paper investigates how to determine approximately optimal weights for demonstration examples and how to apply them during ICL.
Experimental results on 8 text classification tasks show that our approach outperforms conventional ICL by a large margin.
arXiv Detail & Related papers (2023-10-12T13:15:11Z) - Estimating Large Language Model Capabilities without Labeled Test Data [51.428562302037534]
Large Language Models (LLMs) have the impressive ability to perform in-context learning (ICL) from only a few examples.
We propose the task of ICL accuracy estimation, in which we predict the accuracy of an LLM when doing in-context learning on a new task.
arXiv Detail & Related papers (2023-05-24T06:55:09Z) - Data Curation Alone Can Stabilize In-context Learning [20.874674130060388]
In-context learning (ICL) enables large language models to perform new tasks by prompting them with a sequence of training examples.
randomly sampling examples from a training set leads to high variance in performance.
We show that carefully curating a subset of training data greatly stabilizes ICL performance without any other changes to the ICL algorithm.
arXiv Detail & Related papers (2022-12-20T15:58:54Z) - Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, enjoys its adaptivity in terms of unlabeled data selection.
arXiv Detail & Related papers (2021-09-01T23:52:29Z) - Contrastive Learning with Adversarial Examples [79.39156814887133]
Contrastive learning (CL) is a popular technique for self-supervised learning (SSL) of visual representations.
This paper introduces a new family of adversarial examples for constrastive learning and using these examples to define a new adversarial training algorithm for SSL, denoted as CLAE.
arXiv Detail & Related papers (2020-10-22T20:45:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.