'One size doesn't fit all': Learning how many Examples to use for In-Context Learning for Improved Text Classification
- URL: http://arxiv.org/abs/2403.06402v1
- Date: Mon, 11 Mar 2024 03:28:13 GMT
- Title: 'One size doesn't fit all': Learning how many Examples to use for In-Context Learning for Improved Text Classification
- Authors: Manish Chandra, Debasis Ganguly, Yiwen Li, Iadh Ounis
- Abstract summary: In-context learning (ICL) uses a small number of labelled data instances as examples in the prompt.
We propose a novel methodology of dynamically adapting the number of examples as per the data.
Our experiments show that our AICL method improves text classification on several standard datasets.
- Score: 18.167541508658417
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predictive models in natural language processing (NLP) have evolved from
training models from scratch to fine-tuning pre-trained models with labelled
data. An extreme form of this fine-tuning involves in-context learning (ICL),
where the output of a pre-trained generative model (frozen decoder parameters)
is controlled only with variations in the input strings (called instructions or
prompts). An important component of ICL is the use of a small number of
labelled data instances as examples in the prompt. While existing work uses a
static number of examples during inference for each data instance, in this
paper we propose a novel methodology of dynamically adapting the number of
examples as per the data. This is analogous to the use of a variable-sized neighborhood in a k-nearest neighbors (k-NN) classifier. In our proposed workflow of adaptive ICL (AICL), the number of demonstrations to employ during inference on a particular data instance is predicted by the softmax posteriors
of a classifier. The parameters of this classifier are fitted on the optimal
number of examples in ICL required to correctly infer the label of each
instance in the training set, under the hypothesis that a test instance similar to a training instance should use the same (or a closely matching) number of few-shot examples. Our experiments show that our AICL method yields improvements in text classification on several standard datasets.
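A minimal sketch of the AICL workflow described above, assuming embedding-based instance representations and scikit-learn. The helper names, the candidate shot counts in K_CHOICES, the use of logistic regression as the shot-count classifier, and nearest-neighbour demonstration selection are illustrative assumptions, not the authors' implementation.

```python
# Sketch of adaptive ICL (AICL): a classifier's softmax posteriors decide how
# many few-shot demonstrations to include in the prompt for each test instance.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

K_CHOICES = [0, 1, 2, 4, 8]  # candidate numbers of demonstrations (assumption)

def fit_k_predictor(train_vecs, optimal_k):
    """Fit a classifier mapping an instance embedding to the number of ICL
    examples that was found optimal for that training instance.
    `optimal_k` is assumed to contain values drawn from K_CHOICES."""
    y = np.array([K_CHOICES.index(k) for k in optimal_k])
    return LogisticRegression(max_iter=1000).fit(train_vecs, y)

def predict_num_examples(clf, test_vec):
    """Pick k for a test instance from the classifier's softmax posteriors."""
    probs = clf.predict_proba(test_vec.reshape(1, -1))[0]
    return K_CHOICES[int(np.argmax(probs))]

def build_prompt(test_text, train_texts, train_labels, train_vecs, test_vec, k):
    """Assemble a k-shot prompt using the k nearest labelled neighbours."""
    if k == 0:
        return f"Text: {test_text}\nLabel:"
    nn = NearestNeighbors(n_neighbors=k).fit(train_vecs)
    _, idx = nn.kneighbors(test_vec.reshape(1, -1))
    demos = [f"Text: {train_texts[i]}\nLabel: {train_labels[i]}" for i in idx[0]]
    return "\n\n".join(demos + [f"Text: {test_text}\nLabel:"])
```

At inference time one would call predict_num_examples to obtain k, build the prompt with build_prompt, and pass it to the frozen generative model; the prompt template and retrieval strategy shown here are placeholders.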
Related papers
- A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z) - Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z) - ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL).
arXiv Detail & Related papers (2024-03-31T05:56:15Z) - Unsupervised Calibration through Prior Adaptation for Text Classification using Large Language Models [37.39843935632105]
We propose an approach to adapt the prior class distribution to perform text classification tasks without the need for labelled samples.
Results show that these methods outperform the un-adapted model for different numbers of training shots in the prompt.
arXiv Detail & Related papers (2023-07-13T12:11:36Z) - Data Curation Alone Can Stabilize In-context Learning [20.874674130060388]
In-context learning (ICL) enables large language models to perform new tasks by prompting them with a sequence of training examples.
However, randomly sampling examples from a training set leads to high variance in performance.
We show that carefully curating a subset of training data greatly stabilizes ICL performance without any other changes to the ICL algorithm.
arXiv Detail & Related papers (2022-12-20T15:58:54Z) - Improving Few-Shot Performance of Language Models via Nearest Neighbor Calibration [12.334422701057674]
We propose a novel nearest-neighbor calibration framework for in-context learning.
It is inspired by the observation that the in-context learning paradigm produces incorrect labels when inferring training instances.
Experiments on various few-shot text classification tasks demonstrate that our method significantly improves in-context learning.
arXiv Detail & Related papers (2022-12-05T12:49:41Z) - Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, is adaptive in its selection of unlabeled data.
arXiv Detail & Related papers (2021-09-01T23:52:29Z) - An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree on the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z) - Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model reports better accuracy scores than a reference system trained with, on average, 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z)