Few-shot learning for sentence pair classification and its applications in software engineering
- URL: http://arxiv.org/abs/2306.08058v1
- Date: Tue, 13 Jun 2023 18:23:52 GMT
- Title: Few-shot learning for sentence pair classification and its applications in software engineering
- Authors: Robert Kraig Helmeczi, Mucahit Cevik, Savas Yıldırım
- Abstract summary: This work investigates the performance of alternative few-shot learning approaches with BERT-based models.
Vanilla fine-tuning, PET, and SetFit are compared across numerous BERT-based checkpoints over an array of training set sizes.
Our results establish PET as a strong few-shot learning approach, and our analysis shows that with just a few hundred labeled examples it can achieve performance near that of fine-tuning on full-sized data sets.
- Score: 0.36832029288386137
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot learning, the ability to train models with access to limited data, has
become increasingly popular in the natural language processing (NLP) domain, as
large language models such as GPT and T0 have been empirically shown to achieve
high performance in numerous tasks with access to just a handful of labeled
examples. Smaller language models such as BERT and its variants have also been
shown to achieve strong performance with just a handful of labeled examples
when combined with few-shot learning algorithms like pattern-exploiting
training (PET) and SetFit. The focus of this work is to investigate the
performance of alternative few-shot learning approaches with BERT-based models.
Specifically, vanilla fine-tuning, PET and SetFit are compared for numerous
BERT-based checkpoints over an array of training set sizes. To facilitate this
investigation, applications of few-shot learning are considered in software
engineering. For each task, high-performance techniques and their associated
model checkpoints are identified through detailed empirical analysis. Our
results establish PET as a strong few-shot learning approach, and our analysis
shows that with just a few hundred labeled examples it can achieve performance
near that of fine-tuning on full-sized data sets.
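As a concrete illustration of the PET idea central to this paper, here is a minimal sketch (not the authors' code) of scoring a sentence pair with a cloze pattern and a BERT masked language model. The pattern, the yes/no verbalizers, and the checkpoint are illustrative assumptions, and full PET would additionally fine-tune the MLM on the labeled examples through this pattern:

```python
# PET-style cloze scoring sketch: classify a sentence pair by comparing the
# MLM's logits for verbalizer tokens at a [MASK] position. Pattern and
# verbalizer mapping below are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

checkpoint = "bert-base-uncased"  # any BERT-based checkpoint could be swapped in
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint).eval()

# Verbalizer: the token the model fills in determines the predicted class.
verbalizers = {"yes": 1, "no": 0}  # token -> class id (assumed mapping)

def classify(sent_a: str, sent_b: str) -> int:
    prompt = f'"{sent_a}"? {tokenizer.mask_token}, "{sent_b}"'
    inputs = tokenizer(prompt, return_tensors="pt")
    # Locate the [MASK] position in the tokenized input.
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    # Compare logits of the verbalizer tokens only.
    scores = {cls: logits[tokenizer.convert_tokens_to_ids(tok)].item()
              for tok, cls in verbalizers.items()}
    return max(scores, key=scores.get)

print(classify("The build fails on Windows.", "Compilation breaks on Win32."))
```

SetFit, by contrast, fine-tunes a Sentence Transformer with contrastive pairs built from the few labeled examples and then fits a lightweight classification head on the resulting embeddings.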
Related papers
- EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency.
arXiv Detail & Related papers (2024-10-03T05:43:24Z)
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
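A hedged sketch of how such a boosting loop might look, based on my reading of the abstract rather than the paper's actual algorithm: each round samples demonstrations according to the current example weights, prompts the model with them as a weak learner, and reweights mistakes AdaBoost-style. `prompt_classify` is a hypothetical stand-in for a real LLM call.

```python
# AdaBoost-style loop with a prompted LLM as the weak learner (sketch).
import numpy as np

def prompt_classify(demos, x) -> int:
    """Hypothetical: prompt an LLM with `demos` as in-context examples."""
    raise NotImplementedError

def boost(X, y, rounds=10, n_demos=8, rng=np.random.default_rng(0)):
    n = len(X)
    w = np.full(n, 1 / n)                    # per-example weights
    learners, alphas = [], []
    for _ in range(rounds):
        # Sample demonstrations in proportion to the current weights.
        idx = rng.choice(n, size=n_demos, replace=False, p=w / w.sum())
        demos = [(X[i], y[i]) for i in idx]
        preds = np.array([prompt_classify(demos, x) for x in X])
        err = w[preds != y].sum() / w.sum()
        if err >= 0.5:                       # no better than chance: skip round
            continue
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-9))
        w *= np.exp(alpha * (preds != y))    # upweight the mistakes
        learners.append(demos)
        alphas.append(alpha)
    return learners, alphas
```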
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
- Active Learning Principles for In-Context Learning with Large Language Models [65.09970281795769]
This paper investigates how Active Learning algorithms can serve as effective demonstration selection methods for in-context learning.
We show that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
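A rough sketch of the similarity signal described above, under assumed components (a Sentence Transformers encoder and plain cosine similarity): embed the labeled pool and the test input, then pick the k most similar pool examples as in-context demonstrations. An uncertainty filter would additionally drop pool examples the model is unsure about.

```python
# Similarity-based demonstration selection for in-context learning (sketch).
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice

def select_demonstrations(pool_texts, test_text, k=4):
    pool_emb = encoder.encode(pool_texts, normalize_embeddings=True)
    test_emb = encoder.encode([test_text], normalize_embeddings=True)[0]
    sims = pool_emb @ test_emb               # cosine similarity (unit vectors)
    return [pool_texts[i] for i in np.argsort(-sims)[:k]]
```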
arXiv Detail & Related papers (2023-05-23T17:16:04Z)
- Improving Few-Shot Performance of Language Models via Nearest Neighbor Calibration [12.334422701057674]
We propose a novel nearest-neighbor calibration framework for in-context learning.
It is inspired by the phenomenon that the in-context learning paradigm can produce incorrect labels when inferring training instances.
Experiments on various few-shot text classification tasks demonstrate that our method significantly improves in-context learning.
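One way to picture the idea, as a sketch under my own simplifying assumptions: the model's in-context predictions on the training instances (whose gold labels are known) act as anchors, each instance is represented by its predicted probability vector, and a test prediction is recalibrated from its nearest anchors. The representation and choice of k are assumptions, not the paper's exact design.

```python
# Nearest-neighbor calibration of an in-context prediction (sketch).
import numpy as np

def knn_calibrate(train_probs, train_labels, test_probs, k=8, n_classes=2):
    """train_probs: (n, C) ICL probabilities on training instances;
    train_labels: (n,) gold labels; test_probs: (C,) raw test prediction."""
    dists = np.linalg.norm(train_probs - test_probs, axis=1)
    neighbors = np.argsort(dists)[:k]
    # Replace the raw distribution with gold-label frequencies among the
    # nearest training predictions.
    counts = np.bincount(train_labels[neighbors], minlength=n_classes)
    return counts / counts.sum()
```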
arXiv Detail & Related papers (2022-12-05T12:49:41Z)
- Learning New Tasks from a Few Examples with Soft-Label Prototypes [18.363177410917597]
We propose a novel few-shot learning approach based on soft-label prototypes (SLPs).
We focus on learning previously unseen NLP tasks from very few examples (4, 8, 16) per class.
We experimentally demonstrate that our approach achieves superior performance on the majority of tested tasks in this data-lean setting.
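A small illustrative sketch, assuming soft-label prototypes are points in an embedding space paired with class distributions and that a test point is labeled by distance-weighted mixing of nearby prototypes' soft labels; the actual method may differ in both representation and aggregation.

```python
# Classification with soft-label prototypes (sketch).
import numpy as np

def predict(prototypes, soft_labels, x, eps=1e-8):
    """prototypes: (P, d) points; soft_labels: (P, C) distributions; x: (d,)."""
    d = np.linalg.norm(prototypes - x, axis=1)
    weights = 1.0 / (d + eps)                 # closer prototypes count more
    mixed = (weights[:, None] * soft_labels).sum(axis=0) / weights.sum()
    return int(np.argmax(mixed))
```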
arXiv Detail & Related papers (2022-10-31T16:06:48Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM).
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z)
- LegaLMFiT: Efficient Short Legal Text Classification with LSTM Language Model Pre-Training [0.0]
Large Transformer-based language models such as BERT have led to broad performance improvements on many NLP tasks.
In legal NLP, BERT-based models have led to new state-of-the-art results on multiple tasks.
We show that lightweight LSTM-based Language Models are able to capture enough information from a small legal text pretraining corpus and achieve excellent performance on short legal text classification tasks.
arXiv Detail & Related papers (2021-09-02T14:45:04Z)
- Few-shot Sequence Learning with Transformers [79.87875859408955]
Few-shot algorithms aim at learning new tasks provided only a handful of training examples.
In this work we investigate few-shot learning in the setting where the data points are sequences of tokens.
We propose an efficient learning algorithm based on Transformers.
arXiv Detail & Related papers (2020-12-17T12:30:38Z)
- Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning [30.5853328612593]
In this work, we explore fine-tuning methods for BERT, a pre-trained Transformer-based language model.
Our experimental results show an advantage in model performance by maximizing the approximate knowledge gain of the model.
We analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters.
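A minimal sketch of the layer-freezing step with the Hugging Face transformers API; the checkpoint and the choice to freeze eight of twelve encoder layers are illustrative, not the paper's exact setup.

```python
# Freeze lower BERT layers during fine-tuning to cut trainable parameters.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Freeze the embeddings and the first 8 of 12 encoder layers.
for module in [model.bert.embeddings, *model.bert.encoder.layer[:8]]:
    for param in module.parameters():
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```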
arXiv Detail & Related papers (2020-12-04T08:34:39Z)
- Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
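The few-shot setting here amounts to packing k labeled examples into the prompt and letting the model complete the label for a final, unlabeled query; a toy illustration (the format and labels are my own, not OpenAI's code):

```python
# Build a k-shot prompt from labeled examples plus one unlabeled query.
def few_shot_prompt(examples, query):
    shots = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in examples)
    return f"{shots}\nInput: {query}\nLabel:"

prompt = few_shot_prompt(
    [("great movie", "positive"), ("terrible plot", "negative")],
    "surprisingly good",
)
print(prompt)
```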
arXiv Detail & Related papers (2020-05-28T17:29:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.