Unsupervised Calibration through Prior Adaptation for Text
Classification using Large Language Models
- URL: http://arxiv.org/abs/2307.06713v3
- Date: Wed, 9 Aug 2023 07:40:43 GMT
- Title: Unsupervised Calibration through Prior Adaptation for Text
Classification using Large Language Models
- Authors: Lautaro Estienne, Luciana Ferrer, Matías Vera, Pablo Piantanida
- Abstract summary: We propose an approach to adapt the prior class distribution to perform text classification tasks without the need for labelled samples.
Results show that these methods outperform the un-adapted model for different numbers of training shots in the prompt.
- Score: 37.39843935632105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A wide variety of natural language tasks are currently being addressed with
large-scale language models (LLMs). These models are usually trained with a
very large amount of unsupervised text data and adapted to perform a downstream
natural language task using methods like fine-tuning, calibration or in-context
learning. In this work, we propose an approach to adapt the prior class
distribution to perform text classification tasks without the need for labelled
samples, using only a few in-domain sample queries. The proposed approach treats the
LLM as a black box, adding a stage where the model posteriors are calibrated to
the task. Results show that these methods outperform the un-adapted model for
different numbers of training shots in the prompt, as well as a previous approach
where calibration is performed without using any adaptation data.
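To make the prior-adaptation idea concrete, the sketch below shows one standard way to re-estimate the class prior from a handful of unlabelled in-domain queries and to re-scale black-box posteriors with it (an EM-style update in the spirit of Saerens et al., 2002). This is an illustrative assumption rather than the paper's exact algorithm; the function name and arguments are hypothetical.

```python
import numpy as np

def adapt_prior_and_calibrate(posteriors, model_prior=None, n_iters=100, tol=1e-6):
    """EM-style prior re-estimation from unlabelled queries (illustrative sketch).

    posteriors:  (N, C) array of class posteriors returned by the LLM for N
                 unlabelled in-domain queries.
    model_prior: prior implicitly assumed by the model (uniform if None).
    Returns the estimated in-domain prior and the re-calibrated posteriors.
    """
    n, c = posteriors.shape
    model_prior = np.full(c, 1.0 / c) if model_prior is None else np.asarray(model_prior)
    new_prior = model_prior.copy()
    for _ in range(n_iters):
        # E-step: re-weight each posterior by the ratio of new to old prior.
        w = posteriors * (new_prior / model_prior)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: the updated prior is the average responsibility per class.
        updated = w.mean(axis=0)
        if np.abs(updated - new_prior).max() < tol:
            new_prior = updated
            break
        new_prior = updated
    calibrated = posteriors * (new_prior / model_prior)
    calibrated /= calibrated.sum(axis=1, keepdims=True)
    return new_prior, calibrated
```

The same re-scaling can then be applied to the posteriors of new test queries, treating the LLM purely as a black box that outputs class probabilities.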
Related papers
- Ensembling Finetuned Language Models for Text Classification [55.15643209328513]
Finetuning is a common practice across different communities to adapt pretrained models to particular tasks.
Ensembles of neural networks are typically used to boost performance and provide reliable uncertainty estimates.
We present a metadataset with predictions from five large finetuned models on six datasets and report results of different ensembling strategies.
arXiv Detail & Related papers (2024-10-25T09:15:54Z)
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we use a zero-shot-based method to extract the subset of the pre-training data that is most relevant to the downstream task.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
- Irreducible Curriculum for Language Model Pretraining [46.895234111411426]
We propose irreducible curriculum as a curriculum learning algorithm for language model pretraining.
Our experiments on the RedPajama-1B dataset demonstrate a consistent improvement on validation perplexity across all 7 domains.
arXiv Detail & Related papers (2023-10-23T22:41:33Z)
- Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models [23.881825575095945]
Large pre-trained language models (PLMs) have demonstrated strong performance on natural language understanding (NLU) tasks through fine-tuning.
However, fine-tuned models still suffer from overconfident predictions, especially in out-of-domain settings.
We demonstrate that the PLMs are well-calibrated on the masked language modeling task with robust predictive confidence under domain shift.
We show that preserving pre-trained features can improve the calibration of fine-tuned language models.
arXiv Detail & Related papers (2023-05-30T17:35:31Z)
- Improving Few-Shot Performance of Language Models via Nearest Neighbor Calibration [12.334422701057674]
We propose a novel nearest-neighbor calibration framework for in-context learning.
It is inspired by the phenomenon that the in-context learning paradigm produces incorrect labels when inferring training instances.
Experiments on various few-shot text classification tasks demonstrate that our method significantly improves in-context learning.
arXiv Detail & Related papers (2022-12-05T12:49:41Z)
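For context, here is a minimal sketch of what a nearest-neighbour calibration step can look like: the model's in-context prediction is interpolated with the empirical label distribution of the closest few-shot training instances in some embedding space. This is a generic illustration under assumed inputs, not the specific framework proposed in the paper above.

```python
import numpy as np

def knn_calibrate(test_probs, test_emb, train_embs, train_labels,
                  n_classes, k=3, alpha=0.5):
    """Blend in-context-learning probabilities with a kNN label distribution.

    test_probs:   (C,) class probabilities from the in-context prediction.
    test_emb:     (D,) embedding of the test instance.
    train_embs:   (K, D) embeddings of the few-shot training instances.
    train_labels: (K,) gold labels of those training instances (ints).
    Generic sketch: nearby training instances vote on the label, and the vote
    is mixed with the model's own prediction.
    """
    dists = np.linalg.norm(train_embs - test_emb, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = np.bincount(train_labels[nearest], minlength=n_classes).astype(float)
    votes /= votes.sum()
    calibrated = alpha * test_probs + (1.0 - alpha) * votes
    return calibrated / calibrated.sum()
```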
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
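As a rough illustration of the noise-stability idea summarised above (inject Gaussian noise and regularize hidden representations), here is a generic consistency-style penalty in PyTorch. The module, shapes, and hyperparameters are assumptions for the sketch, not the exact LNSR objective.

```python
import torch
import torch.nn.functional as F

def noise_stability_penalty(layer_block, hidden, sigma=1e-2):
    """Penalise how much a block's output changes under small input noise.

    layer_block: a module mapping hidden states to hidden states (e.g. some
                 upper Transformer layers of the fine-tuned model).
    hidden:      (batch, seq_len, dim) hidden representations fed to the block.
    """
    clean_out = layer_block(hidden)
    noisy_out = layer_block(hidden + sigma * torch.randn_like(hidden))
    return F.mse_loss(noisy_out, clean_out)

# Toy usage (illustrative shapes and module); the penalty would be added to
# the task loss with some weight during fine-tuning.
toy_block = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.Tanh())
h = torch.randn(4, 8, 16)
reg = noise_stability_penalty(toy_block, h)
```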
- Bayesian Active Learning with Pretrained Language Models [9.161353418331245]
Active Learning (AL) is a method to iteratively select data for annotation from a pool of unlabeled data.
Previous AL approaches have been limited to task-specific models that are trained from scratch at each iteration.
We introduce BALM: Bayesian Active Learning with pretrained language models.
arXiv Detail & Related papers (2021-04-16T19:07:31Z)
- Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates [52.164757178369804]
Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget.
We conduct an empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework.
We also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance.
arXiv Detail & Related papers (2021-01-20T13:59:25Z)
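For reference, the sketch below shows a common Monte Carlo dropout recipe of the kind surveyed in the two active-learning entries above: keep dropout active at inference, average the softmax outputs of several stochastic passes, and rank unlabelled instances by predictive entropy. The names and the toy model are illustrative assumptions, not the exact acquisition functions studied in these papers.

```python
import torch

def mc_dropout_entropy(model, inputs, n_samples=10):
    """Predictive entropy under Monte Carlo dropout (generic sketch).

    model:  classifier returning (batch, n_classes) logits; must contain dropout.
    inputs: a batch of inputs accepted by the model.
    """
    model.train()  # keep dropout layers stochastic at inference time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(inputs), dim=-1) for _ in range(n_samples)]
        ).mean(dim=0)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

# Toy usage: the highest-entropy instances would be sent for annotation.
toy_clf = torch.nn.Sequential(torch.nn.Dropout(0.1), torch.nn.Linear(32, 3))
scores = mc_dropout_entropy(toy_clf, torch.randn(5, 32))
```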
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
- Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance with less than 50% of the cost.
arXiv Detail & Related papers (2020-04-21T03:14:22Z)