Conformal Predictor for Improving Zero-shot Text Classification Efficiency
- URL: http://arxiv.org/abs/2210.12619v1
- Date: Sun, 23 Oct 2022 05:19:50 GMT
- Title: Conformal Predictor for Improving Zero-shot Text Classification Efficiency
- Authors: Prafulla Kumar Choubey, Yu Bai, Chien-Sheng Wu, Wenhao Liu, Nazneen Rajani
- Abstract summary: With a suitable CP for each dataset, we reduce the average inference time for NLI- and NSP-based models by 25.6% and 22.2% respectively.
- Score: 37.745518881553416
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models (PLMs) have been shown effective for zero-shot
(0shot) text classification. 0shot models based on natural language inference
(NLI) and next sentence prediction (NSP) employ cross-encoder architecture and
infer by making a forward pass through the model for each label-text pair
separately. This makes the computational cost of inference grow linearly with
the number of labels. In this work, we improve the efficiency of such
cross-encoder-based 0shot models by restricting the number of candidate labels
using a conformal predictor (CP) built on a fast base classifier and calibrated
on samples labeled by the 0shot model. Since a CP generates prediction sets with
coverage guarantees, it reduces the number of target labels without excluding
the most probable label based on the 0shot model. We experiment with three
intent and two topic classification datasets. With a suitable CP for each
dataset, we reduce the average inference time for NLI- and NSP-based models by
25.6% and 22.2% respectively, without dropping performance below the predefined
error rate of 1%.
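The core mechanism the abstract describes is split conformal prediction: calibrate a nonconformity score on labeled (here, 0shot-pseudo-labeled) data, then return, for each test input, the set of labels whose score falls within the calibrated quantile. Below is a minimal sketch of that construction, assuming softmax scores from some fast base classifier; the function name, score choice (1 minus class probability), and toy data are illustrative, not taken from the paper.

```python
import numpy as np

def conformal_label_sets(cal_probs, cal_labels, test_probs, alpha=0.01):
    """Split conformal prediction: label sets with >= 1 - alpha marginal coverage.

    cal_probs:  (n_cal, n_labels) softmax scores from a fast base classifier
    cal_labels: (n_cal,) labels (here, labels assigned by the 0shot model)
    test_probs: (n_test, n_labels) base-classifier scores for test texts
    """
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the (pseudo-)true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile level, clipped to a valid probability.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    # Prediction set: every label whose nonconformity is within the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]
```

The expensive cross-encoder then only scores the label-text pairs inside each set instead of all labels, which is where the reported 25.6% and 22.2% average inference-time reductions come from: smaller sets mean fewer forward passes, while the coverage guarantee keeps the 0shot model's most probable label from being excluded more than an alpha fraction of the time.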
Related papers
- Adapting Conformal Prediction to Distribution Shifts Without Labels [16.478151550456804]
Conformal prediction (CP) enables machine learning models to output prediction sets with guaranteed coverage rate.
Our goal is to improve the quality of CP-generated prediction sets using only unlabeled data from the test domain.
This is achieved by two new methods called ECP and EACP, that adjust the score function in CP according to the base model's uncertainty on the unlabeled test data.
arXiv Detail & Related papers (2024-06-03T15:16:02Z)
- Transductive Zero-Shot and Few-Shot CLIP [24.592841797020203]
This paper addresses the transductive zero-shot and few-shot CLIP classification challenge.
Inference is performed jointly across a mini-batch of unlabeled query samples, rather than treating each instance independently.
Our approach yields near 20% improvement in ImageNet accuracy over CLIP's zero-shot performance.
arXiv Detail & Related papers (2024-04-08T12:44:31Z)
- Unifying Token and Span Level Supervisions for Few-Shot Sequence Labeling [18.24907067631541]
Few-shot sequence labeling aims to identify novel classes based on only a few labeled samples.
We propose a Consistent Dual Adaptive Prototypical (CDAP) network for few-shot sequence labeling.
Our model achieves new state-of-the-art results on three benchmark datasets.
arXiv Detail & Related papers (2023-07-16T04:50:52Z)
- ProTeCt: Prompt Tuning for Taxonomic Open Set Classification [59.59442518849203]
Few-shot adaptation methods do not fare well in the taxonomic open set (TOS) setting.
We propose Prompt Tuning for Hierarchical Consistency (ProTeCt), a technique that calibrates model predictions across label set granularities.
arXiv Detail & Related papers (2023-06-04T02:55:25Z)
- Semi-Supervised Learning with Pseudo-Negative Labels for Image Classification [14.100569951592417]
We propose a mutual learning framework based on pseudo-negative labels.
By reducing the prediction probability on pseudo-negative labels, the dual model can improve its prediction ability.
Our framework achieves state-of-the-art results on several main benchmarks.
arXiv Detail & Related papers (2023-01-10T14:15:17Z)
- Improving Zero-Shot Models with Label Distribution Priors [33.51714665243138]
We propose a new approach, CLIPPR, which adapts zero-shot models for regression and classification on unlabelled datasets.
We demonstrate an improvement of 28% in mean absolute error on the UTK age regression task.
We also present promising results for classification benchmarks, improving the classification accuracy on the ImageNet dataset by 2.83%, without using any labels.
arXiv Detail & Related papers (2022-12-01T18:59:03Z)
- ADT-SSL: Adaptive Dual-Threshold for Semi-Supervised Learning [68.53717108812297]
Semi-Supervised Learning (SSL) has advanced classification tasks by inputting both labeled and unlabeled data to train a model jointly.
This paper proposes an Adaptive Dual-Threshold method for Semi-Supervised Learning (ADT-SSL).
Experimental results show that the proposed ADT-SSL achieves state-of-the-art classification accuracy.
arXiv Detail & Related papers (2022-05-21T11:52:08Z)
- Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition [98.25592165484737]
We propose a more effective pseudo-labeling scheme, called Cross-Model Pseudo-Labeling (CMPL).
CMPL achieves 17.6% and 25.1% Top-1 accuracy on Kinetics-400 and UCF-101 using only the RGB modality and 1% labeled data, respectively.
arXiv Detail & Related papers (2021-12-17T18:59:41Z)
- Delving Deep into Label Smoothing [112.24527926373084]
Label smoothing is an effective regularization tool for deep neural networks (DNNs).
We present an Online Label Smoothing (OLS) strategy, which generates soft labels based on the statistics of the model prediction for the target category.
arXiv Detail & Related papers (2020-11-25T08:03:11Z)
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, leveraging only 20-30 labeled samples per class per task for training and validation, perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.