Zero-Shot Text Classification via Self-Supervised Tuning
- URL: http://arxiv.org/abs/2305.11442v2
- Date: Thu, 25 May 2023 06:10:04 GMT
- Title: Zero-Shot Text Classification via Self-Supervised Tuning
- Authors: Chaoqun Liu, Wenxuan Zhang, Guizhen Chen, Xiaobao Wu, Anh Tuan Luu,
Chip Hong Chang, Lidong Bing
- Abstract summary: We propose a new paradigm based on self-supervised learning to solve zero-shot text classification tasks.
tuning the language models with unlabeled data, called self-supervised tuning.
Our model outperforms the state-of-the-art baselines on 7 out of 10 tasks.
- Score: 46.9902502503747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing solutions to zero-shot text classification either conduct prompting
with pre-trained language models, which is sensitive to the choices of
templates, or rely on large-scale annotated data of relevant tasks for
meta-tuning. In this work, we propose a new paradigm based on self-supervised
learning to solve zero-shot text classification tasks by tuning the language
models with unlabeled data, called self-supervised tuning. By exploring the
inherent structure of free texts, we propose a new learning objective called
first sentence prediction to bridge the gap between unlabeled data and text
classification tasks. After tuning the model to learn to predict the first
sentence in a paragraph based on the rest, the model is able to conduct
zero-shot inference on unseen tasks such as topic classification and sentiment
analysis. Experimental results show that our model outperforms the
state-of-the-art baselines on 7 out of 10 tasks. Moreover, the analysis reveals
that our model is less sensitive to the prompt design. Our code and pre-trained
models are publicly available at https://github.com/DAMO-NLP-SG/SSTuning .
Related papers
- Ensembling Finetuned Language Models for Text Classification [55.15643209328513]
Finetuning is a common practice across different communities to adapt pretrained models to particular tasks.
ensembles of neural networks are typically used to boost performance and provide reliable uncertainty estimates.
We present a metadataset with predictions from five large finetuned models on six datasets and report results of different ensembling strategies.
arXiv Detail & Related papers (2024-10-25T09:15:54Z) - Self-Supervised Representation Learning for Online Handwriting Text
Classification [0.8594140167290099]
We propose the novel Part of Stroke Masking (POSM) as a pretext task for pretraining models to extract informative representations from the online handwriting of individuals in English and Chinese languages.
To evaluate the quality of the extracted representations, we use both intrinsic and extrinsic evaluation methods.
The pretrained models are fine-tuned to achieve state-of-the-art results in tasks such as writer identification, gender classification, and handedness classification.
arXiv Detail & Related papers (2023-10-10T14:07:49Z) - POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained
models [62.23255433487586]
We propose an unsupervised fine-tuning framework to fine-tune the model or prompt on the unlabeled target data.
We demonstrate how to apply our method to both language-augmented vision and masked-language models by aligning the discrete distributions extracted from the prompts and target data.
arXiv Detail & Related papers (2023-04-29T22:05:22Z) - Zero-Shot Text Classification with Self-Training [8.68603153534916]
We show that fine-tuning the zero-shot classifier on its most confident predictions leads to significant performance gains across a wide range of text classification tasks.
Self-training adapts the zero-shot model to the task at hand.
arXiv Detail & Related papers (2022-10-31T17:55:00Z) - A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z) - Language Models in the Loop: Incorporating Prompting into Weak
Supervision [11.10422546502386]
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited.
Instead of applying the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework.
arXiv Detail & Related papers (2022-05-04T20:42:40Z) - ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text
Classification Models [0.0]
Deep learning approaches based on CNN, LSTM, and Transformers have been the de facto approach for text classification.
We show that these systems are over-reliant on the important words present in the text that are useful for classification.
arXiv Detail & Related papers (2021-01-30T15:18:35Z) - Cold-start Active Learning through Self-supervised Language Modeling [15.551710499866239]
Active learning aims to reduce annotation costs by choosing the most critical examples to label.
With BERT, we develop a simple strategy based on the masked language modeling loss.
Compared to other baselines, our approach reaches higher accuracy within less sampling iterations and time.
arXiv Detail & Related papers (2020-10-19T14:09:17Z) - Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show our methods leveraging only 20-30 labeled samples per class for each task for training and for validation can perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z) - Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words"
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.