Automatically Identifying Words That Can Serve as Labels for Few-Shot
Text Classification
- URL: http://arxiv.org/abs/2010.13641v1
- Date: Mon, 26 Oct 2020 14:56:22 GMT
- Title: Automatically Identifying Words That Can Serve as Labels for Few-Shot
Text Classification
- Authors: Timo Schick, Helmut Schmid, Hinrich Schütze
- Abstract summary: A recent approach for few-shot text classification is to convert textual inputs to cloze questions that contain some form of task description, process them with a pretrained language model and map the predicted words to labels.
Manually defining this mapping between words and labels requires both domain expertise and an understanding of the language model's abilities.
To mitigate this issue, we devise an approach that automatically finds such a mapping given small amounts of training data.
For a number of tasks, the mapping found by our approach performs almost as well as hand-crafted label-to-word mappings.
- Score: 12.418532541734193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent approach for few-shot text classification is to convert textual
inputs to cloze questions that contain some form of task description, process
them with a pretrained language model and map the predicted words to labels.
Manually defining this mapping between words and labels requires both domain
expertise and an understanding of the language model's abilities. To mitigate
this issue, we devise an approach that automatically finds such a mapping given
small amounts of training data. For a number of tasks, the mapping found by our
approach performs almost as well as hand-crafted label-to-word mappings.
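As a rough illustration of the setup the abstract describes, the sketch below scores a small set of candidate label words at the mask position of a cloze template and maps each class to its best-scoring word. It assumes a HuggingFace masked language model; the template, candidate words, and training examples are illustrative, not taken from the paper.
```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Illustrative few-shot data: (text, label) pairs; labels are class indices.
train = [("The movie was wonderful.", 1), ("A dull, pointless film.", 0)]
candidates = ["great", "good", "terrible", "bad"]  # candidate label words

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def mask_log_probs(text):
    """Log-probabilities over the vocabulary at the mask position of a cloze template."""
    prompt = f"{text} It was {tok.mask_token}."  # illustrative template
    enc = tok(prompt, return_tensors="pt")
    pos = (enc.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = mlm(**enc).logits[0, pos]
    return torch.log_softmax(logits, dim=-1)

# Score each candidate word per class by its average log-probability on that
# class's training examples, then map each class to its best-scoring word.
cand_ids = [tok.convert_tokens_to_ids(w) for w in candidates]
mapping = {}
for label in {y for _, y in train}:
    scores = torch.stack([mask_log_probs(x)[cand_ids] for x, y in train if y == label])
    mapping[label] = candidates[scores.mean(dim=0).argmax()]
print(mapping)  # e.g. {0: "bad", 1: "great"}
```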
Related papers
- Description-Enhanced Label Embedding Contrastive Learning for Text
Classification [65.01077813330559]
We incorporate Self-Supervised Learning (SSL) into the model learning process and design a novel self-supervised Relation of Relation (R2) classification task.
We propose a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as joint optimization targets.
We exploit external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z) - The Benefits of Label-Description Training for Zero-Shot Text
Classification [35.27224341685012]
Pretrained language models have improved zero-shot text classification.
We propose a simple way to further improve zero-shot accuracies with minimal effort.
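A minimal sketch of what label-description training could look like, assuming each class is paired with a few hand-written descriptions that serve directly as training examples; the model name and descriptions are illustrative, not from the paper.
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical hand-written descriptions per label.
descriptions = {
    0: ["This text is negative.", "The author dislikes the subject."],
    1: ["This text is positive.", "The author praises the subject."],
}
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(descriptions))
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Fine-tune on the descriptions themselves as a tiny training set.
for epoch in range(3):
    for label, texts in descriptions.items():
        batch = tok(texts, return_tensors="pt", padding=True)
        batch["labels"] = torch.full((len(texts),), label)
        loss = model(**batch).loss
        loss.backward(); opt.step(); opt.zero_grad()
```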
arXiv Detail & Related papers (2023-05-03T16:19:31Z) - Exploring Structured Semantic Prior for Multi Label Recognition with
Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
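The sketch below illustrates one way CLIP's image-text similarity could be used to pseudo-label the unannotated classes of a partially labeled image; the prompt format and threshold are assumptions, and the paper's structured semantic prior is more involved than this.
```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["dog", "cat", "car", "tree"]
image = Image.open("example.jpg")  # hypothetical image path
inputs = proc(text=[f"a photo of a {l}" for l in labels],
              images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    sim = model(**inputs).logits_per_image.softmax(dim=-1)[0]

# Treat confident similarities as pseudo-positives for the missing labels;
# the 0.4 threshold is an assumption, not a value from the paper.
pseudo = {l: bool(s > 0.4) for l, s in zip(labels, sim)}
```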
arXiv Detail & Related papers (2023-03-23T12:39:20Z) - Active Learning and Multi-label Classification for Ellipsis and
Coreference Detection in Conversational Question-Answering [5.984693203400407]
Ellipsis and coreference are commonly occurring linguistic phenomena.
We propose to use a multi-label classifier based on DistilBERT.
We show that these methods greatly enhance the performance of the classifier for detecting these phenomena on a manually labeled dataset.
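A minimal sketch of a DistilBERT-based multi-label classifier of the kind described above, using the HuggingFace multi-label problem type; the phenomenon labels and example utterance are illustrative.
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

phenomena = ["ellipsis", "coreference"]
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(phenomena),
    problem_type="multi_label_classification",  # sigmoid + BCE loss
)

batch = tok(["What about the second one?"], return_tensors="pt")
batch["labels"] = torch.tensor([[1.0, 1.0]])  # an utterance can carry both
out = model(**batch)
loss = out.loss                       # BCEWithLogitsLoss over independent labels
probs = torch.sigmoid(out.logits)     # per-phenomenon probabilities
```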
arXiv Detail & Related papers (2022-07-07T08:14:54Z) - Label Semantic Aware Pre-training for Few-shot Text Classification [53.80908620663974]
We propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems.
LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains.
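A minimal sketch of secondary pre-training on (sentence, label) pairs with T5, cast as text-to-text generation of the label name; the input prefix and example pairs are assumptions, not the paper's exact format.
```python
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tok = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Illustrative labeled sentences from different domains.
pairs = [("book a table for two tonight", "make restaurant reservation"),
         ("play some relaxing jazz", "play music")]

for text, label in pairs:
    enc = tok(f"classify: {text}", return_tensors="pt")  # assumed prefix
    targets = tok(label, return_tensors="pt").input_ids  # label name as output
    loss = model(**enc, labels=targets).loss
    loss.backward(); opt.step(); opt.zero_grad()
```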
arXiv Detail & Related papers (2022-04-14T17:33:34Z) - Automatic Multi-Label Prompting: Simple and Interpretable Few-Shot
Classification [15.575483080819563]
We propose Automatic Multi-Label Prompting (AMuLaP) to automatically select label mappings for few-shot text classification with prompting.
Our method exploits one-to-many label mappings and a statistics-based algorithm to select label mappings given a prompt template.
Our experiments demonstrate that AMuLaP achieves competitive performance on the GLUE benchmark without human effort or external resources.
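The sketch below illustrates a statistics-based, one-to-many mapping in the spirit of AMuLaP: average the mask-position distribution over each class's training examples and keep the top-k tokens. The template and data are illustrative, and the paper's de-duplication of tokens across classes is omitted here.
```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

train = [("A gripping, moving film.", 1), ("Flat and lifeless.", 0)]
k = 3  # label words per class; an illustrative choice
tok = AutoTokenizer.from_pretrained("roberta-base")
mlm = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

def mask_probs(text):
    """Vocabulary distribution at the mask position of an illustrative template."""
    enc = tok(f"{text} It was {tok.mask_token}.", return_tensors="pt")
    pos = (enc.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        return mlm(**enc).logits[0, pos].softmax(-1)

# For each class, average the distributions of its examples and keep top-k tokens.
mapping = {}
for label in {y for _, y in train}:
    avg = torch.stack([mask_probs(x) for x, y in train if y == label]).mean(0)
    mapping[label] = [tok.decode(int(i)).strip() for i in avg.topk(k).indices]
```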
arXiv Detail & Related papers (2022-04-13T11:15:52Z) - BERT-Assisted Semantic Annotation Correction for Emotion-Related
Questions [0.0]
We use the BERT neural language model to feed information back into an annotation task in a question-asking game called Emotion Twenty Questions (EMO20Q).
We show this method to be an effective way to assess and revise annotations of textual user data with complex, utterance-level semantic labels.
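A minimal sketch of model-in-the-loop annotation review: a fine-tuned classifier's confident disagreements with human labels are flagged for a second annotation pass. The model, label count, and threshold are assumptions; EMO20Q uses richer utterance-level semantic labels than this.
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4).eval()  # stands in for a fine-tuned model

def flag_for_review(text, human_label, threshold=0.9):
    """Flag an annotation when the model confidently predicts a different label."""
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        probs = model(**enc).logits.softmax(-1)[0]
    pred = int(probs.argmax())
    return pred != human_label and float(probs[pred]) > threshold

# Annotations that trip the flag go back to annotators for revision.
```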
arXiv Detail & Related papers (2022-04-02T18:00:49Z) - Prompt-Learning for Short Text Classification [30.53216712864025]
In short texts, the extremely short length, feature sparsity, and high ambiguity pose huge challenges to classification tasks.
In this paper, we propose a simple short text classification approach that makes use of prompt-learning based on knowledgeable expansion.
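A minimal sketch of prompting with an expanded verbalizer, where each class is verbalized by a set of related words and class scores aggregate over the set; the word lists here are hand-picked stand-ins for the knowledge-based expansion the paper proposes.
```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Illustrative expanded verbalizer: several related words per class.
verbalizer = {"sports": ["sports", "football", "game"],
              "tech": ["technology", "software", "computer"]}
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def classify(text):
    enc = tok(f"{text} The topic is {tok.mask_token}.", return_tensors="pt")
    pos = (enc.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logp = mlm(**enc).logits[0, pos].log_softmax(-1)
    # Score each class by its best-scoring verbalizer word.
    score = {c: max(logp[tok.convert_tokens_to_ids(w)] for w in words)
             for c, words in verbalizer.items()}
    return max(score, key=score.get)

print(classify("new gpu drivers released"))  # expected: "tech"
```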
arXiv Detail & Related papers (2022-02-23T08:07:06Z) - UCPhrase: Unsupervised Context-aware Quality Phrase Tagging [63.86606855524567]
UCPhrase is a novel unsupervised context-aware quality phrase tagger.
We induce high-quality phrase spans as silver labels from consistently co-occurring word sequences.
We show that our design is superior to state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
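The sketch below mines silver phrase labels in the spirit of UCPhrase by keeping maximal word sequences that repeat within a single document; the frequency threshold and maximum phrase length are assumptions.
```python
from collections import Counter

def mine_silver_phrases(tokens, max_len=4, min_freq=3):
    """Return word n-grams that repeat within one document as silver labels."""
    counts = Counter(
        tuple(tokens[i:i + n])
        for n in range(2, max_len + 1)
        for i in range(len(tokens) - n + 1)
    )
    phrases = {ng for ng, c in counts.items() if c >= min_freq}
    # Keep only maximal spans: drop n-grams contained in a longer phrase.
    return {p for p in phrases
            if not any(p != q and is_sub(p, q) for q in phrases)}

def is_sub(p, q):
    """True if tuple p occurs as a contiguous subsequence of tuple q."""
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))
```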
arXiv Detail & Related papers (2021-05-28T19:44:24Z) - Text Classification Using Label Names Only: A Language Model
Self-Training Approach [80.63885282358204]
Current text classification methods typically require a good number of human-labeled documents as training data.
We propose a language model self-training approach that uses only the label name of each class as supervision.
We show that our model achieves around 90% accuracy on four benchmark datasets including topic and sentiment classification.
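A minimal sketch of the seeding step a label-names-only approach could start from: documents mentioning exactly one label name receive a pseudo-label, and the resulting seeds feed a self-training loop. Real systems first expand each label name into a larger category vocabulary; the label names and documents here are illustrative.
```python
label_names = {0: "sports", 1: "business", 2: "technology"}

def seed_pseudo_label(doc: str):
    """Assign a class only when exactly one label name occurs in the text."""
    hits = [lab for lab, name in label_names.items() if name in doc.lower()]
    return hits[0] if len(hits) == 1 else None

unlabeled_docs = ["The sports team won the championship game.",
                  "Markets rallied as business confidence grew."]
seeds = [(d, y) for d in unlabeled_docs
         if (y := seed_pseudo_label(d)) is not None]  # then self-train on seeds
```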
arXiv Detail & Related papers (2020-10-14T17:06:41Z) - Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
Meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
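A minimal sketch of confidence-weighted self-training: a teacher's soft predictions provide pseudo-labels, and per-sample losses are down-weighted by the teacher's confidence. The paper learns the re-weighting via meta-learning; using raw confidence here is a simplifying assumption.
```python
import torch
import torch.nn.functional as F

def self_training_step(student, teacher, batch, opt):
    """One self-training update; student and teacher map a batch to logits."""
    with torch.no_grad():
        probs = teacher(batch).softmax(-1)   # teacher's soft predictions
        conf, pseudo = probs.max(-1)         # pseudo-labels and confidence
    loss = F.cross_entropy(student(batch), pseudo, reduction="none")
    loss = (conf * loss).mean()              # down-weight noisy pseudo-labels
    opt.zero_grad(); loss.backward(); opt.step()
    return float(loss)
```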
arXiv Detail & Related papers (2020-10-07T22:29:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.