Related papers: Text Classification Using Label Names Only: A Language Model Self-Training Approach

Text Classification Using Label Names Only: A Language Model Self-Training Approach

URL: http://arxiv.org/abs/2010.07245v1
Date: Wed, 14 Oct 2020 17:06:41 GMT
Title: Text Classification Using Label Names Only: A Language Model Self-Training Approach
Authors: Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang, Jiawei Han
Abstract summary: Current text classification methods typically require a good number of human-labeled documents as training data. We show that our model achieves around 90% accuracy on four benchmark datasets including topic and sentiment classification.
Score: 80.63885282358204
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans can perform classification without seeing any labeled examples but only based on a small set of words describing the categories to be classified. In this paper, we explore the potential of only using the label name of each class to train classification models on unlabeled data, without using any labeled documents. We use pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification. Our method (1) associates semantically related words with the label names, (2) finds category-indicative words and trains the model to predict their implied categories, and (3) generalizes the model via self-training. We show that our model achieves around 90% accuracy on four benchmark datasets including topic and sentiment classification without using any labeled documents but learning from unlabeled data supervised by at most 3 words (1 in most cases) per class as the label name.

Related papers

Towards Open-Domain Topic Classification [69.21234350688098]
We introduce an open-domain topic classification system that accepts user-defined taxonomy in real time. Users will be able to classify a text snippet with respect to any candidate labels they want, and get instant response from our web interface.
arXiv Detail & Related papers (2023-06-29T20:25:28Z)
Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
Self-Supervised Learning (SSL) in model learning process and design a novel self-supervised Relation of Relation (R2) classification task. Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets. external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
The Benefits of Label-Description Training for Zero-Shot Text Classification [35.27224341685012]
Pretrained language models have improved zero-shot text classification. We propose a simple way to further improve zero-shot accuracies with minimal effort.
arXiv Detail & Related papers (2023-05-03T16:19:31Z)
FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification [14.918600168973564]
This paper proposes FastClass, an efficient weakly-supervised classification approach. It uses dense text representation to retrieve class-relevant documents from external unlabeled corpus. Experiments show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.
arXiv Detail & Related papers (2022-12-11T13:43:22Z)
Label Semantic Aware Pre-training for Few-shot Text Classification [53.80908620663974]
We propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems. LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains.
arXiv Detail & Related papers (2022-04-14T17:33:34Z)
Improving Probabilistic Models in Text Classification via Active Learning [0.0]
We propose a fast new model for text classification that combines information from both labeled and unlabeled data with an active learning component. We show that by introducing information about the structure of unlabeled data and iteratively labeling uncertain documents, our model improves performance.
arXiv Detail & Related papers (2022-02-05T20:09:26Z)
Out-of-Category Document Identification Using Target-Category Names as Weak Supervision [64.671654559798]
Out-of-category detection aims to distinguish documents according to their semantic relevance to the inlier (or target) categories. We present an out-of-category detection framework, which effectively measures how confidently each document belongs to one of the target categories.
arXiv Detail & Related papers (2021-11-24T21:01:25Z)
MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information [47.44278057062421]
We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only. To be specific, we model the relationships between documents and metadata via a heterogeneous information network. We propose a novel framework, named MotifClass, which selects category-indicative motif instances, retrieves and generates pseudo-labeled training samples based on category names and indicative motif instances.
arXiv Detail & Related papers (2021-11-07T07:39:10Z)
DocSCAN: Unsupervised Text Classification via Learning from Neighbors [2.2082422928825145]
We introduce DocSCAN, a completely unsupervised text classification approach using Semantic Clustering by Adopting Nearest-Neighbors (SCAN) For each document, we obtain semantically informative vectors from a large pre-trained language model. Similar documents have proximate vectors, so neighbors in the representation space tend to share topic labels. Our learnable clustering approach uses pairs of neighboring datapoints as a weak learning signal. The proposed approach learns to assign classes to the whole dataset without provided ground-truth labels.
arXiv Detail & Related papers (2021-05-09T21:20:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.