FastClass: A Time-Efficient Approach to Weakly-Supervised Text
Classification
- URL: http://arxiv.org/abs/2212.05506v2
- Date: Thu, 15 Dec 2022 01:07:43 GMT
- Title: FastClass: A Time-Efficient Approach to Weakly-Supervised Text
Classification
- Authors: Tingyu Xia, Yue Wang, Yuan Tian, Yi Chang
- Abstract summary: This paper proposes FastClass, an efficient weakly-supervised classification approach.
It uses dense text representation to retrieve class-relevant documents from an external unlabeled corpus.
Experiments show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.
- Score: 14.918600168973564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly-supervised text classification aims to train a classifier using only
class descriptions and unlabeled data. Recent research shows that
keyword-driven methods can achieve state-of-the-art performance on various
tasks. However, these methods not only rely on carefully-crafted class
descriptions to obtain class-specific keywords but also require a substantial
amount of unlabeled data and take a long time to train. This paper proposes
FastClass, an efficient weakly-supervised classification approach. It uses
dense text representation to retrieve class-relevant documents from an external
unlabeled corpus and selects an optimal subset to train a classifier. Compared
to keyword-driven methods, our approach is less reliant on initial class
descriptions as it no longer needs to expand each class description into a set
of class-specific keywords. Experiments on a wide range of classification tasks
show that the proposed approach frequently outperforms keyword-driven models in
terms of classification accuracy and often enjoys orders-of-magnitude faster
training speed.
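The retrieval step described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name and toy vectors are made up, and the embeddings stand in for vectors a dense text encoder would produce for class descriptions and unlabeled documents.

```python
import numpy as np

def retrieve_pseudo_labels(class_embs, doc_embs, top_k=2):
    """For each class, retrieve the top-k most similar documents
    (by cosine similarity) to serve as pseudo-labeled training data."""
    # Normalize rows so plain dot products equal cosine similarities.
    c = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = c @ d.T  # shape: (n_classes, n_docs)
    return {cls: np.argsort(-row)[:top_k].tolist()
            for cls, row in enumerate(sims)}

# Toy embeddings: two classes and four documents in a 3-d space.
class_embs = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
doc_embs = np.array([[0.9, 0.1, 0.0],
                     [0.8, 0.2, 0.1],
                     [0.1, 0.9, 0.0],
                     [0.0, 0.7, 0.3]])
print(retrieve_pseudo_labels(class_embs, doc_embs, top_k=2))
# → {0: [0, 1], 1: [2, 3]}
```

The retrieved document indices per class would then be filtered to an optimal subset and used to train an ordinary supervised classifier, which is where the paper reports its speed advantage over keyword-driven self-training.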
Related papers
- Label-Guided Prompt for Multi-label Few-shot Aspect Category Detection [12.094529796168384]
The representation of sentences and categories is a key issue in this task.
We propose a label-guided prompt method to represent sentences and categories.
Our method outperforms current state-of-the-art methods with a 3.86% - 4.75% improvement in the Macro-F1 score.
arXiv Detail & Related papers (2024-07-30T09:11:17Z) - XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak
Supervision [6.406111099707549]
XAI-CLASS is a novel explanation-enhanced weakly-supervised text classification method.
It incorporates word saliency prediction as an auxiliary task.
XAI-CLASS outperforms other weakly-supervised text classification methods significantly.
arXiv Detail & Related papers (2023-10-31T23:24:22Z) - Mitigating Word Bias in Zero-shot Prompt-based Classifiers [55.60306377044225]
We show that matching class priors correlates strongly with the oracle upper bound performance.
We also demonstrate large consistent performance gains for prompt settings over a range of NLP tasks.
arXiv Detail & Related papers (2023-09-10T10:57:41Z) - PIEClass: Weakly-Supervised Text Classification with Prompting and
Noise-Robust Iterative Ensemble Training [42.013879670590214]
Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision.
We propose a new method, PIEClass, consisting of two modules.
PIEClass achieves overall better performance than existing strong baselines on seven benchmark datasets.
arXiv Detail & Related papers (2023-05-23T06:19:14Z) - CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class
Classification [57.62886091828512]
We propose a brand-new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix) for many-class classification.
Basically, an instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, is leveraged to complement the language verbalizers in many-class classification.
arXiv Detail & Related papers (2022-11-11T03:45:59Z) - Out-of-Category Document Identification Using Target-Category Names as
Weak Supervision [64.671654559798]
Out-of-category detection aims to distinguish documents according to their semantic relevance to the inlier (or target) categories.
We present an out-of-category detection framework, which effectively measures how confidently each document belongs to one of the target categories.
arXiv Detail & Related papers (2021-11-24T21:01:25Z) - LeQua@CLEF2022: Learning to Quantify [76.22817970624875]
LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets.
The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting.
arXiv Detail & Related papers (2021-11-22T14:54:20Z) - TF-CR: Weighting Embeddings for Text Classification [6.531659195805749]
We introduce a novel weighting scheme, Term Frequency-Category Ratio (TF-CR), which can weight high-frequency, category-exclusive words higher when computing word embeddings.
Experiments on 16 classification datasets show the effectiveness of TF-CR, leading to improved performance scores over existing weighting schemes.
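The TF-CR idea can be illustrated with a minimal sketch. The formula used here, term frequency within a category multiplied by the fraction of the word's total occurrences that fall in that category, is one plausible reading of the abstract rather than the paper's exact definition, and the corpus is a toy example.

```python
from collections import Counter

def tf_cr(docs_by_category):
    """Term Frequency-Category Ratio: weight words that are both
    frequent within a category and (near-)exclusive to it.
    TF = count of word in category / total words in category
    CR = count of word in category / count of word in all categories
    """
    cat_counts = {c: Counter(tok for doc in docs for tok in doc.split())
                  for c, docs in docs_by_category.items()}
    total = Counter()
    for counts in cat_counts.values():
        total.update(counts)
    weights = {}
    for c, counts in cat_counts.items():
        n_c = sum(counts.values())
        weights[c] = {tok: (k / n_c) * (k / total[tok])
                      for tok, k in counts.items()}
    return weights

docs = {
    "sports": ["goal match goal", "match referee"],
    "tech": ["cpu goal cpu", "cpu cache"],
}
w = tf_cr(docs)
# "cpu" occurs only in tech, so its CR is 1 and its weight is high;
# "goal" is split across categories, so its weight is discounted.
```

In the paper's setting these per-category weights would scale word embeddings before averaging them into a document representation, boosting category-exclusive terms.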
arXiv Detail & Related papers (2020-12-11T19:23:28Z) - Unsupervised Label Refinement Improves Dataless Text Classification [48.031421660674745]
Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description.
While promising, it crucially relies on accurate descriptions of the label set for each downstream task.
This reliance causes dataless classifiers to be highly sensitive to the choice of label descriptions and hinders the broader application of dataless classification in practice.
arXiv Detail & Related papers (2020-12-08T03:37:50Z) - X-Class: Text Classification with Extremely Weak Supervision [39.25777650619999]
In this paper, we explore text classification with extremely weak supervision.
We propose a novel framework X-Class to realize the adaptive representations.
X-Class can rival and even outperform seed-driven weakly supervised methods on 7 benchmark datasets.
arXiv Detail & Related papers (2020-10-24T06:09:51Z) - Text Classification Using Label Names Only: A Language Model
Self-Training Approach [80.63885282358204]
Current text classification methods typically require a good number of human-labeled documents as training data.
We show that our model achieves around 90% accuracy on four benchmark datasets including topic and sentiment classification.
arXiv Detail & Related papers (2020-10-14T17:06:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.