X-Class: Text Classification with Extremely Weak Supervision
- URL: http://arxiv.org/abs/2010.12794v2
- Date: Mon, 7 Feb 2022 23:16:14 GMT
- Title: X-Class: Text Classification with Extremely Weak Supervision
- Authors: Zihan Wang and Dheeraj Mekala and Jingbo Shang
- Abstract summary: In this paper, we explore text classification with extremely weak supervision.
We propose a novel framework X-Class to realize the adaptive representations.
X-Class can rival and even outperform seed-driven weakly supervised methods on 7 benchmark datasets.
- Score: 39.25777650619999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we explore text classification with extremely weak
supervision, i.e., only relying on the surface text of class names. This is a
more challenging setting than the seed-driven weak supervision, which allows a
few seed words per class. We opt to attack this problem from a representation
learning perspective -- ideal document representations should lead to nearly
the same results between clustering and the desired classification. In
particular, one can classify the same corpus differently (e.g., based on topics
and locations), so document representations should be adaptive to the given
class names. We propose a novel framework X-Class to realize the adaptive
representations. Specifically, we first estimate class representations by
incrementally adding the most similar word to each class until inconsistency
arises. Following a tailored mixture of class attention mechanisms, we obtain
the document representation via a weighted average of contextualized word
representations. With the prior of each document assigned to its nearest class,
we then cluster and align the documents to classes. Finally, we pick the most
confident documents from each cluster to train a text classifier. Extensive
experiments demonstrate that X-Class can rival and even outperform seed-driven
weakly supervised methods on 7 benchmark datasets. Our dataset and code are
released at https://github.com/ZihanWangKi/XClass/ .
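The pipeline in the abstract (grow a class representation word by word until inconsistency, attend over word vectors to build document representations, then assign documents to their nearest class) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the word vectors are toy 2-D placeholders rather than contextualized BERT representations, and both the inconsistency check and the class attention are simplified stand-ins for the paper's mechanisms.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def estimate_class_rep(class_word, word_vecs, max_words=5):
    """Grow a class representation by repeatedly adding the vocabulary word
    most similar to the current class average, stopping when the average
    drifts away from the chosen words (a simplified inconsistency test)."""
    chosen = [class_word]
    rep = word_vecs[class_word].copy()
    while len(chosen) < max_words:
        candidates = [w for w in word_vecs if w not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda w: cosine(rep, word_vecs[w]))
        new_rep = np.mean([word_vecs[w] for w in chosen + [best]], axis=0)
        # Inconsistency proxy: the word nearest to the new average should
        # still be one of the words already admitted to the class.
        nearest = max(word_vecs, key=lambda w: cosine(new_rep, word_vecs[w]))
        if nearest not in chosen + [best]:
            break
        chosen.append(best)
        rep = new_rep
    return rep, chosen

def document_rep(doc_words, word_vecs, class_reps):
    """Attention-weighted average of word vectors: each word is weighted by
    its highest similarity to any class representation (a stand-in for the
    paper's tailored mixture of class-attention mechanisms)."""
    vecs = np.stack([word_vecs[w] for w in doc_words])
    weights = np.array([max(cosine(word_vecs[w], c) for c in class_reps)
                        for w in doc_words])
    weights = np.exp(weights) / np.exp(weights).sum()  # softmax
    return weights @ vecs

# Toy vocabulary on a 2-D "topic" plane.
word_vecs = {w: np.array(v) for w, v in {
    "sports": [1.0, 0.0], "game": [0.9, 0.1],
    "politics": [0.0, 1.0], "vote": [0.1, 0.9]}.items()}

sports_rep, sports_words = estimate_class_rep("sports", word_vecs, max_words=2)
politics_rep, politics_words = estimate_class_rep("politics", word_vecs, max_words=2)
doc = document_rep(["game", "sports"], word_vecs, [sports_rep, politics_rep])
```

In the full method, each document would then carry a prior of its nearest class, the documents would be clustered and aligned to the classes, and the most confident documents per cluster would train the final classifier.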
Related papers
- Classification Done Right for Vision-Language Pre-Training [66.90286715149786]
We introduce SuperClass, a super simple classification method for vision-language pre-training on image-text data.
SuperClass directly utilizes tokenized raw text as supervised classification labels, without the need for additional text filtering or selection.
SuperClass demonstrated superior performance on various downstream tasks, including classic computer vision benchmarks and vision language downstream tasks.
arXiv Detail & Related papers (2024-11-05T18:58:15Z)
- XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak Supervision [6.406111099707549]
XAI-CLASS is a novel explanation-enhanced weakly-supervised text classification method.
It incorporates word saliency prediction as an auxiliary task.
XAI-CLASS outperforms other weakly-supervised text classification methods significantly.
arXiv Detail & Related papers (2023-10-31T23:24:22Z)
- Mitigating Word Bias in Zero-shot Prompt-based Classifiers [55.60306377044225]
We show that matching class priors correlates strongly with the oracle upper bound performance.
We also demonstrate large consistent performance gains for prompt settings over a range of NLP tasks.
arXiv Detail & Related papers (2023-09-10T10:57:41Z)
- MEGClass: Extremely Weakly Supervised Text Classification via Mutually-Enhancing Text Granularities [33.567613041147844]
MEGClass is an extremely weakly-supervised text classification method.
It exploits Mutually-Enhancing Text Granularities.
It can select the most informative class-indicative documents.
arXiv Detail & Related papers (2023-04-04T17:26:11Z)
- FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification [14.918600168973564]
This paper proposes FastClass, an efficient weakly-supervised classification approach.
It uses dense text representation to retrieve class-relevant documents from external unlabeled corpus.
Experiments show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.
arXiv Detail & Related papers (2022-12-11T13:43:22Z)
- Out-of-Category Document Identification Using Target-Category Names as Weak Supervision [64.671654559798]
Out-of-category detection aims to distinguish documents according to their semantic relevance to the inlier (or target) categories.
We present an out-of-category detection framework, which effectively measures how confidently each document belongs to one of the target categories.
arXiv Detail & Related papers (2021-11-24T21:01:25Z)
- Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification [78.83284164605473]
Funnelling (Fun) is a recently proposed method for cross-lingual text classification.
We describe Generalized Funnelling (gFun) as a generalization of Fun.
We show that gFun substantially improves over Fun and over state-of-the-art baselines.
arXiv Detail & Related papers (2021-09-17T23:33:04Z)
- DocSCAN: Unsupervised Text Classification via Learning from Neighbors [2.2082422928825145]
We introduce DocSCAN, a completely unsupervised text classification approach using Semantic Clustering by Adopting Nearest-Neighbors (SCAN).
For each document, we obtain semantically informative vectors from a large pre-trained language model. Similar documents have proximate vectors, so neighbors in the representation space tend to share topic labels.
Our learnable clustering approach uses pairs of neighboring datapoints as a weak learning signal. The proposed approach learns to assign classes to the whole dataset without provided ground-truth labels.
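The neighbor-pair weak signal described above can be sketched in a few lines. This is a hedged illustration of the pair-mining step only, not the DocSCAN training procedure: document embeddings are toy 2-D vectors standing in for pre-trained language-model representations, and each document is simply paired with its k nearest neighbors by cosine similarity.

```python
import numpy as np

def mine_neighbor_pairs(doc_vecs, k=1):
    """Mine (anchor, neighbor) index pairs from document embeddings.
    SCAN-style training then encourages a classifier to assign the two
    documents in each pair to the same class."""
    normed = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = normed @ normed.T          # pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)   # exclude self-pairs
    pairs = []
    for i, row in enumerate(sims):
        for j in np.argsort(row)[::-1][:k]:  # top-k most similar neighbors
            pairs.append((i, int(j)))
    return pairs

# Four toy "documents": two about one topic, two about another.
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
pairs = mine_neighbor_pairs(docs, k=1)
```

Because similar documents have proximate vectors, the mined pairs mostly share a topic label, which is exactly why they work as a weak learning signal.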
arXiv Detail & Related papers (2021-05-09T21:20:31Z)
- Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
- Classification and Clustering of arXiv Documents, Sections, and Abstracts, Comparing Encodings of Natural and Mathematical Language [8.522576207528017]
We show how selecting and combining encodings of natural and mathematical language affect classification and clustering of documents with mathematical content.
Our encodings achieve classification accuracies up to 82.8% and cluster purities up to 69.4%.
We show that the computer outperforms a human expert when classifying documents.
arXiv Detail & Related papers (2020-05-22T06:16:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.