MEGClass: Extremely Weakly Supervised Text Classification via
Mutually-Enhancing Text Granularities
- URL: http://arxiv.org/abs/2304.01969v2
- Date: Sun, 29 Oct 2023 21:03:54 GMT
- Title: MEGClass: Extremely Weakly Supervised Text Classification via
Mutually-Enhancing Text Granularities
- Authors: Priyanka Kargupta, Tanay Komarlu, Susik Yoon, Xuan Wang, Jiawei Han
- Abstract summary: MEGClass is an extremely weakly-supervised text classification method.
It exploits Mutually-Enhancing Text Granularities.
It can select the most informative class-indicative documents.
- Score: 33.567613041147844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text classification is essential for organizing unstructured text.
Traditional methods rely on human annotations or, more recently, a set of class
seed words for supervision, which can be costly, particularly for specialized
or emerging domains. To address this, using class surface names alone as
extremely weak supervision has been proposed. However, existing approaches
treat different levels of text granularity (documents, sentences, or words)
independently, disregarding inter-granularity class disagreements and the
context identifiable exclusively through joint extraction. In order to tackle
these issues, we introduce MEGClass, an extremely weakly-supervised text
classification method that leverages Mutually-Enhancing Text Granularities.
MEGClass utilizes coarse- and fine-grained context signals obtained by jointly
considering a document's most class-indicative words and sentences. This
approach enables the learning of a contextualized document representation that
captures the most discriminative class indicators. By preserving the
heterogeneity of potential classes, MEGClass can select the most informative
class-indicative documents as iterative feedback to enhance the initial
word-based class representations and ultimately fine-tune a pre-trained text
classifier. Extensive experiments on seven benchmark datasets demonstrate that
MEGClass outperforms other weakly and extremely weakly supervised methods.
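The extremely-weak-supervision loop the abstract describes can be illustrated with a toy sketch (not the authors' implementation): bag-of-words cosine similarity stands in for MEGClass's contextualized representations, sentence-level scores are aggregated into a document label, and one round of confident-document feedback expands the class vocabularies seeded from the class surface names alone. All names, data, and the confidence threshold below are illustrative.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc: str, class_vocab: dict) -> tuple:
    """Score a document sentence by sentence against each class
    vocabulary; return (best_class, average per-sentence score)."""
    sentences = [s for s in doc.lower().split(".") if s.strip()]
    totals = Counter()
    for sent in sentences:
        bow = Counter(sent.split())
        for cls, vocab in class_vocab.items():
            totals[cls] += cosine(bow, vocab)
    best, score = totals.most_common(1)[0]
    return best, score / max(len(sentences), 1)

# Extremely weak supervision: start from class surface names alone.
class_vocab = {
    "sports": Counter(["sports"]),
    "politics": Counter(["politics"]),
}

docs = [
    "The team won the sports final. The coach praised the players.",
    "The politics of the election dominated the debate. Voters were split.",
]

# One round of iterative feedback: label documents, then fold the words
# of the most confident (class-indicative) ones back into the vocabularies.
labeled = [(d, *classify(d, class_vocab)) for d in docs]
for doc, cls, conf in labeled:
    if conf > 0.1:  # keep only confidently class-indicative documents
        class_vocab[cls].update(doc.lower().replace(".", "").split())

print([(cls, round(conf, 2)) for _, cls, conf in labeled])
```

In the paper the refined class representations from this feedback step are ultimately used to fine-tune a pre-trained text classifier; the sketch stops at the vocabulary update.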
Related papers
- Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery [50.564146730579424]
We propose a Text Embedding Synthesizer (TES) to generate pseudo text embeddings for unlabelled samples.
Our method unlocks the multi-modal potentials of CLIP and outperforms the baseline methods by a large margin on all GCD benchmarks.
arXiv Detail & Related papers (2024-03-15T02:40:13Z)
- TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision [41.05874642535256]
Hierarchical text classification aims to categorize each document into a set of classes in a label taxonomy.
Most earlier works focus on fully or semi-supervised methods that require a large amount of human annotated data.
We work on hierarchical text classification with the minimal amount of supervision: using the sole class name of each node as the only supervision.
arXiv Detail & Related papers (2024-02-29T22:26:07Z)
- XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak Supervision [6.406111099707549]
XAI-CLASS is a novel explanation-enhanced weakly-supervised text classification method.
It incorporates word saliency prediction as an auxiliary task.
XAI-CLASS outperforms other weakly-supervised text classification methods significantly.
arXiv Detail & Related papers (2023-10-31T23:24:22Z)
- WOT-Class: Weakly Supervised Open-world Text Classification [41.77945049159303]
We work on a novel problem of weakly supervised open-world text classification.
We propose a novel framework WOT-Class that lifts strong assumptions.
Experiments on 7 popular text classification datasets demonstrate that WOT-Class outperforms strong baselines.
arXiv Detail & Related papers (2023-05-21T08:51:24Z)
- LIME: Weakly-Supervised Text Classification Without Seeds [1.2691047660244335]
In weakly-supervised text classification, only label names act as sources of supervision.
We present LIME, a framework for weakly-supervised text classification.
We find that combining weakly-supervised classification and textual entailment mitigates shortcomings of both.
arXiv Detail & Related papers (2022-10-13T04:28:28Z)
- Many-Class Text Classification with Matching [65.74328417321738]
We formulate Text Classification as a Matching problem between the text and the labels, and propose a simple yet effective framework named TCM.
Compared with previous text classification approaches, TCM takes advantage of the fine-grained semantic information of the classification labels.
arXiv Detail & Related papers (2022-05-23T15:51:19Z)
- MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information [47.44278057062421]
We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only.
To be specific, we model the relationships between documents and metadata via a heterogeneous information network.
We propose a novel framework, named MotifClass, which selects category-indicative motif instances, retrieves and generates pseudo-labeled training samples based on category names and indicative motif instances.
arXiv Detail & Related papers (2021-11-07T07:39:10Z)
- Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification [60.233529926965836]
We propose a new method called SHINE, which is based on graph neural network (GNN) for short text classification.
First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs.
Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
arXiv Detail & Related papers (2021-10-30T05:33:05Z)
- X-Class: Text Classification with Extremely Weak Supervision [39.25777650619999]
In this paper, we explore text classification with extremely weak supervision.
We propose a novel framework X-Class to realize the adaptive representations.
X-Class can rival and even outperform seed-driven weakly supervised methods on 7 benchmark datasets.
arXiv Detail & Related papers (2020-10-24T06:09:51Z)
- Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z)
- Learning Interpretable and Discrete Representations with Adversarial Training for Unsupervised Text Classification [87.28408260725138]
TIGAN learns to encode texts into two disentangled representations, including a discrete code and a continuous noise.
The extracted topical words for representing latent topics show that TIGAN learns coherent and highly interpretable topics.
arXiv Detail & Related papers (2020-04-28T02:53:59Z)
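Several of the entries above, LIME in particular, recast weakly-supervised classification as textual entailment: each label name is turned into a hypothesis such as "This text is about sports." and the label whose hypothesis the document best supports wins. A toy sketch follows; the keyword-overlap scorer is a hypothetical stand-in for the pretrained NLI model a real system would use.

```python
def entailment_score(premise: str, hypothesis: str) -> float:
    # Hypothetical stand-in for a pretrained NLI model:
    # scores by word overlap between premise and hypothesis.
    p = set(premise.lower().split())
    h = set(hypothesis.lower().rstrip(".").split())
    return len(p & h) / len(h)

def classify_by_entailment(text: str, labels: list) -> str:
    # Turn each label name into a hypothesis and pick the one
    # the text "entails" most strongly -- no labeled data needed.
    hypotheses = {lbl: f"This text is about {lbl}." for lbl in labels}
    return max(labels, key=lambda l: entailment_score(text, hypotheses[l]))

label = classify_by_entailment(
    "The striker scored twice and the sports crowd roared.",
    ["sports", "politics"],
)
```

The hypothesis template is the only piece of supervision, which is what lets entailment-based methods operate from label names alone.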
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.