XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak
Supervision
- URL: http://arxiv.org/abs/2311.00189v1
- Date: Tue, 31 Oct 2023 23:24:22 GMT
- Title: XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak
Supervision
- Authors: Daniel Hajialigol, Hanwen Liu, Xuan Wang
- Abstract summary: XAI-CLASS is a novel explanation-enhanced weakly-supervised text classification method.
It incorporates word saliency prediction as an auxiliary task.
XAI-CLASS outperforms other weakly-supervised text classification methods significantly.
- Score: 6.406111099707549
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text classification aims to effectively categorize documents into pre-defined
categories. Traditional methods for text classification often rely on large
amounts of manually annotated training data, making the process time-consuming
and labor-intensive. To address this issue, recent studies have focused on
weakly-supervised and extremely weakly-supervised settings, which require
minimal or no human annotation, respectively. In previous methods of weakly
supervised text classification, pseudo-training data is generated by assigning
pseudo-labels to documents based on their alignment (e.g., keyword matching)
with specific classes. However, these methods ignore the explanations behind
the generated pseudo-labels, i.e., the saliency of individual words, which can
serve as additional guidance during classification training. To address this
limitation, we propose XAI-CLASS, a novel
explanation-enhanced extremely weakly-supervised text classification method
that incorporates word saliency prediction as an auxiliary task. XAI-CLASS
begins by employing a multi-round question-answering process to generate
pseudo-training data that promotes the mutual enhancement of class labels and
corresponding explanation word generation. This pseudo-training data is then
used to train a multi-task framework that simultaneously learns both text
classification and word saliency prediction. Extensive experiments on several
weakly-supervised text classification datasets show that XAI-CLASS outperforms
other weakly-supervised text classification methods significantly. Moreover,
experiments demonstrate that XAI-CLASS enhances both model performance and
explainability.
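To make the described setup concrete, below is a minimal PyTorch sketch of a multi-task framework of the kind the abstract describes: a shared encoder feeding both a document classification head and a per-token word saliency head, trained jointly on pseudo-labels. All module names, dimensions, and the loss weighting are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskClassifier(nn.Module):
    """Shared encoder with two heads: one predicts the document's class,
    the other predicts a saliency score for every token."""
    def __init__(self, vocab_size=30522, hidden=256, num_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cls_head = nn.Linear(hidden, num_classes)  # document label
        self.sal_head = nn.Linear(hidden, 1)            # per-token saliency

    def forward(self, input_ids):
        h = self.encoder(self.embed(input_ids))         # (B, T, H)
        logits = self.cls_head(h.mean(dim=1))           # mean-pooled class logits
        saliency = self.sal_head(h).squeeze(-1)         # (B, T) saliency logits
        return logits, saliency

def joint_loss(logits, saliency, labels, sal_targets, alpha=0.5):
    """Classification loss plus an auxiliary saliency loss; the weight
    `alpha` is an assumption, not a value reported in the paper."""
    ce = F.cross_entropy(logits, labels)
    bce = F.binary_cross_entropy_with_logits(saliency, sal_targets)
    return ce + alpha * bce
```

In XAI-CLASS, the saliency targets would be derived from the explanation words generated alongside the class pseudo-labels during the multi-round question-answering stage.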
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights to distantly supervised labels based on the classifiers' training dynamics.
By assigning importance weights instead of filtering out examples with an arbitrary confidence threshold, we maximize the use of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
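As a rough sketch of the weighting idea, assuming "training dynamics" are summarized by the confidence the model assigned to each pseudo-label over past epochs (the paper's exact scheme may differ):

```python
import torch
import torch.nn.functional as F

def weighted_distant_loss(logits, pseudo_labels, confidence_history):
    """Weight each distantly supervised example by its averaged confidence
    over past epochs instead of dropping examples below a hard threshold.

    confidence_history: (num_epochs, batch) tensor holding the probability
    the classifier assigned to each pseudo-label at each past epoch (an
    assumed proxy for "training dynamics").
    """
    weights = confidence_history.mean(dim=0)                    # (batch,)
    per_example = F.cross_entropy(logits, pseudo_labels,
                                  reduction="none")             # (batch,)
    return (weights * per_example).mean()
```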
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- MEGClass: Extremely Weakly Supervised Text Classification via Mutually-Enhancing Text Granularities [33.567613041147844]
MEGClass is an extremely weakly-supervised text classification method.
It exploits mutually-enhancing text granularities to select the most informative class-indicative documents.
arXiv Detail & Related papers (2023-04-04T17:26:11Z)
- FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification [14.918600168973564]
This paper proposes FastClass, an efficient weakly-supervised classification approach.
It uses dense text representations to retrieve class-relevant documents from an external unlabeled corpus.
Experiments show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.
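A minimal sketch of this retrieval step, assuming a generic sentence encoder (sentence-transformers with all-MiniLM-L6-v2 is an illustrative choice, not necessarily the encoder used in the paper):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def retrieve_class_documents(class_descriptions, corpus, k=100):
    """For each class, return indices of the k unlabeled documents closest
    to the class description in a dense embedding space."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder
    class_vecs = model.encode(class_descriptions, normalize_embeddings=True)
    doc_vecs = model.encode(corpus, normalize_embeddings=True)
    sims = class_vecs @ doc_vecs.T                   # cosine on unit vectors
    return [np.argsort(-row)[:k] for row in sims]
```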
arXiv Detail & Related papers (2022-12-11T13:43:22Z)
- CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class Classification [57.62886091828512]
We propose a new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix), for many-class classification.
An instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, complements the language verbalizers in many-class classification.
arXiv Detail & Related papers (2022-11-11T03:45:59Z)
- LIME: Weakly-Supervised Text Classification Without Seeds [1.2691047660244335]
In weakly-supervised text classification, only label names act as sources of supervision.
We present LIME, a framework for weakly-supervised text classification.
We find that combining weakly-supervised classification and textual entailment mitigates shortcomings of both.
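The general entailment recipe can be illustrated with an off-the-shelf NLI model, treating each label name as a hypothesis and the document as the premise. This is a generic sketch, not LIME's exact procedure:

```python
from transformers import pipeline

# Each candidate label is turned into an entailment hypothesis and scored
# against the document by an NLI model; the model choice is illustrative.
clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = clf(
    "The team scored a last-minute goal to win the championship.",
    candidate_labels=["sports", "politics", "technology"],
)
print(result["labels"][0])  # the label with the highest entailment score
```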
arXiv Detail & Related papers (2022-10-13T04:28:28Z)
- Label Semantic Aware Pre-training for Few-shot Text Classification [53.80908620663974]
We propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems.
LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains.
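One way to picture this secondary pre-training is as text-to-text pairs mapping sentences to their natural-language label names; the prompt template below is an assumption, not the paper's exact format:

```python
from transformers import T5Tokenizer

def label_semantic_example(sentence, label_name, tokenizer):
    """Format a labeled sentence as a text-to-text pair so the generative
    model learns to emit natural-language label names."""
    inputs = tokenizer(f"classify: {sentence}", return_tensors="pt")
    targets = tokenizer(label_name, return_tensors="pt")
    return inputs.input_ids, targets.input_ids

tok = T5Tokenizer.from_pretrained("t5-small")  # T5 is the model named above
x, y = label_semantic_example("Book me a flight to Boston.", "book flight", tok)
```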
arXiv Detail & Related papers (2022-04-14T17:33:34Z)
- Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification [94.55805516167369]
We propose a new approach for binary classification from $m$ unlabeled datasets (U-sets) for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
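Under one reading of the abstract, the surrogate task trains a classifier to predict which U-set each example was drawn from. A minimal PyTorch sketch, with all dimensions illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

m, d = 3, 16                          # m U-sets, feature dim (illustrative)
surrogate = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, m))

x = torch.randn(8, d)                 # a batch of unlabeled examples
set_ids = torch.randint(0, m, (8,))   # which U-set each example came from
loss = F.cross_entropy(surrogate(x), set_ids)  # the surrogate (SSC) objective
loss.backward()
```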
arXiv Detail & Related papers (2021-02-01T07:36:38Z)
- TF-CR: Weighting Embeddings for Text Classification [6.531659195805749]
We introduce a novel weighting scheme, Term Frequency-Category Ratio (TF-CR), which assigns higher weights to high-frequency, category-exclusive words when computing word embeddings.
Experiments on 16 classification datasets show the effectiveness of TF-CR, leading to improved performance scores over existing weighting schemes.
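A small sketch under one plausible reading of the scheme, where TF-CR multiplies a word's within-category term frequency by the share of its corpus occurrences falling in that category (the paper's exact formula may differ):

```python
from collections import Counter

def tf_cr_weights(docs_by_category):
    """Weight(w, c) = (freq of w in c / tokens in c)
                    * (freq of w in c / freq of w in the whole corpus)."""
    cat_counts = {c: Counter(w for doc in docs for w in doc.split())
                  for c, docs in docs_by_category.items()}
    total = Counter()
    for counts in cat_counts.values():
        total.update(counts)
    weights = {}
    for c, counts in cat_counts.items():
        n_c = sum(counts.values())
        weights[c] = {w: (f / n_c) * (f / total[w]) for w, f in counts.items()}
    return weights

w = tf_cr_weights({"sports": ["goal match goal team", "goal win team"],
                   "tech": ["chip team software"]})
print(w["sports"]["goal"])  # high: frequent in, and exclusive to, "sports"
```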
arXiv Detail & Related papers (2020-12-11T19:23:28Z)
- Text Classification with Few Examples using Controlled Generalization [58.971750512415134]
Current practice relies on pre-trained word embeddings to map words unseen in training to similar seen ones.
Our alternative begins with sparse pre-trained representations derived from unlabeled parsed corpora.
We show that a feed-forward network over these vectors is especially effective in low-data scenarios.
arXiv Detail & Related papers (2020-05-18T06:04:58Z)
- Learning Interpretable and Discrete Representations with Adversarial Training for Unsupervised Text Classification [87.28408260725138]
TIGAN learns to encode texts into two disentangled representations: a discrete code and continuous noise.
The extracted topical words for representing latent topics show that TIGAN learns coherent and highly interpretable topics.
arXiv Detail & Related papers (2020-04-28T02:53:59Z)
- Description Based Text Classification with Reinforcement Learning [34.18824470728299]
We propose a new framework for text classification, in which each category label is associated with a category description.
We observe significant performance boosts over strong baselines on a wide range of text classification tasks.
arXiv Detail & Related papers (2020-02-08T02:14:28Z)