LIME: Weakly-Supervised Text Classification Without Seeds
- URL: http://arxiv.org/abs/2210.06720v1
- Date: Thu, 13 Oct 2022 04:28:28 GMT
- Title: LIME: Weakly-Supervised Text Classification Without Seeds
- Authors: Seongmin Park, Jihwa Lee
- Abstract summary: In weakly-supervised text classification, only label names act as sources of supervision.
We present LIME, a framework for weakly-supervised text classification.
We find that combining weakly-supervised classification and textual entailment mitigates shortcomings of both.
- Score: 1.2691047660244335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In weakly-supervised text classification, only label names act as sources of
supervision. Predominant approaches to weakly-supervised text classification
utilize a two-phase framework, where test samples are first assigned
pseudo-labels and are then used to train a neural text classifier. In most
previous work, the pseudo-labeling step is dependent on obtaining seed words
that best capture the relevance of each class label. We present LIME, a
framework for weakly-supervised text classification that entirely replaces the
brittle seed-word generation process with entailment-based
pseudo-classification. We find that combining weakly-supervised classification
and textual entailment mitigates shortcomings of both, resulting in a more
streamlined and effective classification pipeline. With just an off-the-shelf
textual entailment model, LIME outperforms recent baselines in
weakly-supervised text classification and achieves state-of-the-art results
on 4 benchmarks. We open source our code at https://github.com/seongminp/LIME.
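As a concrete illustration of the pipeline the abstract describes, the following is a minimal sketch of entailment-based pseudo-classification with an off-the-shelf NLI model via Hugging Face transformers. The model name, label names, and hypothesis template are illustrative assumptions, not necessarily LIME's exact configuration.

```python
# Minimal sketch of entailment-based pseudo-classification with an
# off-the-shelf NLI model. Model, labels, and hypothesis template are
# illustrative assumptions, not necessarily LIME's exact configuration.
from transformers import pipeline

# BART fine-tuned on MNLI is a widely used off-the-shelf entailment model.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

document = "The team clinched the championship with a last-minute goal."
label_names = ["sports", "politics", "business", "technology"]

# Each label name is slotted into a hypothesis; the entailment score of
# (document, hypothesis) acts as the pseudo-label confidence. No seed
# words are generated at any point.
result = classifier(document,
                    candidate_labels=label_names,
                    hypothesis_template="This text is about {}.")
pseudo_label = result["labels"][0]  # label with the highest entailment score
print(pseudo_label, result["scores"][0])
```

In a two-phase setup, pseudo-labels produced this way would then be used to train a standard neural text classifier.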
Related papers
- TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision [41.05874642535256]
Hierarchical text classification aims to categorize each document into a set of classes in a label taxonomy.
Most earlier works focus on fully or semi-supervised methods that require a large amount of human annotated data.
We work on hierarchical text classification with the minimal amount of supervision: using the sole class name of each node as the only supervision.
arXiv Detail & Related papers (2024-02-29T22:26:07Z)
- XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak Supervision [6.406111099707549]
XAI-CLASS is a novel explanation-enhanced weakly-supervised text classification method.
It incorporates word saliency prediction as an auxiliary task.
XAI-CLASS significantly outperforms other weakly-supervised text classification methods.
arXiv Detail & Related papers (2023-10-31T23:24:22Z)
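The XAI-CLASS entry above mentions word saliency prediction as an auxiliary task; the snippet below is a generic multi-task sketch of that idea, not the authors' code. All module names, the pooling choice, and the loss weight are assumptions.

```python
# Generic multi-task sketch: a classification head plus a per-token word
# saliency head, trained jointly (in the spirit of XAI-CLASS's auxiliary
# task). Names, pooling, and the 0.5 loss weight are assumptions.
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    def __init__(self, hidden_size: int, num_classes: int):
        super().__init__()
        self.cls_head = nn.Linear(hidden_size, num_classes)
        self.saliency_head = nn.Linear(hidden_size, 1)  # one score per token

    def forward(self, token_states: torch.Tensor):
        # token_states: (batch, seq_len, hidden) from any text encoder
        class_logits = self.cls_head(token_states.mean(dim=1))  # mean pooling
        saliency_logits = self.saliency_head(token_states).squeeze(-1)
        return class_logits, saliency_logits

heads = MultiTaskHeads(hidden_size=768, num_classes=4)
states = torch.randn(2, 16, 768)  # stand-in for encoder output

class_logits, saliency_logits = heads(states)
cls_loss = nn.functional.cross_entropy(class_logits, torch.tensor([0, 2]))
sal_loss = nn.functional.binary_cross_entropy_with_logits(
    saliency_logits, torch.rand(2, 16))  # pseudo word-saliency targets
loss = cls_loss + 0.5 * sal_loss  # auxiliary task regularizes the encoder
```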
- PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training [42.013879670590214]
Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision.
We propose a new method, PIEClass, consisting of two modules: prompt-based pseudo-label acquisition and noise-robust iterative ensemble training.
PIEClass achieves overall better performance than existing strong baselines on seven benchmark datasets.
arXiv Detail & Related papers (2023-05-23T06:19:14Z)
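As a hedged illustration of the prompting module named in the PIEClass title, the sketch below scores label words in a cloze prompt with a masked language model. The prompt wording, verbalizer words, and model choice are assumptions, not the paper's exact setup.

```python
# Hedged sketch of prompt-based pseudo-labeling: score label words in a
# cloze prompt with a masked LM. Prompt, verbalizer, and model choice
# are assumptions, not PIEClass's exact setup.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

document = "Stocks rallied after the central bank cut interest rates."
label_words = ["sports", "politics", "business", "technology"]

# Restrict the masked-LM prediction to the class verbalizer words.
prompt = f"{document} This article is about [MASK]."
predictions = fill_mask(prompt, targets=label_words)
pseudo_label = predictions[0]["token_str"].strip()
print(pseudo_label, predictions[0]["score"])
```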
- WOT-Class: Weakly Supervised Open-world Text Classification [41.77945049159303]
We work on a novel problem of weakly supervised open-world text classification.
We propose a novel framework, WOT-Class, that lifts the strong assumptions made by prior closed-world methods.
Experiments on 7 popular text classification datasets demonstrate that WOT-Class outperforms strong baselines.
arXiv Detail & Related papers (2023-05-21T08:51:24Z)
- MEGClass: Extremely Weakly Supervised Text Classification via Mutually-Enhancing Text Granularities [33.567613041147844]
MEGClass is an extremely weakly-supervised text classification method.
It exploits Mutually-Enhancing Text Granularities.
It can select the most informative class-indicative documents.
arXiv Detail & Related papers (2023-04-04T17:26:11Z)
- Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification [66.02091763340094]
Like a Good Nearest Neighbor (LaGoNN) is a modification to SetFit that introduces no learnable parameters but alters input text with information from its nearest neighbor.
LaGoNN is effective at flagging undesirable content and text classification, and improves the performance of SetFit.
arXiv Detail & Related papers (2023-02-17T15:43:29Z)
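A minimal sketch of the nearest-neighbor input augmentation the LaGoNN entry describes, assuming a sentence-transformers encoder and a toy labeled bank; the augmentation format is an illustrative choice.

```python
# Hedged sketch of LaGoNN-style input augmentation: retrieve the most
# similar labeled example and append its information to the input, with
# no new learnable parameters. Encoder and format are assumptions.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

labeled_bank = [
    ("You are an idiot and should leave.", "toxic"),
    ("Thanks, that explanation really helped!", "benign"),
]
bank_embeddings = encoder.encode([text for text, _ in labeled_bank],
                                 convert_to_tensor=True)

query = "Get lost, nobody wants you here."
query_embedding = encoder.encode(query, convert_to_tensor=True)

# Cosine similarity against the bank; keep the single nearest neighbor.
scores = util.cos_sim(query_embedding, bank_embeddings)[0]
nn_text, nn_label = labeled_bank[int(scores.argmax())]

# A SetFit-style classifier would consume the augmented input.
augmented_input = f"{query} [SEP] {nn_text} ({nn_label})"
print(augmented_input)
```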
- Many-Class Text Classification with Matching [65.74328417321738]
We formulate Text Classification as a Matching problem between the text and the labels, and propose a simple yet effective framework named TCM.
Compared with previous text classification approaches, TCM takes advantage of the fine-grained semantic information of the classification labels.
arXiv Detail & Related papers (2022-05-23T15:51:19Z)
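The TCM entry frames classification as matching between texts and labels; below is a hedged embedding-similarity sketch of that framing. The encoder and label phrasings are assumptions, not the TCM architecture itself.

```python
# Hedged sketch of text-label matching: embed the document and the label
# names with one encoder and pick the best-matching label. Encoder and
# label phrasings are assumptions, not the TCM architecture itself.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

labels = ["sports news", "political news", "business news", "technology news"]
document = "The chip maker unveiled a new GPU architecture at its annual event."

# Fine-grained label semantics enter through the label embeddings.
label_embeddings = encoder.encode(labels, convert_to_tensor=True)
doc_embedding = encoder.encode(document, convert_to_tensor=True)

scores = util.cos_sim(doc_embedding, label_embeddings)[0]
print(labels[int(scores.argmax())])
```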
- Label Semantic Aware Pre-training for Few-shot Text Classification [53.80908620663974]
We propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems.
LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains.
arXiv Detail & Related papers (2022-04-14T17:33:34Z)
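A hedged sketch of the label-semantic secondary pre-training step the LSAP entry describes, using T5 (as in the paper) to generate a label name from a labeled sentence; the prefix and example pair are assumptions.

```python
# Hedged sketch of label-semantic secondary pre-training: train a
# generative model (T5, as in LSAP) to emit the label name for a labeled
# sentence. The prefix and example pair are illustrative assumptions.
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# One labeled sentence with a natural-language label name as the target.
sentence = "classify: Could you book me a table for two at seven?"
label_name = "restaurant reservation request"

inputs = tokenizer(sentence, return_tensors="pt")
targets = tokenizer(label_name, return_tensors="pt").input_ids

# Standard seq2seq loss: the model learns to generate label semantics.
loss = model(**inputs, labels=targets).loss
loss.backward()  # one secondary pre-training step (optimizer omitted)
print(float(loss))
```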
- Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification [60.233529926965836]
We propose a new method called SHINE, based on graph neural networks (GNNs), for short text classification.
First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs.
Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
arXiv Detail & Related papers (2021-10-30T05:33:05Z)
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct keywords from the rest of the words and to make low-confidence predictions when it lacks enough context.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
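A hedged sketch of the keyword-reconstruction regularization the MASKER entry describes: class-indicative keywords are masked and the model is trained to reconstruct them from context. The fixed keyword list here stands in for the paper's keyword-selection procedure.

```python
# Hedged sketch of masked keyword regularization: mask keyword tokens
# and compute a reconstruction loss only at those positions. The fixed
# keyword set stands in for MASKER's actual keyword-selection procedure.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

text = "The stadium erupted as the striker scored the winning goal."
keywords = {"stadium", "striker", "goal"}  # assumed class-indicative words

encoding = tokenizer(text, return_tensors="pt")
input_ids = encoding["input_ids"].clone()
labels = encoding["input_ids"].clone()

# Mask the keywords; ignore every other position in the loss (-100).
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
for i, token in enumerate(tokens):
    if token in keywords:
        input_ids[0, i] = tokenizer.mask_token_id
    else:
        labels[0, i] = -100

reconstruction_loss = model(input_ids=input_ids,
                            attention_mask=encoding["attention_mask"],
                            labels=labels).loss
print(float(reconstruction_loss))  # added to the main training objective
```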
- Cooperative Bi-path Metric for Few-shot Learning [50.98891758059389]
We make two contributions to investigate the few-shot classification problem.
We report a simple and effective baseline trained on the base classes with traditional supervised learning.
We propose a cooperative bi-path metric for classification, which leverages the correlations between base classes and novel classes to further improve the accuracy.
arXiv Detail & Related papers (2020-08-10T11:28:52Z)