LIME: Weakly-Supervised Text Classification Without Seeds
- URL: http://arxiv.org/abs/2210.06720v1
- Date: Thu, 13 Oct 2022 04:28:28 GMT
- Title: LIME: Weakly-Supervised Text Classification Without Seeds
- Authors: Seongmin Park, Jihwa Lee
- Abstract summary: In weakly-supervised text classification, only label names act as sources of supervision.
We present LIME, a framework for weakly-supervised text classification.
We find that combining weakly-supervised classification and textual entailment mitigates shortcomings of both.
- Score: 1.2691047660244335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In weakly-supervised text classification, only label names act as sources of
supervision. Predominant approaches to weakly-supervised text classification
utilize a two-phase framework, where test samples are first assigned
pseudo-labels and are then used to train a neural text classifier. In most
previous work, the pseudo-labeling step is dependent on obtaining seed words
that best capture the relevance of each class label. We present LIME, a
framework for weakly-supervised text classification that entirely replaces the
brittle seed-word generation process with entailment-based
pseudo-classification. We find that combining weakly-supervised classification
and textual entailment mitigates shortcomings of both, resulting in a more
streamlined and effective classification pipeline. With just an off-the-shelf
textual entailment model, LIME outperforms recent baselines in
weakly-supervised text classification and achieves state-of-the-art results
on 4 benchmarks. We open source our code at https://github.com/seongminp/LIME.
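As a concrete illustration of the pipeline the abstract describes, the following is a minimal sketch of entailment-based pseudo-classification with an off-the-shelf NLI model via Hugging Face transformers. The model name, label names, and hypothesis template are illustrative assumptions, not necessarily LIME's exact configuration.

```python
# Minimal sketch of entailment-based pseudo-classification with an
# off-the-shelf NLI model. Model, labels, and hypothesis template are
# illustrative assumptions, not necessarily LIME's exact configuration.
from transformers import pipeline

# BART fine-tuned on MNLI is a widely used off-the-shelf entailment model.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

document = "The team clinched the championship with a last-minute goal."
label_names = ["sports", "politics", "business", "technology"]

# Each label name is slotted into a hypothesis; the entailment score of
# (document, hypothesis) acts as the pseudo-label confidence. No seed
# words are generated at any point.
result = classifier(document,
                    candidate_labels=label_names,
                    hypothesis_template="This text is about {}.")
pseudo_label = result["labels"][0]  # label with the highest entailment score
print(pseudo_label, result["scores"][0])
```

In a two-phase setup, pseudo-labels produced this way would then be used to train a standard neural text classifier.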
Related papers
- TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision [41.05874642535256]
Hierarchical text classification aims to categorize each document into a set of classes in a label taxonomy.
Most earlier works focus on fully or semi-supervised methods that require a large amount of human annotated data.
We work on hierarchical text classification with the minimal amount of supervision: using the sole class name of each node as the only supervision.
arXiv Detail & Related papers (2024-02-29T22:26:07Z)
- XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak Supervision [6.406111099707549]
XAI-CLASS is a novel explanation-enhanced weakly-supervised text classification method.
It incorporates word saliency prediction as an auxiliary task.
XAI-CLASS significantly outperforms other weakly-supervised text classification methods.
arXiv Detail & Related papers (2023-10-31T23:24:22Z)
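The XAI-CLASS entry above mentions word saliency prediction as an auxiliary task; the snippet below is a generic multi-task sketch of that idea, not the authors' code. All module names, the pooling choice, and the loss weight are assumptions.

```python
# Generic multi-task sketch: a classification head plus a per-token word
# saliency head, trained jointly (in the spirit of XAI-CLASS's auxiliary
# task). Names, pooling, and the 0.5 loss weight are assumptions.
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    def __init__(self, hidden_size: int, num_classes: int):
        super().__init__()
        self.cls_head = nn.Linear(hidden_size, num_classes)
        self.saliency_head = nn.Linear(hidden_size, 1)  # one score per token

    def forward(self, token_states: torch.Tensor):
        # token_states: (batch, seq_len, hidden) from any text encoder
        class_logits = self.cls_head(token_states.mean(dim=1))  # mean pooling
        saliency_logits = self.saliency_head(token_states).squeeze(-1)
        return class_logits, saliency_logits

heads = MultiTaskHeads(hidden_size=768, num_classes=4)
states = torch.randn(2, 16, 768)  # stand-in for encoder output

class_logits, saliency_logits = heads(states)
cls_loss = nn.functional.cross_entropy(class_logits, torch.tensor([0, 2]))
sal_loss = nn.functional.binary_cross_entropy_with_logits(
    saliency_logits, torch.rand(2, 16))  # pseudo word-saliency targets
loss = cls_loss + 0.5 * sal_loss  # auxiliary task regularizes the encoder
```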
- PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training [42.013879670590214]
Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision.
We propose a new method, PIEClass, consisting of two modules: prompt-based pseudo-label acquisition and noise-robust iterative ensemble training.
PIEClass achieves overall better performance than existing strong baselines on seven benchmark datasets.
arXiv Detail & Related papers (2023-05-23T06:19:14Z)
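As a hedged illustration of the prompting module named in the PIEClass title, the sketch below scores label words in a cloze prompt with a masked language model. The prompt wording, verbalizer words, and model choice are assumptions, not the paper's exact setup.

```python
# Hedged sketch of prompt-based pseudo-labeling: score label words in a
# cloze prompt with a masked LM. Prompt, verbalizer, and model choice
# are assumptions, not PIEClass's exact setup.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

document = "Stocks rallied after the central bank cut interest rates."
label_words = ["sports", "politics", "business", "technology"]

# Restrict the masked-LM prediction to the class verbalizer words.
prompt = f"{document} This article is about [MASK]."
predictions = fill_mask(prompt, targets=label_words)
pseudo_label = predictions[0]["token_str"].strip()
print(pseudo_label, predictions[0]["score"])
```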
- WOT-Class: Weakly Supervised Open-world Text Classification [41.77945049159303]
We work on a novel problem of weakly supervised open-world text classification.
We propose a novel framework, WOT-Class, that lifts the strong assumptions made by prior closed-world methods.
Experiments on 7 popular text classification datasets demonstrate that WOT-Class outperforms strong baselines.
arXiv Detail & Related papers (2023-05-21T08:51:24Z)
- MEGClass: Extremely Weakly Supervised Text Classification via Mutually-Enhancing Text Granularities [33.567613041147844]
MEGClass is an extremely weakly-supervised text classification method.
It exploits Mutually-Enhancing Text Granularities.
It can select the most informative class-indicative documents.
arXiv Detail & Related papers (2023-04-04T17:26:11Z)
- Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification [66.02091763340094]
Like a Good Nearest Neighbor (LaGoNN) is a modification to SetFit that introduces no learnable parameters but alters input text with information from its nearest neighbor.
LaGoNN is effective at flagging undesirable content and text classification, and improves the performance of SetFit.
arXiv Detail & Related papers (2023-02-17T15:43:29Z)
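A minimal sketch of the nearest-neighbor input augmentation the LaGoNN entry describes, assuming a sentence-transformers encoder and a toy labeled bank; the augmentation format is an illustrative choice.

```python
# Hedged sketch of LaGoNN-style input augmentation: retrieve the most
# similar labeled example and append its information to the input, with
# no new learnable parameters. Encoder and format are assumptions.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

labeled_bank = [
    ("You are an idiot and should leave.", "toxic"),
    ("Thanks, that explanation really helped!", "benign"),
]
bank_embeddings = encoder.encode([text for text, _ in labeled_bank],
                                 convert_to_tensor=True)

query = "Get lost, nobody wants you here."
query_embedding = encoder.encode(query, convert_to_tensor=True)

# Cosine similarity against the bank; keep the single nearest neighbor.
scores = util.cos_sim(query_embedding, bank_embeddings)[0]
nn_text, nn_label = labeled_bank[int(scores.argmax())]

# A SetFit-style classifier would consume the augmented input.
augmented_input = f"{query} [SEP] {nn_text} ({nn_label})"
print(augmented_input)
```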
- Many-Class Text Classification with Matching [65.74328417321738]
We formulate Text Classification as a Matching problem between the text and the labels, and propose a simple yet effective framework named TCM.
Compared with previous text classification approaches, TCM takes advantage of the fine-grained semantic information of the classification labels.
arXiv Detail & Related papers (2022-05-23T15:51:19Z)
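The TCM entry frames classification as matching between texts and labels; below is a hedged embedding-similarity sketch of that framing. The encoder and label phrasings are assumptions, not the TCM architecture itself.

```python
# Hedged sketch of text-label matching: embed the document and the label
# names with one encoder and pick the best-matching label. Encoder and
# label phrasings are assumptions, not the TCM architecture itself.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

labels = ["sports news", "political news", "business news", "technology news"]
document = "The chip maker unveiled a new GPU architecture at its annual event."

# Fine-grained label semantics enter through the label embeddings.
label_embeddings = encoder.encode(labels, convert_to_tensor=True)
doc_embedding = encoder.encode(document, convert_to_tensor=True)

scores = util.cos_sim(doc_embedding, label_embeddings)[0]
print(labels[int(scores.argmax())])
```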
- Label Semantic Aware Pre-training for Few-shot Text Classification [53.80908620663974]
We propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems.
LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains.
arXiv Detail & Related papers (2022-04-14T17:33:34Z)
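A hedged sketch of the label-semantic secondary pre-training step the LSAP entry describes, using T5 (as in the paper) to generate a label name from a labeled sentence; the prefix and example pair are assumptions.

```python
# Hedged sketch of label-semantic secondary pre-training: train a
# generative model (T5, as in LSAP) to emit the label name for a labeled
# sentence. The prefix and example pair are illustrative assumptions.
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# One labeled sentence with a natural-language label name as the target.
sentence = "classify: Could you book me a table for two at seven?"
label_name = "restaurant reservation request"

inputs = tokenizer(sentence, return_tensors="pt")
targets = tokenizer(label_name, return_tensors="pt").input_ids

# Standard seq2seq loss: the model learns to generate label semantics.
loss = model(**inputs, labels=targets).loss
loss.backward()  # one secondary pre-training step (optimizer omitted)
print(float(loss))
```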
- Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification [60.233529926965836]
We propose a new method called SHINE, based on graph neural networks (GNNs), for short text classification.
First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs.
Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
arXiv Detail & Related papers (2021-10-30T05:33:05Z)
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct keywords from the rest of the words and to make low-confidence predictions when it lacks enough context.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
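A hedged sketch of the keyword-reconstruction regularization the MASKER entry describes: class-indicative keywords are masked and the model is trained to reconstruct them from context. The fixed keyword list here stands in for the paper's keyword-selection procedure.

```python
# Hedged sketch of masked keyword regularization: mask keyword tokens
# and compute a reconstruction loss only at those positions. The fixed
# keyword set stands in for MASKER's actual keyword-selection procedure.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

text = "The stadium erupted as the striker scored the winning goal."
keywords = {"stadium", "striker", "goal"}  # assumed class-indicative words

encoding = tokenizer(text, return_tensors="pt")
input_ids = encoding["input_ids"].clone()
labels = encoding["input_ids"].clone()

# Mask the keywords; ignore every other position in the loss (-100).
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
for i, token in enumerate(tokens):
    if token in keywords:
        input_ids[0, i] = tokenizer.mask_token_id
    else:
        labels[0, i] = -100

reconstruction_loss = model(input_ids=input_ids,
                            attention_mask=encoding["attention_mask"],
                            labels=labels).loss
print(float(reconstruction_loss))  # added to the main training objective
```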
- Cooperative Bi-path Metric for Few-shot Learning [50.98891758059389]
We make two contributions to investigate the few-shot classification problem.
We report a simple and effective baseline trained on the base classes with traditional supervised learning.
We propose a cooperative bi-path metric for classification, which leverages the correlations between base classes and novel classes to further improve the accuracy.
arXiv Detail & Related papers (2020-08-10T11:28:52Z)