PIEClass: Weakly-Supervised Text Classification with Prompting and
Noise-Robust Iterative Ensemble Training
- URL: http://arxiv.org/abs/2305.13723v2
- Date: Fri, 20 Oct 2023 15:14:34 GMT
- Title: PIEClass: Weakly-Supervised Text Classification with Prompting and
Noise-Robust Iterative Ensemble Training
- Authors: Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, Jiawei Han
- Abstract summary: Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision.
We propose a new method, PIEClass, consisting of two modules.
PIEClass achieves overall better performance than existing strong baselines on seven benchmark datasets.
- Score: 42.013879670590214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weakly-supervised text classification trains a classifier using the label
name of each target class as the only supervision, which largely reduces human
annotation efforts. Most existing methods first use the label names as static
keyword-based features to generate pseudo labels, which are then used for final
classifier training. While reasonable, such a commonly adopted framework
suffers from two limitations: (1) keywords can have different meanings in
different contexts and some text may not have any keyword, so keyword matching
can induce noisy and inadequate pseudo labels; (2) the errors made in the
pseudo label generation stage will directly propagate to the classifier
training stage without a chance of being corrected. In this paper, we propose a
new method, PIEClass, consisting of two modules: (1) a pseudo label acquisition
module that uses zero-shot prompting of pre-trained language models (PLM) to
get pseudo labels based on contextualized text understanding beyond static
keyword matching, and (2) a noise-robust iterative ensemble training module
that iteratively trains classifiers and updates pseudo labels by utilizing two
PLM fine-tuning methods that regularize each other. Extensive experiments show
that PIEClass achieves overall better performance than existing strong
baselines on seven benchmark datasets and even achieves similar performance to
fully-supervised classifiers on sentiment classification tasks.
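The two modules described in the abstract lend themselves to a compact illustration. Below is a minimal sketch assuming a RoBERTa masked LM, single-token label names, an illustrative prompt template and confidence threshold, and a hypothetical train_two_views hook standing in for the paper's two PLM fine-tuning methods; it sketches the idea, not the authors' implementation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("roberta-base")
mlm = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()
LABELS = ["sports", "politics"]  # label names are the only supervision
label_ids = [tok(" " + w, add_special_tokens=False).input_ids[0] for w in LABELS]

@torch.no_grad()
def prompt_pseudo_label(text, threshold=0.5):
    """Module 1: zero-shot prompting; returns (class or None, confidence)."""
    enc = tok(f"{text} This article is about {tok.mask_token}.",
              return_tensors="pt", truncation=True)
    logits = mlm(**enc).logits[0]
    pos = (enc.input_ids[0] == tok.mask_token_id).nonzero()[0].item()
    probs = logits[pos, label_ids].softmax(-1)   # compare label words at [MASK]
    conf, cls = probs.max(-1)
    return (cls.item() if conf >= threshold else None), conf.item()

def iterative_ensemble(texts, pseudo, train_two_views, n_iters=5):
    """Module 2 (skeleton): two differently fine-tuned classifiers regularize
    each other; only labels both views agree on survive to the next round."""
    for _ in range(n_iters):
        view_a, view_b = train_two_views(texts, pseudo)  # hypothetical hook
        pred_a, pred_b = view_a(texts), view_b(texts)
        pseudo = {i: pred_a[i] for i in range(len(texts)) if pred_a[i] == pred_b[i]}
    return pseudo
```

The agreement filter in the loop is what makes the training noise-robust: an error made by one fine-tuning view must be replicated by the other before it can propagate to the next iteration.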
Related papers
- Determined Multi-Label Learning via Similarity-Based Prompt [12.428779617221366]
In multi-label classification, each training instance is associated with multiple class labels simultaneously.
To reduce this annotation burden, a novel labeling setting termed Determined Multi-Label Learning (DMLL) is proposed.
arXiv Detail & Related papers (2024-03-25T07:08:01Z)
- RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules [30.239044569301534]
Weakly supervised text classification (WSTC) has attracted increasing attention due to its applicability in classifying large volumes of text.
We propose a prompting PLM-based approach named RulePrompt for the WSTC task, consisting of a rule mining module and a rule-enhanced pseudo label generation module.
Our approach yields interpretable category rules, proving its advantage in disambiguating easily-confused categories.
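The summary only names RulePrompt's two modules, so the following is a heavily simplified, hypothetical illustration of the general rule-mining idea rather than the paper's algorithm: mine frequent class-indicative words from confidently pseudo-labeled texts, then use those word rules to label texts the prompt left undecided.

```python
from collections import Counter

def mine_word_rules(texts, pseudo_labels, top_k=10):
    """Return {class: set of indicative words} mined from pseudo-labeled texts."""
    per_class = {}
    for text, label in zip(texts, pseudo_labels):
        if label is not None:
            per_class.setdefault(label, Counter()).update(text.lower().split())
    return {c: {w for w, _ in cnt.most_common(top_k)} for c, cnt in per_class.items()}

def rule_pseudo_label(text, rules):
    """Assign the class whose rule words overlap the text most, if any."""
    if not rules:
        return None
    words = set(text.lower().split())
    scores = {c: len(ws & words) for c, ws in rules.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```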
arXiv Detail & Related papers (2024-03-05T12:50:36Z)
- Semantic Connectivity-Driven Pseudo-labeling for Cross-domain Segmentation [89.41179071022121]
Self-training is a prevailing approach in cross-domain semantic segmentation.
We propose a novel approach called Semantic Connectivity-driven pseudo-labeling.
This approach formulates pseudo-labels at the connectivity level and thus can facilitate learning structured and low-noise semantics.
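One plausible reading of connectivity-level pseudo-labeling is sketched below, assuming per-pixel softmax maps from a segmentation model; the component confidence threshold and minimum size are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy import ndimage

def connectivity_pseudo_labels(probs, conf_thresh=0.9, min_size=64, ignore=255):
    """probs: (C, H, W) softmax map -> (H, W) pseudo-label map with `ignore` holes."""
    conf, hard = probs.max(axis=0), probs.argmax(axis=0)
    pseudo = np.full(hard.shape, ignore, dtype=np.int64)
    for c in range(probs.shape[0]):
        comps, n = ndimage.label(hard == c)  # connected components of class c
        for k in range(1, n + 1):
            mask = comps == k
            # keep a whole component only if it is large and confident on average,
            # so labels are structured regions rather than isolated noisy pixels
            if mask.sum() >= min_size and conf[mask].mean() >= conf_thresh:
                pseudo[mask] = c
    return pseudo
```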
arXiv Detail & Related papers (2023-12-11T12:29:51Z)
- Substituting Data Annotation with Balanced Updates and Collective Loss in Multi-label Text Classification [19.592985329023733]
Multi-label text classification (MLTC) is the task of assigning multiple labels to a given text.
We study the MLTC problem in annotation-free and scarce-annotation settings, in which the magnitude of available supervision signals is linear in the number of labels.
Our method follows three steps: (1) mapping the input text into a set of preliminary label likelihoods by natural language inference using a pre-trained language model, (2) calculating a signed label dependency graph from label descriptions, and (3) updating the preliminary label likelihoods with message passing along the label dependency graph.
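A compact sketch of these three steps follows, using an off-the-shelf NLI pipeline and sentence embeddings; the specific models, the similarity-based signed graph, and the simple message-passing update are illustrative stand-ins for the paper's exact formulations.

```python
import numpy as np
from transformers import pipeline
from sentence_transformers import SentenceTransformer

nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def mltc_pseudo_labels(text, label_descriptions, steps=3, alpha=0.5):
    """label_descriptions: {label_name: description}; returns {label: likelihood}."""
    labels = list(label_descriptions)
    # (1) preliminary label likelihoods via natural language inference
    out = nli(text, labels, multi_label=True)
    p = np.array([out["scores"][out["labels"].index(l)] for l in labels])
    # (2) signed label dependency graph from label-description similarity,
    #     centered so dissimilar labels get negative (inhibitory) edges
    emb = encoder.encode([label_descriptions[l] for l in labels])
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T
    graph = sim - sim.mean()
    np.fill_diagonal(graph, 0.0)
    # (3) update likelihoods by message passing along the signed graph
    for _ in range(steps):
        p = np.clip(p + alpha * graph @ (p - 0.5), 0.0, 1.0)
    return dict(zip(labels, p))
```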
arXiv Detail & Related papers (2023-09-24T04:12:52Z)
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
The authors introduce Self-Supervised Learning (SSL) into the model learning process and design a novel self-supervised Relation of Relation (R2) classification task.
They propose a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets.
They also exploit external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- Improving Self-training for Cross-lingual Named Entity Recognition with Contrastive and Prototype Learning [80.08139343603956]
In cross-lingual named entity recognition, self-training is commonly used to bridge the linguistic gap.
In this work, we aim to improve self-training for cross-lingual NER by combining representation learning and pseudo label refinement.
Our proposed method, ContProto, mainly comprises two components: (1) contrastive self-training and (2) prototype-based pseudo-labeling.
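A minimal sketch of the prototype-based pseudo-labeling component, in the spirit of this summary, could look as follows; the momentum value and cosine-similarity assignment are assumptions, not necessarily ContProto's exact design.

```python
import torch
import torch.nn.functional as F

def update_prototypes(prototypes, feats, labels, momentum=0.99):
    """Exponential-moving-average class prototypes from current batch features.
    prototypes: (C, d) tensor; feats: (B, d) tensor; labels: (B,) tensor."""
    for c in labels.unique():
        mean = F.normalize(feats[labels == c].mean(0), dim=0)
        prototypes[c] = F.normalize(momentum * prototypes[c] + (1 - momentum) * mean, dim=0)
    return prototypes

def refine_pseudo_labels(prototypes, feats):
    """Reassign each sample to its nearest prototype by cosine similarity."""
    sims = F.normalize(feats, dim=1) @ F.normalize(prototypes, dim=1).T
    return sims.argmax(dim=1)
```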
arXiv Detail & Related papers (2023-05-23T02:52:16Z)
- FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification [14.918600168973564]
This paper proposes FastClass, an efficient weakly-supervised classification approach.
It uses dense text representations to retrieve class-relevant documents from an external unlabeled corpus.
Experiments show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.
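The retrieval step could be sketched as below, assuming sentence-transformers embeddings and cosine similarity; the model name and per-class top-k are illustrative choices rather than FastClass's settings.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve_training_set(class_descriptions, corpus, k=100):
    """Pull the k most class-relevant documents per class from an unlabeled corpus."""
    doc_emb = encoder.encode(corpus, normalize_embeddings=True)
    cls_emb = encoder.encode(class_descriptions, normalize_embeddings=True)
    sims = cls_emb @ doc_emb.T  # cosine similarity, one row per class
    texts, labels = [], []
    for c, row in enumerate(sims):
        for i in np.argsort(-row)[:k]:
            texts.append(corpus[i])
            labels.append(c)
    return texts, labels  # feed to any fast classifier, e.g. TF-IDF + linear model
```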
arXiv Detail & Related papers (2022-12-11T13:43:22Z)
- LIME: Weakly-Supervised Text Classification Without Seeds [1.2691047660244335]
In weakly-supervised text classification, only label names act as sources of supervision.
We present LIME, a framework for weakly-supervised text classification.
We find that combining weakly-supervised classification and textual entailment mitigates shortcomings of both.
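Classification via textual entailment can be illustrated with the generic Hugging Face zero-shot pipeline; the NLI checkpoint and hypothesis template below are assumptions for the sketch, not necessarily LIME's.

```python
from transformers import pipeline

clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def entailment_classify(text, label_names):
    """Score 'This text is about {label}.' as an entailment hypothesis."""
    out = clf(text, label_names, hypothesis_template="This text is about {}.")
    return out["labels"][0], out["scores"][0]  # best label and its confidence
```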
arXiv Detail & Related papers (2022-10-13T04:28:28Z)
- Transductive CLIP with Class-Conditional Contrastive Learning [68.51078382124331]
We propose Transductive CLIP, a novel framework for learning a classification network with noisy labels from scratch.
A class-conditional contrastive learning mechanism is proposed to mitigate the reliance on pseudo labels.
An ensemble-labels strategy is adopted for pseudo label updating to stabilize the training of deep neural networks with noisy labels.
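The ensemble-label update can be illustrated by averaging each sample's predicted distribution across epochs, so a single noisy epoch cannot flip its pseudo label; the mixing weight below is an illustrative assumption.

```python
import torch

class EnsembleLabels:
    def __init__(self, n_samples, n_classes, momentum=0.7):
        # start from a uniform distribution for every sample
        self.probs = torch.full((n_samples, n_classes), 1.0 / n_classes)
        self.momentum = momentum

    def update(self, indices, batch_probs):
        """Blend this epoch's softmax outputs into the running ensemble."""
        self.probs[indices] = (self.momentum * self.probs[indices]
                               + (1 - self.momentum) * batch_probs)

    def pseudo_labels(self, indices):
        return self.probs[indices].argmax(dim=1)
```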
arXiv Detail & Related papers (2022-06-13T14:04:57Z)
- Label Semantic Aware Pre-training for Few-shot Text Classification [53.80908620663974]
We propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems.
LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains.
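The secondary pre-training step amounts to casting labeled sentences into T5's text-to-text format; a minimal sketch follows, with the prompt wording and example pair as illustrative assumptions rather than LSAP's exact template.

```python
def to_t5_example(sentence: str, label_name: str):
    """Map (utterance, label) to a (source, target) text pair for T5."""
    return f"classify: {sentence}", label_name

# hypothetical example pair; fine-tune T5 on such pairs with the usual
# seq2seq objective before adapting to the downstream few-shot task
pairs = [to_t5_example("book me a table for two", "restaurant_reservation")]
```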
arXiv Detail & Related papers (2022-04-14T17:33:34Z)