JointMatch: A Unified Approach for Diverse and Collaborative
Pseudo-Labeling to Semi-Supervised Text Classification
- URL: http://arxiv.org/abs/2310.14583v1
- Date: Mon, 23 Oct 2023 05:43:35 GMT
- Authors: Henry Peng Zou, Cornelia Caragea
- Abstract summary: Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data.
Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised text classification (SSTC) has gained increasing attention
due to its ability to leverage unlabeled data. However, existing approaches
based on pseudo-labeling suffer from the issues of pseudo-label bias and error
accumulation. In this paper, we propose JointMatch, a holistic approach for
SSTC that addresses these challenges by unifying ideas from recent
semi-supervised learning and the task of learning with noise. JointMatch
adaptively adjusts classwise thresholds based on the learning status of
different classes to mitigate model bias towards current easy classes.
Additionally, JointMatch alleviates error accumulation by utilizing two
differently initialized networks to teach each other in a cross-labeling
manner. To maintain divergence between the two networks for mutual learning, we
introduce a strategy that weighs more disagreement data while also allowing the
utilization of high-quality agreement data for training. Experimental results
on benchmark datasets demonstrate the superior performance of JointMatch,
achieving a significant 5.13% improvement on average. Notably, JointMatch
delivers impressive results even in the extremely-scarce-label setting,
obtaining 86% accuracy on AG News with only 5 labels per class. We make our
code available at https://github.com/HenryPengZou/JointMatch.
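The two mechanisms described in the abstract (classwise adaptive thresholds and cross-labeling with heavier weight on disagreement data) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the base threshold of 0.95, the weights 1.0 and 0.5, and the use of per-class pseudo-label counts as a stand-in for "learning status" are all assumptions chosen for illustration; see the linked repository for the actual method.

```python
def classwise_thresholds(pseudo_label_counts, base_threshold=0.95):
    """Scale a base confidence threshold per class.

    Assumption: the learning status of a class is approximated by how many
    pseudo-labels it has received so far, normalized by the largest count.
    Classes that are currently hard (few pseudo-labels) get a lower threshold.
    """
    top = max(max(pseudo_label_counts), 1)
    return [base_threshold * count / top for count in pseudo_label_counts]


def cross_label(probs_teacher, probs_student, thresholds,
                disagree_weight=1.0, agree_weight=0.5):
    """Produce pseudo-labels from one network to train the other.

    Returns (labels, weights). A sample gets weight 0 if the teacher's
    confidence is below the threshold of its predicted class. Otherwise it
    gets disagree_weight when the two networks predict different classes
    and the smaller agree_weight when they agree (hypothetical values).
    """
    labels, weights = [], []
    for p_teacher, p_student in zip(probs_teacher, probs_student):
        y_teacher = max(range(len(p_teacher)), key=p_teacher.__getitem__)
        y_student = max(range(len(p_student)), key=p_student.__getitem__)
        if p_teacher[y_teacher] < thresholds[y_teacher]:
            weight = 0.0
        elif y_teacher != y_student:
            weight = disagree_weight
        else:
            weight = agree_weight
        labels.append(y_teacher)
        weights.append(weight)
    return labels, weights


# Toy example: three classes, three unlabeled samples.
thresholds = classwise_thresholds([10, 5, 0])
probs_a = [[0.90, 0.05, 0.05], [0.20, 0.70, 0.10], [0.96, 0.02, 0.02]]
probs_b = [[0.10, 0.80, 0.10], [0.10, 0.80, 0.10], [0.90, 0.05, 0.05]]
labels_for_b, weights_for_b = cross_label(probs_a, probs_b, thresholds)
```

In a full training loop the roles would be swapped as well, so that network B also labels data for network A, and the weights would multiply the per-sample unsupervised loss.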
Related papers
- RankMatch: A Novel Approach to Semi-Supervised Label Distribution Learning Leveraging Inter-label Correlations [52.549807652527306]
This paper introduces RankMatch, an innovative approach for Semi-Supervised Label Distribution Learning (SSLDL).
RankMatch effectively utilizes a small number of labeled examples in conjunction with a larger quantity of unlabeled data.
We establish a theoretical generalization bound for RankMatch, and through extensive experiments, demonstrate its superiority in performance against existing SSLDL methods.
arXiv Detail & Related papers (2023-12-11T12:47:29Z)
- DualMatch: Robust Semi-Supervised Learning with Dual-Level Interaction [10.775623936099173]
Previous semi-supervised learning methods typically match model predictions of different data-augmented views in a single-level interaction manner.
We propose a novel SSL method called DualMatch, in which the class prediction jointly invokes feature embedding in a dual-level interaction manner.
In the standard SSL setting, the proposal achieves a 9% error reduction compared with SOTA methods; even in the more challenging class-imbalanced setting, it still achieves a 6% error reduction.
arXiv Detail & Related papers (2023-10-25T08:34:05Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Dense FixMatch: a simple semi-supervised learning method for pixel-wise prediction tasks [68.36996813591425]
We propose Dense FixMatch, a simple method for online semi-supervised learning of dense and structured prediction tasks.
We enable the application of FixMatch in semi-supervised learning problems beyond image classification by adding a matching operation on the pseudo-labels.
Dense FixMatch significantly improves results compared to supervised learning using only labeled data, approaching its performance with 1/4 of the labeled samples.
arXiv Detail & Related papers (2022-10-18T15:02:51Z)
- SimPLE: Similar Pseudo Label Exploitation for Semi-Supervised Classification [24.386165255835063]
A common situation in classification tasks is that a large amount of data is available for training, but only a small portion has class labels.
The goal of semi-supervised training, in this context, is to improve classification accuracy by leveraging information from a large amount of unlabeled data.
We propose a novel unsupervised objective that focuses on the less-studied relationship between high-confidence unlabeled data points that are similar to each other.
Our proposed SimPLE algorithm shows significant performance gains over previous algorithms on CIFAR-100 and Mini-ImageNet, and is on par with the state-of-the-art methods.
arXiv Detail & Related papers (2021-03-30T23:48:06Z)
- CoMatch: Semi-supervised Learning with Contrastive Graph Regularization [86.84486065798735]
CoMatch is a new semi-supervised learning method that unifies dominant approaches.
It achieves state-of-the-art performance on multiple datasets.
arXiv Detail & Related papers (2020-11-23T02:54:57Z)
- Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint Learning [78.88007892742438]
We study two essential scenarios of Federated Semi-Supervised Learning (FSSL) based on the location of the labeled data.
We propose a novel method to tackle these problems, which we refer to as Federated Matching (FedMatch).
arXiv Detail & Related papers (2020-06-22T09:43:41Z)