MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification
- URL: http://arxiv.org/abs/2506.07801v2
- Date: Mon, 16 Jun 2025 20:28:26 GMT
- Title: MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification
- Authors: Iustin Sirbu, Robert-Adrian Popovici, Cornelia Caragea, Stefan Trausan-Matu, Traian Rebedea
- Abstract summary: We introduce MultiMatch, a novel semi-supervised learning (SSL) algorithm combining the paradigms of co-training and consistency regularization with pseudo-labeling. At its core, MultiMatch features a three-fold pseudo-label weighting module designed for three key purposes. This novel module enhances and unifies three existing techniques -- heads agreement from Multihead Co-training, self-adaptive thresholds from FreeMatch, and Average Pseudo-Margins from MarginMatch.
- Score: 42.62120305327092
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce MultiMatch, a novel semi-supervised learning (SSL) algorithm combining the paradigms of co-training and consistency regularization with pseudo-labeling. At its core, MultiMatch features a three-fold pseudo-label weighting module designed for three key purposes: selecting pseudo-labels based on head agreement, filtering them based on model confidence, and weighting them according to the perceived classification difficulty. This novel module enhances and unifies three existing techniques -- heads agreement from Multihead Co-training, self-adaptive thresholds from FreeMatch, and Average Pseudo-Margins from MarginMatch -- resulting in a holistic approach that improves robustness and performance in SSL settings. Experimental results on benchmark datasets highlight the superior performance of MultiMatch, achieving state-of-the-art results on 9 out of 10 setups from 5 natural language processing datasets and ranking first according to the Friedman test among 19 methods. Furthermore, MultiMatch demonstrates exceptional robustness in highly imbalanced settings, outperforming the second-best approach by 3.26% -- and data imbalance is a key factor in many text classification tasks.
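To make the three-fold weighting concrete, below is a minimal PyTorch-style sketch of how the three signals could compose, assuming three classifier heads on a shared encoder. The FreeMatch-style threshold is simplified here to a per-class EMA of confidence, and the Average Pseudo-Margin to a per-batch top-1/top-2 gap; all names, including `pseudo_label_weights`, are illustrative rather than the authors' code.

```python
# Illustrative sketch only: agreement filtering, self-adaptive confidence
# thresholding, and margin-based weighting composed into one weighting step.
import torch
import torch.nn.functional as F

def pseudo_label_weights(head_logits, class_thresholds, momentum=0.999):
    """head_logits: list of [B, C] logits, one tensor per classifier head.
    class_thresholds: [C] running per-class confidence thresholds."""
    probs = torch.stack([F.softmax(l, dim=-1) for l in head_logits])  # [H, B, C]
    mean_probs = probs.mean(dim=0)                                    # [B, C]
    conf, pseudo = mean_probs.max(dim=-1)                             # [B], [B]

    # 1) Heads agreement (Multihead Co-training): keep examples on which
    #    every head predicts the same class.
    head_preds = probs.argmax(dim=-1)                                 # [H, B]
    agree = (head_preds == head_preds[0]).all(dim=0)

    # 2) Self-adaptive confidence threshold (FreeMatch-style, simplified
    #    here to an EMA of per-class mean confidence).
    for c in range(mean_probs.size(-1)):
        mask_c = pseudo == c
        if mask_c.any():
            class_thresholds[c] = (momentum * class_thresholds[c]
                                   + (1 - momentum) * conf[mask_c].mean())
    confident = conf >= class_thresholds[pseudo]

    # 3) Margin-based weight (MarginMatch-style; a per-batch top-1/top-2
    #    gap stands in for the paper's Average Pseudo-Margin).
    top2 = mean_probs.topk(2, dim=-1).values
    margin = top2[:, 0] - top2[:, 1]

    weights = (agree & confident).float() * margin
    return pseudo, weights, class_thresholds
```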
Related papers
- Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework [0.0]
This paper introduces the Double-stage Feature-level Clustering and Pseudo-labeling-based Mixture of Experts (DFCP-MoE) framework. It consists of input feature extraction, feature-level clustering, and a computationally efficient pseudo-labeling strategy. We propose a conditional end-to-end joint training method that improves expert specialization by training the MoE model on well-labeled, clustered inputs.
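A minimal sketch of the general clustering-plus-pseudo-labeling recipe this summary describes, assuming k-means over the extracted features and a majority vote among the labeled points inside each cluster; both choices, and the helper `cluster_pseudo_labels`, are illustrative assumptions rather than the paper's exact design.

```python
# Illustrative sketch: cluster features, then pseudo-label each cluster
# from the labeled examples that fall inside it (majority vote).
import numpy as np
from sklearn.cluster import KMeans

def cluster_pseudo_labels(features, labeled_idx, labels, n_clusters=10):
    """features: [N, D] array; labeled_idx: indices of labeled points;
    labels: [N] array with labels valid at labeled_idx."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    cluster_label = np.full(n_clusters, -1)       # -1: no labeled member
    for c in range(n_clusters):
        members = [i for i in labeled_idx if km.labels_[i] == c]
        if members:
            vals, counts = np.unique(labels[members], return_counts=True)
            cluster_label[c] = vals[counts.argmax()]
    return cluster_label[km.labels_]              # pseudo-label per point
```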
arXiv Detail & Related papers (2025-03-12T16:13:50Z)
- RankMatch: A Novel Approach to Semi-Supervised Label Distribution Learning Leveraging Inter-label Correlations [52.549807652527306]
This paper introduces RankMatch, an innovative approach for Semi-Supervised Label Distribution Learning (SSLDL).
RankMatch effectively utilizes a small number of labeled examples in conjunction with a larger quantity of unlabeled data.
We establish a theoretical generalization bound for RankMatch, and through extensive experiments, demonstrate its superiority in performance against existing SSLDL methods.
arXiv Detail & Related papers (2023-12-11T12:47:29Z)
- JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification [65.268245109828]
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data.
Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning methods.
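As a rough illustration of the co-training flavor of such approaches, here is a minimal sketch of cross-labeling between two networks, where each network learns from the other's confident pseudo-labels so that their errors do not reinforce each other; the fixed threshold and the function `cross_labeling_loss` are illustrative simplifications, not the exact JointMatch formulation.

```python
# Illustrative sketch: each network is supervised by the other's
# confident pseudo-labels, which helps decorrelate their mistakes.
import torch
import torch.nn.functional as F

def cross_labeling_loss(logits_a, logits_b, threshold=0.95):
    def pseudo(logits):
        probs = F.softmax(logits.detach(), dim=-1)
        conf, labels = probs.max(dim=-1)
        return labels, (conf >= threshold).float()

    labels_b, mask_b = pseudo(logits_b)   # network B teaches A
    labels_a, mask_a = pseudo(logits_a)   # network A teaches B
    loss_a = (F.cross_entropy(logits_a, labels_b, reduction="none") * mask_b).mean()
    loss_b = (F.cross_entropy(logits_b, labels_a, reduction="none") * mask_a).mean()
    return loss_a + loss_b
```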
arXiv Detail & Related papers (2023-10-23T05:43:35Z)
- SemiReward: A General Reward Model for Semi-supervised Learning [58.47299780978101]
Semi-supervised learning (SSL) has witnessed great progress with various improvements in the self-training framework with pseudo labeling.
The main challenge is how to distinguish high-quality pseudo labels in the presence of confirmation bias.
We propose a Semi-supervised Reward framework (SemiReward) that predicts reward scores to evaluate pseudo labels and select the high-quality ones.
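A minimal sketch of the reward-filtering idea, assuming a small MLP that scores (feature, pseudo-label) pairs; the embedding-based pairing, architecture, and threshold are illustrative assumptions, not the paper's exact design.

```python
# Illustrative sketch: a reward model scores (feature, pseudo-label)
# pairs, and only high-scoring pseudo-labels are kept for training.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, feat_dim, num_classes, hidden=128):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, feat_dim)
        self.scorer = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())   # reward in [0, 1]

    def forward(self, feats, pseudo_labels):
        pair = torch.cat([feats, self.label_emb(pseudo_labels)], dim=-1)
        return self.scorer(pair).squeeze(-1)

def filter_by_reward(reward_model, feats, pseudo_labels, threshold=0.5):
    with torch.no_grad():
        rewards = reward_model(feats, pseudo_labels)
    keep = rewards >= threshold
    return feats[keep], pseudo_labels[keep]
```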
arXiv Detail & Related papers (2023-10-04T17:56:41Z)
- MarginMatch: Improving Semi-Supervised Learning with Pseudo-Margins [73.17295479535161]
MarginMatch is a new SSL approach combining consistency regularization and pseudo-labeling.
We analyze the behavior of the model on the pseudo-labeled examples as training progresses to ensure that low-quality predictions are masked out.
We obtain an improvement in error rate over the state-of-the-art of 3.25% on CIFAR-100 with only 25 labels per class and of 3.78% on STL-10 using as few as 4 labels per class.
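A minimal sketch of margin tracking in this spirit: each unlabeled example's margin (assigned-class logit minus the largest other logit) is accumulated across training, and examples whose running margin stays low are masked out. The EMA below stands in for the paper's Average Pseudo-Margin over iterations, and the masking threshold is illustrative.

```python
# Illustrative sketch: running per-example pseudo-margins used as a mask.
import torch

class AveragePseudoMargin:
    def __init__(self, num_unlabeled, momentum=0.997):
        self.apm = torch.zeros(num_unlabeled)
        self.momentum = momentum

    def update(self, indices, logits, pseudo_labels):
        logits = logits.detach()
        assigned = logits.gather(1, pseudo_labels.unsqueeze(1)).squeeze(1)
        others = logits.scatter(1, pseudo_labels.unsqueeze(1), float("-inf"))
        margin = assigned - others.max(dim=1).values
        self.apm[indices] = (self.momentum * self.apm[indices]
                             + (1 - self.momentum) * margin)

    def mask(self, indices, threshold=0.0):
        # Keep only examples the model has labeled consistently over time.
        return self.apm[indices] > threshold
```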
arXiv Detail & Related papers (2023-08-17T15:19:04Z)
- Unifying Token and Span Level Supervisions for Few-Shot Sequence Labeling [18.24907067631541]
Few-shot sequence labeling aims to identify novel classes based on only a few labeled samples.
We propose a Consistent Dual Adaptive Prototypical (CDAP) network for few-shot sequence labeling.
Our model achieves new state-of-the-art results on three benchmark datasets.
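A minimal sketch of the prototypical building block such networks rely on: a class prototype is the mean embedding of its few labeled examples, and a query token or span is assigned to its nearest prototype. Euclidean distance is an illustrative choice; CDAP's dual token/span adaptation is not modeled here.

```python
# Illustrative sketch: nearest-prototype classification from few examples.
import torch

def prototypes(support_emb, support_labels, num_classes):
    # One prototype per class: the mean of that class's support embeddings.
    return torch.stack([support_emb[support_labels == c].mean(dim=0)
                        for c in range(num_classes)])

def classify(query_emb, protos):
    dists = torch.cdist(query_emb, protos)   # [Q, C] pairwise distances
    return dists.argmin(dim=-1)              # nearest prototype wins
```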
arXiv Detail & Related papers (2023-07-16T04:50:52Z)
- Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data [21.6350640726058]
Semi-supervised learning (SSL) has attracted enormous attention due to its vast potential of mitigating the dependence on large labeled datasets.
We propose two novel techniques: Entropy Meaning Loss (EML) and Adaptive Negative Learning (ANL).
We integrate these techniques with FixMatch, and develop a simple yet powerful framework called FullMatch.
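As a rough intuition for negative learning, below is a minimal sketch that penalizes the classes the model ranks lowest, which lets even low-confidence unlabeled examples contribute a training signal; the fixed number of negative classes is a simplifying assumption, whereas ANL adapts the set per example, so treat this as an approximation of the idea rather than the paper's method.

```python
# Illustrative sketch: push the probability of the lowest-ranked
# ("surely wrong") classes toward zero for every unlabeled example.
import torch
import torch.nn.functional as F

def negative_learning_loss(logits, num_negatives=2):
    probs = F.softmax(logits, dim=-1)
    # Indices of the lowest-ranked classes for each example.
    neg_idx = probs.topk(num_negatives, dim=-1, largest=False).indices
    neg_probs = probs.gather(1, neg_idx)
    return -torch.log(1.0 - neg_probs + 1e-7).mean()
```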
arXiv Detail & Related papers (2023-03-20T12:44:11Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
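A minimal sketch of soft sample weighting in this spirit: rather than a hard confidence threshold, each pseudo-label receives a weight from a truncated Gaussian over its confidence (the shape SoftMatch proposes), with batch statistics used here as a simplifying stand-in for the running estimates the method maintains.

```python
# Illustrative sketch: full weight above the mean confidence,
# Gaussian decay below it -- no pseudo-label is discarded outright.
import torch
import torch.nn.functional as F

def soft_weights(logits):
    conf = F.softmax(logits, dim=-1).max(dim=-1).values
    mu, var = conf.mean(), conf.var().clamp(min=1e-6)
    gap = (mu - conf).clamp(min=0.0)
    return torch.exp(-gap.pow(2) / (2 * var))
```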
arXiv Detail & Related papers (2023-01-26T03:53:25Z)