CGMatch: A Different Perspective of Semi-supervised Learning
- URL: http://arxiv.org/abs/2503.02231v1
- Date: Tue, 04 Mar 2025 03:14:15 GMT
- Title: CGMatch: A Different Perspective of Semi-supervised Learning
- Authors: Bo Cheng, Jueqing Lu, Yuan Tian, Haifeng Zhao, Yi Chang, Lan Du
- Abstract summary: Semi-supervised learning (SSL) has garnered significant attention due to its ability to leverage limited labeled data. We argue that existing methods rely solely on the model's confidence, which makes it hard for them to accurately assess the model's state. We propose a novel SSL model called CGMatch, which, for the first time, incorporates a new metric known as Count-Gap.
- Score: 20.03126368452921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised learning (SSL) has garnered significant attention due to its ability to leverage limited labeled data and a large amount of unlabeled data to improve model generalization performance. Recent approaches achieve impressive successes by combining ideas from both consistency regularization and pseudo-labeling. However, these methods tend to underperform in more realistic situations where labeled data are relatively scarce. We argue that this issue arises because existing methods rely solely on the model's confidence, making it challenging for them to accurately assess the model's state and identify unlabeled examples contributing to the training phase when supervision information is limited, especially during the early stages of model training. In this paper, we propose a novel SSL model called CGMatch, which, for the first time, incorporates a new metric known as Count-Gap (CG). We demonstrate that CG is effective in discovering unlabeled examples beneficial for model training. Along with confidence, a commonly used metric in SSL, we propose a fine-grained dynamic selection (FDS) strategy. This strategy dynamically divides the unlabeled dataset into three subsets with different characteristics: easy-to-learn set, ambiguous set, and hard-to-learn set. By selectively filtering these subsets and applying a corresponding regularization to the selected ones, we mitigate the negative impact of incorrect pseudo-labels on model optimization and generalization. Extensive experimental results on several common SSL benchmarks indicate the effectiveness of CGMatch, especially when the labeled data are particularly limited. Source code is available at https://github.com/BoCheng-96/CGMatch.
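The three-way split described in the abstract can be illustrated with a minimal sketch. The abstract does not define Count-Gap or the selection rule, so everything here is an assumption made for illustration: `count_gap` is taken to be the gap between how often the most frequently predicted class and the runner-up were predicted for an example over past epochs, and the fixed thresholds `conf_thresh` and `cg_thresh` are hypothetical placeholders (the abstract describes the selection as dynamic). This is not the authors' implementation; see their repository for that.

```python
from collections import Counter


def count_gap(pred_history):
    """Hypothetical Count-Gap: count of the most frequently predicted class
    minus the count of the second most frequent one (assumed definition)."""
    counts = Counter(pred_history).most_common(2)
    if not counts:
        return 0  # no predictions recorded yet
    top = counts[0][1]
    second = counts[1][1] if len(counts) > 1 else 0
    return top - second


def split_unlabeled(confidences, histories, conf_thresh=0.95, cg_thresh=2):
    """Split example indices into (easy, ambiguous, hard) lists.

    Assumed rule, for illustration only:
      easy      -> high confidence and high Count-Gap
      ambiguous -> high Count-Gap but low confidence
      hard      -> everything else
    """
    easy, ambiguous, hard = [], [], []
    for i, (conf, hist) in enumerate(zip(confidences, histories)):
        cg = count_gap(hist)
        if conf >= conf_thresh and cg >= cg_thresh:
            easy.append(i)
        elif cg >= cg_thresh:
            ambiguous.append(i)
        else:
            hard.append(i)
    return easy, ambiguous, hard


# Toy usage: three unlabeled examples with per-epoch predicted classes.
confidences = [0.99, 0.60, 0.97]
histories = [[1, 1, 1, 1], [0, 0, 0, 2], [0, 1, 2, 1]]
easy, ambiguous, hard = split_unlabeled(confidences, histories)
```

In this toy run the third example has high confidence but an unstable prediction history, so the assumed rule places it in the hard-to-learn set. That is the kind of case a confidence-only filter would accept.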
Related papers
- Towards Micro-Action Recognition with Limited Annotations: An Asynchronous Pseudo Labeling and Training Approach [35.32024173141412]
We introduce the setting of Semi-Supervised MAR (SSMAR), where only a portion of the samples is labeled.
Traditional Semi-Supervised Learning (SSL) methods tend to overfit on inaccurate pseudo-labels, leading to error accumulation and degraded performance.
We propose Asynchronous Pseudo Labeling and Training (APLT), which explicitly separates the pseudo-labeling process from model training.
arXiv Detail & Related papers (2025-04-10T14:22:15Z)
- SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning [106.46648817126984]
In this paper, we study the challenging and realistic open-set SSL setting.
The goal is to both correctly classify inliers and to detect outliers.
We find that inlier classification performance can be largely improved by incorporating high-confidence pseudo-labeled data.
arXiv Detail & Related papers (2023-11-17T15:14:40Z)
- NorMatch: Matching Normalizing Flows with Discriminative Classifiers for Semi-Supervised Learning [8.749830466953584]
Semi-Supervised Learning (SSL) aims to learn a model using a tiny labeled set and massive amounts of unlabeled data.
In this work we introduce a new framework for SSL named NorMatch.
We demonstrate, through numerical and visual results, that NorMatch achieves state-of-the-art performance on several datasets.
arXiv Detail & Related papers (2022-11-17T15:39:18Z)
- MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL).
We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately.
Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
arXiv Detail & Related papers (2022-09-26T12:04:49Z)
- Few-shot Learning via Dependency Maximization and Instance Discriminant Analysis [21.8311401851523]
We study the few-shot learning problem, where a model learns to recognize new objects with extremely few labeled data per category.
We propose a simple approach to exploit unlabeled data accompanying the few-shot task for improving few-shot performance.
arXiv Detail & Related papers (2021-09-07T02:19:01Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, is adaptive in how it selects unlabeled data.
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers [71.08167292329028]
We propose a novel Open-set Semi-Supervised Learning (OSSL) approach called OpenMatch.
OpenMatch unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers.
It achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
arXiv Detail & Related papers (2021-05-28T23:57:15Z)
- Adaptive Consistency Regularization for Semi-Supervised Transfer Learning [31.66745229673066]
We consider semi-supervised learning and transfer learning jointly, leading to a more practical and competitive paradigm.
To better exploit the value of both pre-trained weights and unlabeled target examples, we introduce adaptive consistency regularization.
Our proposed adaptive consistency regularization outperforms state-of-the-art semi-supervised learning techniques such as Pseudo Label, Mean Teacher, and MixMatch.
arXiv Detail & Related papers (2021-03-03T05:46:39Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- The Devil is in Classification: A Simple Framework for Long-tail Object Detection and Instance Segmentation [93.17367076148348]
We investigate the performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset.
We unveil that a major cause is the inaccurate classification of object proposals.
We propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach.
arXiv Detail & Related papers (2020-07-23T12:49:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.