On Non-Random Missing Labels in Semi-Supervised Learning
- URL: http://arxiv.org/abs/2206.14923v1
- Date: Wed, 29 Jun 2022 22:01:29 GMT
- Title: On Non-Random Missing Labels in Semi-Supervised Learning
- Authors: Xinting Hu, Yulei Niu, Chunyan Miao, Xian-Sheng Hua, Hanwang Zhang
- Abstract summary: Semi-Supervised Learning (SSL) is fundamentally a missing label problem.
We explicitly incorporate "class" into SSL.
Our method not only significantly outperforms existing baselines but also surpasses other label bias removal SSL methods.
- Score: 114.62655062520425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-Supervised Learning (SSL) is fundamentally a missing label problem, in
which the label Missing Not At Random (MNAR) setting is more realistic and
challenging than the widely adopted yet naive Missing Completely At Random
(MCAR) assumption, under which labeled and unlabeled data share the same class
distribution. Unlike existing SSL solutions, which overlook the role of
"class" in causing the non-randomness (e.g., users are more likely to label
popular classes), we explicitly incorporate "class" into SSL. Our method is
three-fold: 1) We propose Class-Aware Propensity (CAP), which exploits the
unlabeled data to train an improved classifier from the biased labeled data.
2) To encourage training on rare classes, whose classifiers tend to be
high-precision but low-recall and therefore discard too many pseudo-labeled
samples, we propose Class-Aware Imputation (CAI), which dynamically lowers (or
raises) the pseudo-label assignment threshold for rare (or frequent) classes.
3) Overall, we integrate CAP and CAI into a Class-Aware Doubly Robust (CADR)
estimator for training an unbiased SSL model. Under various MNAR settings and
ablations, our method not only significantly outperforms existing baselines
but also surpasses other label-bias-removal SSL methods. Please check our code
at: https://github.com/JoyHuYY1412/CADR-FixMatch.
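For orientation, the "doubly robust" in CADR refers to the standard estimator family from the missing-data literature, which combines an imputation (outcome) model with inverse-propensity weighting and remains unbiased if either component is correct. A generic textbook form, not necessarily the exact CADR objective, is:

```latex
% Generic doubly robust risk estimate over n samples:
% o_i = 1 if sample i is labeled, \hat{p}_i its estimated labeling
% propensity, \ell_i the loss on the observed label, \hat{\ell}_i the
% imputed (pseudo-label) loss.
\hat{R}_{\mathrm{DR}} = \frac{1}{n} \sum_{i=1}^{n}
  \left[ \hat{\ell}_i + \frac{o_i}{\hat{p}_i} \left( \ell_i - \hat{\ell}_i \right) \right]
```

Below is a minimal NumPy sketch of the two class-aware ingredients the abstract names: inverse-propensity class weights (the CAP flavor) and per-class dynamic confidence thresholds (the CAI flavor). The function names, the linear interpolation rule, and the hyperparameters base_tau / low_tau are illustrative assumptions rather than the authors' implementation; the actual code lives in the linked CADR-FixMatch repository.

```python
import numpy as np

def class_propensity_weights(labeled_counts, pseudo_counts, eps=1e-8):
    """Inverse-propensity class weights: classes over-represented among the
    labeled data (relative to their estimated overall frequency) are
    down-weighted; under-labeled classes are up-weighted."""
    labeled_freq = labeled_counts / max(labeled_counts.sum(), eps)
    overall_freq = pseudo_counts / max(pseudo_counts.sum(), eps)
    propensity = labeled_freq / (overall_freq + eps)  # > 1 means over-labeled
    return 1.0 / (propensity + eps)

def class_aware_thresholds(pseudo_counts, base_tau=0.95, low_tau=0.60):
    """Per-class confidence thresholds: frequent classes keep the strict
    base threshold; rare classes get a lower one so that more of their
    pseudo-labels survive the confidence cut."""
    rel_freq = pseudo_counts / max(pseudo_counts.max(), 1)
    return low_tau + (base_tau - low_tau) * rel_freq

def select_pseudo_labels(probs, thresholds):
    """Keep an unlabeled sample iff its max softmax score clears the
    threshold of its argmax class; return (kept indices, pseudo-labels)."""
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    keep = conf >= thresholds[preds]
    return np.where(keep)[0], preds[keep]

# Toy usage: class 2 is rarely labeled, so its weight rises and its
# threshold drops; sample 1 survives the lowered cut although it would
# fail the 0.95 base threshold.
labeled_counts = np.array([500.0, 300.0, 10.0])
pseudo_counts = np.array([400.0, 350.0, 250.0])
w = class_propensity_weights(labeled_counts, pseudo_counts)
tau = class_aware_thresholds(pseudo_counts)
probs = np.array([[0.97, 0.02, 0.01],
                  [0.05, 0.10, 0.85]])
idx, labels = select_pseudo_labels(probs, tau)
```

In a FixMatch-style training loop one would recompute pseudo_counts each epoch, so the thresholds drift downward for classes the model rarely predicts while the propensity weights boost classes that annotators under-label.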
Related papers
- Boosting Consistency in Dual Training for Long-Tailed Semi-Supervised Learning [49.07038093130949]
Long-tailed semi-supervised learning (LTSSL) algorithms assume that the class distributions of labeled and unlabeled data are almost identical.
We propose BOAT, a simple new method that can effectively utilize unlabeled data from unknown class distributions.
We show that BOAT achieves state-of-the-art performance on a variety of standard LTSSL benchmarks.
arXiv Detail & Related papers (2024-06-19T03:35:26Z)
- Exploring Vacant Classes in Label-Skewed Federated Learning [113.65301899666645]
Label skews, characterized by disparities in local label distribution across clients, pose a significant challenge in federated learning.
This paper introduces FedVLS, a novel approach to label-skewed federated learning that integrates vacant-class distillation and logit suppression simultaneously.
arXiv Detail & Related papers (2024-01-04T16:06:31Z)
- Three Heads Are Better Than One: Complementary Experts for Long-Tailed Semi-supervised Learning [74.44500692632778]
We propose a novel method named ComPlementary Experts (CPE) to model various class distributions.
CPE achieves state-of-the-art performance on the CIFAR-10-LT, CIFAR-100-LT, and STL-10-LT benchmarks.
arXiv Detail & Related papers (2023-12-25T11:54:07Z)
- Semi-Supervised Learning via Weight-aware Distillation under Class Distribution Mismatch [15.57119122765309]
We propose a robust SSL framework called Weight-Aware Distillation (WAD) to alleviate the SSL error.
WAD captures adaptive weights and high-quality pseudo labels to target instances by exploring point mutual information (PMI) in representation space.
We prove that WAD has a tight upper bound of population risk under class distribution mismatch.
arXiv Detail & Related papers (2023-08-23T02:37:34Z)
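As a side note, the "point mutual information" that WAD explores is presumably the standard pointwise mutual information between a representation z and a class y, shown generically below; how WAD estimates and applies it in representation space is specific to that paper.

```latex
% Standard pointwise mutual information; the estimation procedure in
% representation space is WAD-specific and not reproduced here.
\mathrm{PMI}(z, y) = \log \frac{p(z, y)}{p(z)\, p(y)}
                   = \log \frac{p(y \mid z)}{p(y)}
```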
- Towards Semi-supervised Learning with Non-random Missing Labels [42.71454054383897]
Class transition tracking based Pseudo-Rectifying Guidance (PRG) is devised for label Missing Not At Random (MNAR).
PRG unifies historical information about the class distribution with the class transitions caused by the pseudo-rectifying procedure.
We show the superior performance of PRG across a variety of MNAR scenarios, outperforming the latest SSL approaches.
arXiv Detail & Related papers (2023-08-17T09:09:36Z)
- Semi-Supervised Learning with Multiple Imputations on Non-Random Missing Labels [0.0]
Semi-Supervised Learning (SSL) trains algorithms on both labeled and unlabeled data.
This paper proposes two new methods of combining multiple imputation models to achieve higher accuracy and less bias.
arXiv Detail & Related papers (2023-08-15T04:09:53Z)
- NorMatch: Matching Normalizing Flows with Discriminative Classifiers for Semi-Supervised Learning [8.749830466953584]
Semi-Supervised Learning (SSL) aims to learn a model using a tiny labeled set and massive amounts of unlabeled data.
In this work we introduce a new framework for SSL named NorMatch.
We demonstrate, through numerical and visual results, that NorMatch achieves state-of-the-art performance on several datasets.
arXiv Detail & Related papers (2022-11-17T15:39:18Z)
- Robust Deep Semi-Supervised Learning: A Brief Introduction [63.09703308309176]
Semi-supervised learning (SSL) aims to improve learning performance by leveraging unlabeled data when labels are insufficient.
SSL with deep models has proven to be successful on standard benchmark tasks.
However, they are still vulnerable to various robustness threats in real-world applications.
arXiv Detail & Related papers (2022-02-12T04:16:41Z)
- Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning [80.05441565830726]
This paper addresses imbalanced semi-supervised learning, where heavily biased pseudo-labels can harm the model performance.
Motivated by this observation, we propose a general pseudo-labeling framework to address the bias.
We term the novel pseudo-labeling framework for imbalanced SSL as Distribution-Aware Semantics-Oriented (DASO) Pseudo-label.
arXiv Detail & Related papers (2021-06-10T11:58:25Z)