Debiased Pseudo Labeling in Self-Training
- URL: http://arxiv.org/abs/2202.07136v1
- Date: Tue, 15 Feb 2022 02:14:33 GMT
- Title: Debiased Pseudo Labeling in Self-Training
- Authors: Baixu Chen, Junguang Jiang, Ximei Wang, Jianmin Wang, Mingsheng Long
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep neural networks achieve remarkable performance on a wide range of tasks
with the aid of large-scale labeled datasets. However, large-scale annotations
are time-consuming and labor-intensive to obtain on realistic tasks. To
mitigate the requirement for labeled data, self-training is widely used in both
academia and industry by pseudo labeling readily-available unlabeled data.
Despite its popularity, pseudo labeling is widely believed to be unreliable and
often leads to training instability. Our experimental studies further reveal
that the performance of self-training is biased by data sampling, pre-trained
models, and training strategies, especially the inappropriate utilization of
pseudo labels. To this end, we propose Debiased, in which the generation and
utilization of pseudo labels are decoupled by two independent heads. To further
improve the quality of pseudo labels, we introduce a worst-case estimation of
pseudo labeling and seamlessly optimize the representations to avoid the worst
case. Extensive experiments show that the proposed Debiased not only yields an
average improvement of $14.4$\% over state-of-the-art algorithms on $11$ tasks
(covering generic object recognition, fine-grained object recognition, texture
classification, and scene classification) but also helps stabilize training and
balance performance across classes.
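The core idea of decoupling pseudo-label generation from pseudo-label utilization via two independent heads can be illustrated with a minimal sketch. This is an illustrative reconstruction, not the authors' implementation: the two heads are plain linear classifiers on fixed toy features, the names `gen_head`/`use_head` are invented, and the confidence threshold is a standard FixMatch-style detail assumed for the example.

```python
import numpy as np

# Illustrative sketch of decoupled pseudo-label generation/utilization
# (not the paper's code; all data and names are made up for the example).
rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy 3-class data: a few labeled points, many unlabeled ones.
means = 2.0 * rng.normal(size=(3, 8))
y_lab = rng.integers(0, 3, size=30)
X_lab = means[y_lab] + 0.5 * rng.normal(size=(30, 8))
y_unl = rng.integers(0, 3, size=120)          # hidden ground truth
X_unl = means[y_unl] + 0.5 * rng.normal(size=(120, 8))

gen_head = np.zeros((8, 3))  # generates pseudo labels; fit on labeled data only
use_head = np.zeros((8, 3))  # consumes pseudo labels; errors never feed back
lr, tau = 0.1, 0.8

def grad_step(W, X, y):
    p = softmax(X @ W)
    p[np.arange(len(y)), y] -= 1.0            # dL/dlogits for cross-entropy
    return W - lr * X.T @ p / len(y)

for _ in range(300):
    # 1) Generation head: supervised update on labeled data only,
    #    so it is never biased by its own pseudo labels.
    gen_head = grad_step(gen_head, X_lab, y_lab)

    # 2) Pseudo labels come from the generation head, kept if confident.
    q = softmax(X_unl @ gen_head)
    keep = q.max(axis=1) >= tau

    # 3) Utilization head: trained on labeled + confident pseudo-labeled data.
    X_all = np.vstack([X_lab, X_unl[keep]])
    y_all = np.concatenate([y_lab, q.argmax(axis=1)[keep]])
    use_head = grad_step(use_head, X_all, y_all)

acc = (softmax(X_unl @ use_head).argmax(axis=1) == y_unl).mean()
print(f"unlabeled-set accuracy of utilization head: {acc:.2f}")
```

Because the utilization head's mistakes never flow back into pseudo-label generation, a bad batch of pseudo labels cannot compound across iterations the way it can when a single head both produces and consumes them.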
Related papers
- Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction with Extremely Limited Labels [63.16824565919966]
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
Our findings highlight the promise of VC learning in dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z)
- Boosting Semi-Supervised Learning by bridging high and low-confidence predictions [4.18804572788063]
Pseudo-labeling is a crucial technique in semi-supervised learning (SSL).
We propose a new method called ReFixMatch, which aims to utilize all of the unlabeled data during training.
arXiv Detail & Related papers (2023-08-15T00:27:18Z)
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
- Doubly Robust Self-Training [46.168395767948965]
We introduce doubly robust self-training, a novel semi-supervised algorithm.
We demonstrate the superiority of the doubly robust loss over the standard self-training baseline.
arXiv Detail & Related papers (2023-06-01T00:57:16Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Debiased Learning from Naturally Imbalanced Pseudo-Labels for Zero-Shot and Semi-Supervised Learning [27.770473405635585]
This work studies the bias issue of pseudo-labeling, a phenomenon that occurs widely but is often overlooked by prior research.
We observe heavily long-tailed pseudo-labels when the semi-supervised learning model FixMatch predicts labels on the unlabeled set, even though the unlabeled data are curated to be balanced.
Without intervention, the training model inherits the bias from the pseudo-labels and ends up being sub-optimal.
arXiv Detail & Related papers (2022-01-05T07:40:24Z)
- Self-Tuning for Data-Efficient Deep Learning [75.34320911480008]
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its SSL and TL counterparts on five tasks by sharp margins.
arXiv Detail & Related papers (2021-02-25T14:56:19Z)
- Semi-supervised Relation Extraction via Incremental Meta Self-Training [56.633441255756075]
Semi-Supervised Relation Extraction methods aim to leverage unlabeled data in addition to learning from limited samples.
Existing self-training methods suffer from the gradual drift problem, where noisy pseudo labels on unlabeled data are incorporated during training.
We propose a method called MetaSRE, in which a Relation Label Generation Network assesses the quality of pseudo labels by (meta-)learning from the successful and failed attempts of the Relation Classification Network as an additional meta-objective.
arXiv Detail & Related papers (2020-10-06T03:54:11Z)
- Rethinking the Value of Labels for Improving Class-Imbalanced Learning [20.953282288425118]
Class-imbalanced learning can benefit significantly from both semi-supervised and self-supervised approaches.
We argue that imbalanced labels are not always useful.
Our findings highlight the need to rethink the use of imbalanced labels in realistic long-tailed tasks.
arXiv Detail & Related papers (2020-06-13T01:35:58Z)
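A recurring theme across the papers above (SoftMatch in particular) is replacing a hard confidence threshold on pseudo labels with a soft, per-sample weight. As a hypothetical illustration in that spirit (not code from any of the listed papers; the function name and the fixed `mu`/`var` values are assumptions), a truncated-Gaussian weighting over prediction confidence can be sketched as:

```python
import numpy as np

# Soft confidence weighting in the spirit of SoftMatch: full weight at or
# above the mean confidence, Gaussian falloff below it. In practice mu and
# var would be running estimates over the batch, not fixed constants.
def soft_weight(confidence, mu=0.7, var=0.01):
    falloff = np.exp(-(confidence - mu) ** 2 / (2 * var))
    return np.where(confidence >= mu, 1.0, falloff)

conf = np.array([0.95, 0.70, 0.40, 0.20])
weights = soft_weight(conf)
print(weights)
```

Compared with a hard threshold, low-confidence predictions still contribute a small, smoothly decaying weight rather than being discarded outright, which is one way to trade off the quantity and quality of pseudo labels.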
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.