Rethinking the Value of Labels for Improving Class-Imbalanced Learning
- URL: http://arxiv.org/abs/2006.07529v2
- Date: Sat, 26 Sep 2020 20:05:13 GMT
- Title: Rethinking the Value of Labels for Improving Class-Imbalanced Learning
- Authors: Yuzhe Yang, Zhi Xu
- Abstract summary: Class-imbalanced learning can significantly benefit from both semi-supervised and self-supervised approaches.
We argue that imbalanced labels are not always useful.
Our findings highlight the need to rethink the usage of imbalanced labels in realistic long-tailed tasks.
- Score: 20.953282288425118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world data often exhibits long-tailed distributions with heavy class
imbalance, posing great challenges for deep recognition models. We identify a
persisting dilemma on the value of labels in the context of imbalanced
learning: on the one hand, supervision from labels typically leads to better
results than its unsupervised counterpart; on the other hand, heavily
imbalanced data naturally incurs "label bias" in the classifier, where the
decision boundary can be drastically altered by the majority classes. In this
work, we systematically investigate these two facets of labels. We demonstrate,
theoretically and empirically, that class-imbalanced learning can significantly
benefit in both semi-supervised and self-supervised manners. Specifically, we
confirm that (1) positively, imbalanced labels are valuable: given more
unlabeled data, the original labels can be leveraged with the extra data to
reduce label bias in a semi-supervised manner, which greatly improves the final
classifier; (2) negatively however, we argue that imbalanced labels are not
always useful: classifiers that are first pre-trained in a self-supervised
manner consistently outperform their corresponding baselines. Extensive
experiments on large-scale imbalanced datasets verify our theoretically
grounded strategies, showing superior performance over previous
state-of-the-art methods. Our intriguing findings highlight the need to rethink the
usage of imbalanced labels in realistic long-tailed tasks. Code is available at
https://github.com/YyzHarry/imbalanced-semi-self.
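The abstract's first strategy (reuse the original imbalanced labels to pseudo-label extra unlabeled data, then retrain on the union) can be sketched as follows. This is a minimal illustrative sketch on a hypothetical toy dataset with a plain scikit-learn classifier, not the authors' released implementation (see the repository linked above for that); the 0.9 confidence threshold and the data sizes are assumed values for illustration only.

```python
# Minimal sketch (NOT the authors' code) of semi-supervised pseudo-labeling
# to reduce label bias from a heavily imbalanced labeled set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Imbalanced labeled set: 1000 majority vs. 50 minority samples (2-D Gaussians).
X_maj = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(1000, 2))
X_min = rng.normal(loc=[2.5, 2.5], scale=1.0, size=(50, 2))
X_lab = np.vstack([X_maj, X_min])
y_lab = np.array([0] * 1000 + [1] * 50)

# Extra unlabeled data drawn from the same (roughly balanced) underlying mixture.
X_unl = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(2000, 2)),
    rng.normal(loc=[2.5, 2.5], scale=1.0, size=(2000, 2)),
])

# Step 1: train an initial classifier on the imbalanced labeled data alone.
base = LogisticRegression().fit(X_lab, y_lab)

# Step 2: pseudo-label the unlabeled pool with the initial classifier,
# keeping only confident predictions (0.9 is an illustrative threshold).
probs = base.predict_proba(X_unl)
keep = probs.max(axis=1) >= 0.9
X_pseudo, y_pseudo = X_unl[keep], probs[keep].argmax(axis=1)

# Step 3: retrain on labeled + pseudo-labeled data; the extra (more balanced)
# pseudo-labels pull the decision boundary away from the majority class.
final = LogisticRegression().fit(
    np.vstack([X_lab, X_pseudo]),
    np.concatenate([y_lab, y_pseudo]),
)
```

The same recipe scales to deep networks in the paper's setting: the base model is a network trained on the imbalanced labels, and the pseudo-labeled pool is folded into a second round of training.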
Related papers
- Generating Unbiased Pseudo-labels via a Theoretically Guaranteed
Chebyshev Constraint to Unify Semi-supervised Classification and Regression [57.17120203327993]
The threshold-to-pseudo-label process (T2L) in classification uses confidence to determine the quality of labels.
Regression likewise requires unbiased methods to generate high-quality labels.
We propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality.
arXiv Detail & Related papers (2023-11-03T08:39:35Z) - Dist-PU: Positive-Unlabeled Learning from a Label Distribution
Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this perspective, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z) - When Noisy Labels Meet Long Tail Dilemmas: A Representation Calibration
Method [40.25499257944916]
Real-world datasets are both noisily labeled and class-imbalanced.
We propose a representation calibration method, RCAL.
We derive theoretical results that support the effectiveness of our representation calibration.
arXiv Detail & Related papers (2022-11-20T11:36:48Z) - Learning to Adapt Classifier for Imbalanced Semi-supervised Learning [38.434729550279116]
Pseudo-labeling has proven to be a promising semi-supervised learning (SSL) paradigm.
Existing pseudo-labeling methods commonly assume that the class distributions of training data are balanced.
In this work, we investigate pseudo-labeling under imbalanced semi-supervised setups.
arXiv Detail & Related papers (2022-07-28T02:15:47Z) - Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry, generating pseudo labels on readily available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z) - Debiased Learning from Naturally Imbalanced Pseudo-Labels for Zero-Shot
and Semi-Supervised Learning [27.770473405635585]
This work studies the bias issue of pseudo-labeling, a natural phenomenon that widely occurs but is often overlooked by prior research.
We observe heavily long-tailed pseudo-labels when the semi-supervised learning model FixMatch predicts labels on the unlabeled set, even though the unlabeled data is curated to be balanced.
Without intervention, the training model inherits the bias from the pseudo-labels and ends up being sub-optimal.
arXiv Detail & Related papers (2022-01-05T07:40:24Z) - Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced
Semi-Supervised Learning [80.05441565830726]
This paper addresses imbalanced semi-supervised learning, where heavily biased pseudo-labels can harm the model performance.
Motivated by this observation, we propose a general pseudo-labeling framework to address the bias.
We term the novel pseudo-labeling framework for imbalanced SSL as Distribution-Aware Semantics-Oriented (DASO) Pseudo-label.
arXiv Detail & Related papers (2021-06-10T11:58:25Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - Fairness Constraints in Semi-supervised Learning [56.48626493765908]
We develop a framework for fair semi-supervised learning, which is formulated as an optimization problem.
We theoretically analyze the source of discrimination in semi-supervised learning via bias, variance and noise decomposition.
Our method achieves fair semi-supervised learning and reaches a better trade-off between accuracy and fairness than fair supervised learning.
arXiv Detail & Related papers (2020-09-14T04:25:59Z)