Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces
- URL: http://arxiv.org/abs/2105.05736v1
- Date: Wed, 12 May 2021 15:40:13 GMT
- Title: Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces
- Authors: Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep
Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar
- Abstract summary: We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
- Score: 64.23172847182109
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Negative sampling schemes enable efficient training given a large number of
classes, by offering a means to approximate a computationally expensive loss
function that takes all labels into account. In this paper, we present a new
connection between these schemes and loss modification techniques for
countering label imbalance. We show that different negative sampling schemes
implicitly trade off performance on dominant versus rare labels. Further, we
provide a unified means to explicitly tackle both sampling bias, arising from
working with a subset of all labels, and labeling bias, which is inherent to
the data due to label imbalance. We empirically verify our findings on
long-tail classification and retrieval benchmarks.
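The abstract describes correcting two biases at once: the bias introduced by scoring only a sampled subset of labels, and the bias from imbalanced label frequencies. The snippet below is a minimal sketch of that general recipe, not the paper's exact estimator: a sampled-softmax loss whose logits are corrected by the log of the sampling (proposal) probability and optionally adjusted by the log of a label prior. All names (`sampled_softmax_loss`, `proposal`, `prior`, `tau`) and the toy data are illustrative assumptions.
```python
import torch
import torch.nn.functional as F

def sampled_softmax_loss(scores, positive, negatives, proposal, prior=None, tau=1.0):
    """scores: [num_labels] raw logits for one example.
    positive: index of the true label.
    negatives: [m] indices of sampled negative labels.
    proposal: [num_labels] probabilities of the negative-sampling distribution.
    prior: optional [num_labels] label marginals used for logit adjustment."""
    idx = torch.cat([positive.view(1), negatives])       # positive label first
    logits = scores[idx]
    # Correct sampling bias: down-weight labels that the proposal over-samples.
    logits = logits - torch.log(proposal[idx])
    # Optionally correct labeling bias: adjust logits by the label prior.
    if prior is not None:
        logits = logits + tau * torch.log(prior[idx])
    # Cross-entropy with the positive sitting at position 0 of the sampled set.
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

# Toy usage: 10,000 labels, 20 sampled negatives (all values are stand-ins).
num_labels, m = 10_000, 20
scores = torch.randn(num_labels)
positive = torch.tensor(3)
negatives = torch.randint(num_labels, (m,))
proposal = torch.full((num_labels,), 1.0 / num_labels)   # uniform sampling
prior = torch.softmax(torch.randn(num_labels), dim=0)    # stand-in label marginals
print(sampled_softmax_loss(scores, positive, negatives, proposal, prior).item())
```
Note that with a uniform proposal the log-proposal correction is a constant shared by all logits and cancels inside the softmax; the correction only matters for non-uniform (e.g., frequency-based) sampling distributions.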
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually-specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- CLAF: Contrastive Learning with Augmented Features for Imbalanced Semi-Supervised Learning [40.5117833362268]
Semi-supervised learning and contrastive learning have been progressively combined to achieve better performance in popular applications.
One common approach is to assign pseudo-labels to unlabeled samples and select positive and negative samples from the pseudo-labeled samples for contrastive learning.
We propose Contrastive Learning with Augmented Features (CLAF) to alleviate the scarcity of minority class samples in contrastive learning.
arXiv Detail & Related papers (2023-12-15T08:27:52Z)
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels to unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this perspective, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Acknowledging the Unknown for Multi-label Learning with Single Positive Labels [65.5889334964149]
Traditionally, all unannotated labels are assumed to be negative labels in single positive multi-label learning (SPML).
We propose an entropy-maximization (EM) loss to maximize the entropy of predicted probabilities for all unannotated labels (see the sketch after this list).
Considering the positive-negative label imbalance of unannotated labels, we propose asymmetric pseudo-labeling (APL) with asymmetric-tolerance strategies and a self-paced procedure to provide more precise supervision.
arXiv Detail & Related papers (2022-03-30T11:43:59Z)
- Label Noise in Adversarial Training: A Novel Perspective to Study Robust Overfitting [45.58217741522973]
We show that label noise exists in adversarial training.
Such label noise is due to the mismatch between the true label distribution of adversarial examples and the label inherited from clean examples.
We propose a method to automatically calibrate the label to address the label noise and robust overfitting.
arXiv Detail & Related papers (2021-10-07T01:15:06Z)
- Rethinking the Value of Labels for Improving Class-Imbalanced Learning [20.953282288425118]
Class-imbalanced learning can benefit significantly from both semi-supervised and self-supervised learning.
We argue that imbalanced labels are not always useful.
Our findings highlight the need to rethink the usage of imbalanced labels in realistic long-tailed tasks.
arXiv Detail & Related papers (2020-06-13T01:35:58Z)
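As a companion to the single-positive multi-label entry above (the "sketch after this list" referenced there), here is a rough illustration of an entropy-maximization style loss: the single annotated positive label gets a standard binary cross-entropy term, while unannotated labels are pushed toward high-entropy (uncertain) predictions rather than being treated as negatives. The function name, the `alpha` weight, and the mean reduction are assumptions for illustration, not that paper's implementation.
```python
import torch
import torch.nn.functional as F

def em_style_loss(logits, positive_idx, alpha=0.1):
    """logits: [num_labels] raw scores for one instance.
    positive_idx: index of the single annotated positive label.
    alpha: assumed weight on the entropy term for unannotated labels."""
    probs = torch.sigmoid(logits)
    # Standard BCE on the single observed positive label.
    pos_loss = F.binary_cross_entropy(probs[positive_idx], torch.tensor(1.0))
    # Encourage high entropy on all unannotated labels instead of treating
    # them as negatives.
    mask = torch.ones_like(probs, dtype=torch.bool)
    mask[positive_idx] = False
    p = probs[mask].clamp(1e-6, 1 - 1e-6)
    entropy = -(p * p.log() + (1 - p) * (1 - p).log()).mean()
    return pos_loss - alpha * entropy   # maximizing entropy = subtracting it

logits = torch.randn(100)               # toy example with 100 labels
print(em_style_loss(logits, positive_idx=7).item())
```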
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.