Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces
- URL: http://arxiv.org/abs/2105.05736v1
- Date: Wed, 12 May 2021 15:40:13 GMT
- Title: Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces
- Authors: Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep
Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar
- Abstract summary: We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
- Score: 64.23172847182109
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Negative sampling schemes enable efficient training given a large number of
classes, by offering a means to approximate a computationally expensive loss
function that takes all labels into account. In this paper, we present a new
connection between these schemes and loss modification techniques for
countering label imbalance. We show that different negative sampling schemes
implicitly trade off performance on dominant versus rare labels. Further, we
provide a unified means to explicitly tackle both sampling bias, arising from
working with a subset of all labels, and labeling bias, which is inherent to
the data due to label imbalance. We empirically verify our findings on
long-tail classification and retrieval benchmarks.
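The abstract's two corrections can be illustrated with a minimal sketch. The code below is not the paper's implementation: the function name `corrected_sampled_loss`, the choice of a known negative-sampling distribution, and the use of log class priors as a logit-adjustment term are my assumptions, chosen only to show how a sampling-bias term (dividing out each negative's sampling probability) and a labeling-bias term (adding the log class prior) can both enter a single sampled-softmax loss.

```python
import numpy as np

def corrected_sampled_loss(logits, positive_idx, sampled_neg_idx,
                           sample_probs, class_priors):
    """Softmax cross-entropy over one positive label and a few sampled
    negatives, with two illustrative corrections:
      - labeling-bias correction: add log pi_j (empirical class prior) to
        every logit, in the spirit of logit adjustment for imbalance;
      - sampling-bias correction: subtract log q(j), the probability that
        label j was drawn as a negative (importance weighting).
    """
    idx = np.concatenate(([positive_idx], sampled_neg_idx))
    z = logits[idx] + np.log(class_priors[idx])        # labeling-bias term
    z[1:] -= np.log(sample_probs[sampled_neg_idx])     # sampling-bias term
    z -= z.max()                                       # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[0]                               # positive sits in slot 0
```

With uniform sampling and uniform priors both corrections reduce to constant shifts, so the loss behaves like a plain sampled softmax; the corrections only change the optimum when the sampling distribution or the label frequencies are non-uniform.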
Related papers
- Label Distribution Learning with Biased Annotations by Learning Multi-Label Representation [120.97262070068224]
Multi-label learning (MLL) has gained attention for its ability to represent real-world data.
Label Distribution Learning (LDL) faces challenges in collecting accurate label distributions.
arXiv Detail & Related papers (2025-02-03T09:04:03Z)
- Improving Multi-Label Contrastive Learning by Leveraging Label Distribution [13.276821681189166]
In multi-label learning, leveraging contrastive learning to learn better representations faces a key challenge: selecting positive and negative samples.
Previous studies selected positive and negative samples based on the overlap between labels and used them for label-wise loss balancing.
We propose a novel method that improves multi-label contrastive learning through label distribution.
arXiv Detail & Related papers (2025-01-31T14:00:02Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By specifying a probability measure manually, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- CLAF: Contrastive Learning with Augmented Features for Imbalanced Semi-Supervised Learning [40.5117833362268]
Semi-supervised learning and contrastive learning have been progressively combined to achieve better performance in popular applications.
One common manner is assigning pseudo-labels to unlabeled samples and selecting positive and negative samples from pseudo-labeled samples to apply contrastive learning.
We propose Contrastive Learning with Augmented Features (CLAF) to alleviate the scarcity of minority class samples in contrastive learning.
arXiv Detail & Related papers (2023-12-15T08:27:52Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective on PU learning. Motivated by this view, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Acknowledging the Unknown for Multi-label Learning with Single Positive Labels [65.5889334964149]
Traditionally, all unannotated labels are assumed to be negative in single positive multi-label learning (SPML).
We propose entropy-maximization (EM) loss to maximize the entropy of predicted probabilities for all unannotated labels.
Considering the positive-negative label imbalance of unannotated labels, we propose asymmetric pseudo-labeling (APL) with asymmetric-tolerance strategies and a self-paced procedure to provide more precise supervision.
arXiv Detail & Related papers (2022-03-30T11:43:59Z)
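The entropy-maximization idea summarized in the last entry can be sketched as follows. This is my own minimal illustration, not the authors' implementation: the function name `em_loss`, the sigmoid parameterization, and the mean reduction over unannotated labels are assumptions made for the sake of a self-contained example.

```python
import numpy as np

def em_loss(logits, annotated_mask):
    """Entropy-maximization loss over unannotated labels (illustrative).

    Treats each label as an independent sigmoid output and returns the
    negative mean binary entropy of the unannotated labels, so that
    minimizing this loss maximizes their predictive entropy.
    """
    p = 1.0 / (1.0 + np.exp(-logits))                  # per-label probabilities
    eps = 1e-12                                        # avoid log(0)
    entropy = -(p * np.log(p + eps) + (1.0 - p) * np.log(1.0 - p + eps))
    unannotated = ~annotated_mask
    return -entropy[unannotated].mean()
```

Under this sketch the loss is lowest when unannotated predictions sit at 0.5 (maximum binary entropy, log 2 per label) and rises toward 0 as predictions become confident, which matches the summary's intent of withholding judgment on unannotated labels.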
This list is automatically generated from the titles and abstracts of the papers on this site.