Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces
- URL: http://arxiv.org/abs/2105.05736v1
- Date: Wed, 12 May 2021 15:40:13 GMT
- Title: Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces
- Authors: Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep
Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar
- Abstract summary: We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
- Score: 64.23172847182109
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Negative sampling schemes enable efficient training given a large number of
classes, by offering a means to approximate a computationally expensive loss
function that takes all labels into account. In this paper, we present a new
connection between these schemes and loss modification techniques for
countering label imbalance. We show that different negative sampling schemes
implicitly trade off performance on dominant versus rare labels. Further, we
provide a unified means to explicitly tackle both sampling bias, arising from
working with a subset of all labels, and labeling bias, which is inherent to
the data due to label imbalance. We empirically verify our findings on
long-tail classification and retrieval benchmarks.
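The abstract's two corrections can be illustrated with a minimal sketch. The code below is not the paper's implementation: the function name `corrected_sampled_loss`, the choice of a known negative-sampling distribution, and the use of log class priors as a logit-adjustment term are my assumptions, chosen only to show how a sampling-bias term (dividing out each negative's sampling probability) and a labeling-bias term (adding the log class prior) can both enter a single sampled-softmax loss.

```python
import numpy as np

def corrected_sampled_loss(logits, positive_idx, sampled_neg_idx,
                           sample_probs, class_priors):
    """Softmax cross-entropy over one positive label and a few sampled
    negatives, with two illustrative corrections:
      - labeling-bias correction: add log pi_j (empirical class prior) to
        every logit, in the spirit of logit adjustment for imbalance;
      - sampling-bias correction: subtract log q(j), the probability that
        label j was drawn as a negative (importance weighting).
    """
    idx = np.concatenate(([positive_idx], sampled_neg_idx))
    z = logits[idx] + np.log(class_priors[idx])        # labeling-bias term
    z[1:] -= np.log(sample_probs[sampled_neg_idx])     # sampling-bias term
    z -= z.max()                                       # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[0]                               # positive sits in slot 0
```

With uniform sampling and uniform priors both corrections reduce to constant shifts, so the loss behaves like a plain sampled softmax; the corrections only change the optimum when the sampling distribution or the label frequencies are non-uniform.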
Related papers
- Label Distribution Learning with Biased Annotations by Learning Multi-Label Representation [120.97262070068224]
Multi-label learning (MLL) has gained attention for its ability to represent real-world data.
Label Distribution Learning (LDL) faces challenges in collecting accurate label distributions.
arXiv Detail & Related papers (2025-02-03T09:04:03Z)
- Improving Multi-Label Contrastive Learning by Leveraging Label Distribution [13.276821681189166]
In multi-label learning, leveraging contrastive learning to learn better representations faces a key challenge: selecting positive and negative samples.
Previous studies selected positive and negative samples based on the overlap between labels and used them for label-wise loss balancing.
We propose a novel method that improves multi-label contrastive learning through label distribution.
arXiv Detail & Related papers (2025-01-31T14:00:02Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By specifying a probability measure manually, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- CLAF: Contrastive Learning with Augmented Features for Imbalanced Semi-Supervised Learning [40.5117833362268]
Semi-supervised learning and contrastive learning have been progressively combined to achieve better performance in popular applications.
One common manner is assigning pseudo-labels to unlabeled samples and selecting positive and negative samples from pseudo-labeled samples to apply contrastive learning.
We propose Contrastive Learning with Augmented Features (CLAF) to alleviate the scarcity of minority class samples in contrastive learning.
arXiv Detail & Related papers (2023-12-15T08:27:52Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective on PU learning. Motivated by this view, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Acknowledging the Unknown for Multi-label Learning with Single Positive Labels [65.5889334964149]
Traditionally, all unannotated labels are assumed to be negative in single positive multi-label learning (SPML).
We propose entropy-maximization (EM) loss to maximize the entropy of predicted probabilities for all unannotated labels.
Considering the positive-negative label imbalance of unannotated labels, we propose asymmetric pseudo-labeling (APL) with asymmetric-tolerance strategies and a self-paced procedure to provide more precise supervision.
arXiv Detail & Related papers (2022-03-30T11:43:59Z)
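The entropy-maximization idea summarized in the last entry can be sketched as follows. This is my own minimal illustration, not the authors' implementation: the function name `em_loss`, the sigmoid parameterization, and the mean reduction over unannotated labels are assumptions made for the sake of a self-contained example.

```python
import numpy as np

def em_loss(logits, annotated_mask):
    """Entropy-maximization loss over unannotated labels (illustrative).

    Treats each label as an independent sigmoid output and returns the
    negative mean binary entropy of the unannotated labels, so that
    minimizing this loss maximizes their predictive entropy.
    """
    p = 1.0 / (1.0 + np.exp(-logits))                  # per-label probabilities
    eps = 1e-12                                        # avoid log(0)
    entropy = -(p * np.log(p + eps) + (1.0 - p) * np.log(1.0 - p + eps))
    unannotated = ~annotated_mask
    return -entropy[unannotated].mean()
```

Under this sketch the loss is lowest when unannotated predictions sit at 0.5 (maximum binary entropy, log 2 per label) and rises toward 0 as predictions become confident, which matches the summary's intent of withholding judgment on unannotated labels.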
This list is automatically generated from the titles and abstracts of the papers on this site.