Label-Occurrence-Balanced Mixup for Long-tailed Recognition
- URL: http://arxiv.org/abs/2110.04964v1
- Date: Mon, 11 Oct 2021 02:22:02 GMT
- Title: Label-Occurrence-Balanced Mixup for Long-tailed Recognition
- Authors: Shaoyu Zhang, Chen Chen, Xiujuan Zhang, Silong Peng
- Abstract summary: We propose Label-Occurrence-Balanced Mixup to augment data while keeping the label occurrence for each class statistically balanced.
We test our method on several long-tailed vision and sound recognition benchmarks.
- Score: 6.482544017574614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixup is a popular data augmentation method, with many variants subsequently
proposed. These methods mainly create new examples via convex combination of
random data pairs and their corresponding one-hot labels. However, most of them
adhere to a random sampling and mixing strategy, without considering the
frequency of label occurrence in the mixing process. When applying mixup to
long-tailed data, a label suppression issue arises: the frequency of label
occurrence across classes is imbalanced, and most of the new examples are
completely or partially assigned head labels. The suppression effect
may further aggravate the problem of data imbalance and lead to a poor
performance on tail classes. To address this problem, we propose
Label-Occurrence-Balanced Mixup to augment data while keeping the label
occurrence for each class statistically balanced. In short, we employ two
independent class-balanced samplers to select data pairs and mix them to
generate new data. We test our method on several long-tailed vision and sound
recognition benchmarks. Experimental results show that our method significantly
improves the adaptability of mixup methods to imbalanced data and achieves
superior performance compared with state-of-the-art long-tailed learning
methods.
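As a rough illustration of the sampling strategy described above, the sketch below implements mixup in which both endpoints of each pair are drawn by independent class-balanced samplers. It is a minimal, hypothetical reconstruction from the abstract alone: the function names, the one-hot label format, and the Beta(alpha, alpha) mixing coefficient (standard in vanilla mixup) are assumptions, not the authors' released code.

```python
import numpy as np

# Hypothetical sketch of Label-Occurrence-Balanced Mixup, reconstructed
# from the abstract; names and shapes are illustrative assumptions.

def class_balanced_indices(labels, num_samples, rng):
    """Sample indices so every class is drawn with equal probability:
    pick a class uniformly, then an example uniformly within that class."""
    classes = np.unique(labels)
    chosen = rng.choice(classes, size=num_samples)
    return np.array([rng.choice(np.flatnonzero(labels == c)) for c in chosen])

def label_occurrence_balanced_mixup(x, y_onehot, batch_size, alpha=1.0, seed=0):
    """Mix pairs drawn by two independent class-balanced samplers, so the
    label occurrence of each class stays statistically balanced."""
    rng = np.random.default_rng(seed)
    labels = y_onehot.argmax(axis=1)
    i = class_balanced_indices(labels, batch_size, rng)  # sampler 1
    j = class_balanced_indices(labels, batch_size, rng)  # sampler 2
    lam = rng.beta(alpha, alpha, size=batch_size)        # mixup coefficient
    lam_x = lam.reshape(-1, *([1] * (x.ndim - 1)))       # broadcast over features
    x_mix = lam_x * x[i] + (1 - lam_x) * x[j]
    y_mix = lam[:, None] * y_onehot[i] + (1 - lam[:, None]) * y_onehot[j]
    return x_mix, y_mix
```

Because each endpoint is drawn by first picking a class uniformly, every class contributes the same expected label mass to y_mix, which is the balance property the abstract targets; with plain random sampling, head classes would dominate both endpoints of most pairs.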
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually-specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z) - Multi-Label Adaptive Batch Selection by Highlighting Hard and Imbalanced Samples [9.360376286221943]
We introduce an adaptive batch selection algorithm tailored to multi-label deep learning models.
Our method converges faster and performs better than random batch selection.
arXiv Detail & Related papers (2024-03-27T02:00:18Z) - Learning with Imbalanced Noisy Data by Preventing Bias in Sample
Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z) - Toward Robustness in Multi-label Classification: A Data Augmentation
Strategy against Imbalance and Noise [31.917931364881625]
Multi-label classification poses challenges due to imbalanced and noisy labels in training data.
We propose a unified data augmentation method, named BalanceMix, to address these challenges.
Our approach includes two samplers for imbalanced labels, generating minority-augmented instances with high diversity.
arXiv Detail & Related papers (2023-12-12T09:09:45Z) - PLM: Partial Label Masking for Imbalanced Multi-label Classification [59.68444804243782]
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes.
We propose a method, Partial Label Masking (PLM), which utilizes this ratio during training.
Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
arXiv Detail & Related papers (2021-05-22T18:07:56Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - Semi-supervised Long-tailed Recognition using Alternate Sampling [95.93760490301395]
Main challenges in long-tailed recognition come from the imbalanced data distribution and sample scarcity in its tail classes.
We propose a new recognition setting, namely semi-supervised long-tailed recognition.
We demonstrate significant accuracy improvements over other competitive methods on two datasets.
arXiv Detail & Related papers (2021-05-01T00:43:38Z) - A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels [49.990938653249415]
This research presents a methodology that assigns initial pseudo-labels to unlabeled data which is used as noisy-labeled data, and trains a deep neural network using the noisy-labeled data.
Experimental results demonstrate that the proposed method significantly outperforms the state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-03-08T11:46:02Z) - Multi-Label Sampling based on Local Label Imbalance [7.355362369511579]
Class imbalance is an inherent characteristic of multi-label data that hinders most multi-label learning methods.
Existing multi-label sampling approaches alleviate the global imbalance of multi-label datasets, but it is actually the imbalance level within the local neighbourhood of minority class examples that plays a key role in performance degradation.
arXiv Detail & Related papers (2020-05-07T04:14:23Z)