Toward Robustness in Multi-label Classification: A Data Augmentation
Strategy against Imbalance and Noise
- URL: http://arxiv.org/abs/2312.07087v1
- Date: Tue, 12 Dec 2023 09:09:45 GMT
- Title: Toward Robustness in Multi-label Classification: A Data Augmentation
Strategy against Imbalance and Noise
- Authors: Hwanjun Song and Minseok Kim and Jae-Gil Lee
- Abstract summary: Multi-label classification poses challenges due to imbalanced and noisy labels in training data.
We propose a unified data augmentation method, named BalanceMix, to address these challenges.
Our approach includes two samplers for imbalanced labels, generating minority-augmented instances with high diversity.
- Score: 31.917931364881625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-label classification poses challenges due to imbalanced and noisy
labels in training data. We propose a unified data augmentation method, named
BalanceMix, to address these challenges. Our approach includes two samplers for
imbalanced labels, generating minority-augmented instances with high diversity.
It also refines multi-labels at the label-wise granularity, categorizing noisy
labels as clean, re-labeled, or ambiguous for robust optimization. Extensive
experiments on three benchmark datasets demonstrate that BalanceMix outperforms
existing state-of-the-art methods. We release the code at
https://github.com/DISL-Lab/BalanceMix.
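The two-sampler design can be pictured as a Mixup between a uniformly drawn batch and a minority-oversampled batch. The sketch below illustrates that idea only; the inverse-frequency sampler weighting, the Beta parameter, and all function names are assumptions, not the authors' released implementation (see the repository above for the real code).

```python
import numpy as np

def minority_sampling_weights(labels):
    """Per-instance sampling weights favoring instances with rare labels.

    `labels` is an (N, C) multi-hot array. Inverse label frequency summed
    over an instance's positive labels is an illustrative stand-in for the
    paper's minority sampler, not its exact weighting.
    """
    freq = labels.sum(axis=0) + 1e-8           # (C,) per-label frequency
    weights = (labels * (1.0 / freq)).sum(axis=1) + 1e-8
    return weights / weights.sum()

def balance_mix_batch(x, y, batch_size, alpha=4.0, rng=np.random.default_rng(0)):
    """Mixup between a uniformly drawn batch and a minority-oversampled batch."""
    n = len(x)
    idx_rand = rng.integers(0, n, size=batch_size)   # random sampler
    idx_min = rng.choice(n, size=batch_size, p=minority_sampling_weights(y))
    lam = rng.beta(alpha, alpha)                     # Mixup coefficient
    x_mix = lam * x[idx_rand] + (1 - lam) * x[idx_min]
    y_mix = lam * y[idx_rand] + (1 - lam) * y[idx_min]  # soft multi-hot targets
    return x_mix, y_mix
```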
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method extracts a class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
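As a rough illustration of prototype-based pseudo labeling, the sketch below computes class prototypes as mean features and relabels each sample by its nearest prototype; the distribution-matching step and the manually specified probability measure from the abstract are simplified away here.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Mean feature per class; `labels` are (possibly noisy) integer labels.

    Assumes every class has at least one example; otherwise its prototype
    would be undefined.
    """
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def nearest_prototype_pseudo_labels(features, prototypes):
    """Pseudo-label each sample by its nearest prototype (Euclidean)."""
    dists = ((features[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)
```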
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
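SoftMatch's weighting formulation can be summarized as a truncated Gaussian over pseudo-label confidence: confident samples get full weight, while less confident ones are down-weighted smoothly instead of being discarded, preserving quantity without sacrificing quality. A minimal sketch (the paper estimates the mean and variance of confidence with an EMA; fixed values are used here for brevity):

```python
import numpy as np

def softmatch_weights(confidences, mean_conf, var_conf):
    """Truncated-Gaussian weights from pseudo-label confidence.

    Samples at or above the estimated mean confidence get full weight;
    below it, weights decay smoothly instead of dropping to zero, so
    quantity is kept without flooding training with bad pseudo-labels.
    """
    gauss = np.exp(-((confidences - mean_conf) ** 2) / (2.0 * var_conf))
    return np.where(confidences >= mean_conf, 1.0, gauss)

# Confidence = max softmax probability per unlabeled sample (illustrative values).
conf = np.array([0.35, 0.60, 0.92, 0.99])
print(softmatch_weights(conf, mean_conf=0.8, var_conf=0.01))
```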
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
At the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its feature-space neighborhood.
Our method significantly surpasses previous methods on both CIFAR-10 and CIFAR-100 with artificial noise and on real-world noisy datasets such as WebVision and ANIMAL-10N.
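The selection mechanism reads as: keep a sample when its annotated label agrees with the labels of its nearest neighbors in feature space. A minimal brute-force sketch of that criterion, with illustrative choices of k and the agreement threshold:

```python
import numpy as np

def neighborhood_consistent_mask(features, labels, k=10, agree=0.5):
    """Keep samples whose label matches >= `agree` of their k nearest
    feature-space neighbors (self excluded). Brute-force distances for
    clarity; assumes k < number of samples."""
    dists = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dists, np.inf)           # a sample is not its own neighbor
    nn = np.argsort(dists, axis=1)[:, :k]     # (N, k) neighbor indices
    agreement = (labels[nn] == labels[:, None]).mean(axis=1)
    return agreement >= agree
```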
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Label-Occurrence-Balanced Mixup for Long-tailed Recognition [6.482544017574614]
We propose Label-Occurrence-Balanced Mixup to augment data while keeping the label occurrence for each class statistically balanced.
We test our method on several long-tailed vision and sound recognition benchmarks.
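The core idea is that both Mixup partners come from label-balanced samplers, so each class occurs statistically equally often in the mixed data. A single-label sketch under that reading (the sampler weighting and Beta parameter are illustrative):

```python
import numpy as np

def label_balanced_indices(labels, batch_size, rng):
    """Draw indices so every class is equally likely (inverse-frequency weights)."""
    classes, counts = np.unique(labels, return_counts=True)
    w = (1.0 / counts)[np.searchsorted(classes, labels)]
    return rng.choice(len(labels), size=batch_size, p=w / w.sum())

def label_occurrence_balanced_mixup(x, y_onehot, batch_size, alpha=1.0,
                                    rng=np.random.default_rng(0)):
    """Mixup between two independently drawn label-balanced batches."""
    y_int = y_onehot.argmax(axis=1)
    i = label_balanced_indices(y_int, batch_size, rng)
    j = label_balanced_indices(y_int, batch_size, rng)
    lam = rng.beta(alpha, alpha)
    return (lam * x[i] + (1 - lam) * x[j],
            lam * y_onehot[i] + (1 - lam) * y_onehot[j])
```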
arXiv Detail & Related papers (2021-10-11T02:22:02Z)
- Multi-Label Gold Asymmetric Loss Correction with Single-Label Regulators [6.129273021888717]
We propose a novel Gold Asymmetric Loss Correction with Single-Label Regulators (GALC-SLR) that is robust against noisy labels.
GALC-SLR estimates the noise confusion matrix using single-label samples, then constructs an asymmetric loss correction via the estimated confusion matrix to avoid overfitting to the noisy labels.
Empirical results show that our method outperforms the state-of-the-art original asymmetric loss multi-label classifier under all corruption levels.
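Loss correction with an estimated confusion matrix typically follows the forward-correction recipe: push the model's predictions through the noise transition matrix before computing the loss. The sketch below shows plain single-label forward correction only; estimating the matrix from single-label samples and GALC-SLR's asymmetric variant are not reproduced here.

```python
import numpy as np

def forward_corrected_nll(probs, noisy_labels, T):
    """Forward loss correction against noisy labels.

    `T[i, j]` is the estimated probability that true class i is observed
    as class j; `probs @ T` is then the model's predicted distribution
    over *noisy* labels, which the loss is computed against.
    """
    noisy_probs = probs @ T
    picked = noisy_probs[np.arange(len(probs)), noisy_labels]
    return -np.log(picked + 1e-12).mean()
```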
arXiv Detail & Related papers (2021-08-04T12:57:29Z)
- An Ensemble Noise-Robust K-fold Cross-Validation Selection Method for Noisy Labels [0.9699640804685629]
Large-scale datasets tend to contain mislabeled samples that can be memorized by deep neural networks (DNNs).
We present Ensemble Noise-robust K-fold Cross-Validation Selection (E-NKCVS) to effectively select clean samples from noisy data.
We evaluate our approach on various image and text classification tasks where the labels have been manually corrupted with different noise ratios.
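One way to picture the selection step: train on K-1 folds, predict the held-out fold, and treat a sample as clean when the out-of-fold prediction agrees with its given label. A hedged skeleton of that loop (the paper's ensembling over multiple runs and its label re-weighting are omitted; `train_predict` is a caller-supplied hypothetical fit-then-predict routine):

```python
import numpy as np

def cross_validation_clean_mask(x, y, train_predict, k=5,
                                rng=np.random.default_rng(0)):
    """Mark a sample clean if a model trained on the other folds predicts
    its given label.

    `train_predict(x_tr, y_tr, x_te)` is a caller-supplied fit-then-predict
    routine (e.g., wrapping an sklearn classifier or a small network).
    """
    n = len(x)
    folds = rng.permutation(n) % k              # roughly equal-sized folds
    pred = np.empty(n, dtype=y.dtype)
    for f in range(k):
        test = folds == f
        pred[test] = train_predict(x[~test], y[~test], x[test])
    return pred == y                            # out-of-fold agreement mask
```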
arXiv Detail & Related papers (2021-07-06T02:14:52Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
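The sampling-bias half of this observation is closely related to the classic sampled-softmax problem: scoring only a sampled subset of negatives skews the loss toward whatever labels the sampler favors. Shown below is the standard log-q correction for that bias, as a point of reference rather than the paper's actual estimator:

```python
import numpy as np

def sampled_softmax_loss(pos_logit, neg_logits, neg_proposal_probs):
    """Softmax cross-entropy over one positive and sampled negatives,
    with the standard log-q correction for the sampling distribution.

    Subtracting log q(j) from each sampled negative's logit removes the
    bias introduced by the proposal distribution q; the positive logit
    is left uncorrected here for simplicity.
    """
    corrected = neg_logits - np.log(neg_proposal_probs)
    logits = np.concatenate([[pos_logit], corrected])
    m = logits.max()                             # stable log-sum-exp
    return -(pos_logit - (m + np.log(np.exp(logits - m).sum())))
```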
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- Boosting Semi-Supervised Face Recognition with Noise Robustness [54.342992887966616]
This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise introduced by auto-labelling.
We develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which builds on the robust training ability of GroupNet (GN).
arXiv Detail & Related papers (2021-05-10T14:43:11Z)
- Multi-Label Sampling based on Local Label Imbalance [7.355362369511579]
Class imbalance is an inherent characteristic of multi-label data that hinders most multi-label learning methods.
Existing multi-label sampling approaches alleviate the global imbalance of multi-label datasets, but it is actually the imbalance level within the local neighbourhood of minority class examples that plays a key role in performance degradation.
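A concrete way to quantify local imbalance, in the spirit of this paper: score each example by the fraction of its nearest neighbors carrying a different label, and let high-scoring minority examples drive the sampling. A single-label simplification of that criterion:

```python
import numpy as np

def local_imbalance_scores(features, labels, k=5):
    """Fraction of each sample's k nearest neighbors with a different label.

    High scores flag examples in neighborhoods dominated by other classes,
    the local imbalance this paper argues drives performance degradation,
    and a natural signal for targeted over- or undersampling.
    """
    dists = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dists, np.inf)
    nn = np.argsort(dists, axis=1)[:, :k]
    return (labels[nn] != labels[:, None]).mean(axis=1)
```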
arXiv Detail & Related papers (2020-05-07T04:14:23Z)