Intra-Cluster Mixup: An Effective Data Augmentation Technique for Complementary-Label Learning
- URL: http://arxiv.org/abs/2509.17971v1
- Date: Mon, 22 Sep 2025 16:20:41 GMT
- Title: Intra-Cluster Mixup: An Effective Data Augmentation Technique for Complementary-Label Learning
- Authors: Tan-Ha Mai, Hsuan-Tien Lin
- Abstract summary: We investigate the challenges of complementary-label learning (CLL), a form of weakly-supervised learning where models are trained with labels indicating classes to which instances do not belong. We propose an improved technique called Intra-Cluster Mixup (ICM), which synthesizes augmented data only from nearby examples.
- Score: 7.601516977968089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the challenges of complementary-label learning (CLL), a specialized form of weakly-supervised learning (WSL) in which models are trained with labels indicating classes to which instances do not belong, rather than ordinary labels. This alternative supervision is appealing because collecting complementary labels is generally cheaper and less labor-intensive. Although most existing research in CLL emphasizes the development of novel loss functions, the potential of data augmentation in this domain remains largely underexplored. In this work, we uncover that the widely used Mixup data augmentation technique is ineffective when directly applied to CLL. Through in-depth analysis, we identify that the complementary-label noise generated by Mixup negatively impacts the performance of CLL models. We then propose an improved technique called Intra-Cluster Mixup (ICM), which synthesizes augmented data only from nearby examples, to mitigate this noise effect. ICM carries the benefit of encouraging complementary-label sharing among nearby examples and leads to substantial performance improvements across synthetic and real-world labeled datasets. In particular, our wide spectrum of experimental results on both balanced and imbalanced CLL settings demonstrates the potential of ICM in combination with state-of-the-art CLL algorithms, achieving significant accuracy increases of 30% and 10% on MNIST and CIFAR datasets, respectively.
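To make the idea concrete, here is a minimal sketch of an ICM-style augmentation step: Mixup partners are drawn only from within the same cluster, so interpolated complementary labels stay consistent with nearby data. The k-means clustering, the Beta(alpha, alpha) mixing coefficient, and all function names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def intra_cluster_mixup(X, Y_comp, n_clusters=10, alpha=1.0, rng=None):
    """Mix each example with a random partner from the same cluster.

    X      : (n, d) feature matrix
    Y_comp : (n, k) one-hot complementary labels
    """
    rng = rng or np.random.default_rng()
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    partners = np.empty(len(X), dtype=int)
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        partners[idx] = rng.choice(idx, size=len(idx))  # partner stays in-cluster
    lam = rng.beta(alpha, alpha, size=(len(X), 1))
    X_mix = lam * X + (1 - lam) * X[partners]
    Y_mix = lam * Y_comp + (1 - lam) * Y_comp[partners]
    return X_mix, Y_mix
```

Restricting partners to one cluster is what distinguishes this from vanilla Mixup, which would pair arbitrary examples and, per the abstract, introduce complementary-label noise.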
Related papers
- Robust Federated Learning against Noisy Clients via Masked Optimization [13.213042997655169]
In this study, we present a two-stage optimization framework, MaskedOptim, to address the intricate label noise problem. The first stage is designed to facilitate the detection of noisy clients with higher label noise rates. The second stage focuses on rectifying the labels of the noisy clients' data through an end-to-end label correction mechanism.
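As a rough sketch of what a masking-based first stage could look like, the snippet below drops the client updates that diverge most from a provisional aggregate before averaging. The divergence criterion, the mask fraction, and the function name are hypothetical; the paper's actual detection mechanism may differ.

```python
import numpy as np

def masked_aggregate(client_updates, mask_fraction=0.2):
    """Average client updates after masking the most divergent ones."""
    updates = np.stack(client_updates)            # (n_clients, n_params)
    provisional = updates.mean(axis=0)            # provisional global update
    dists = np.linalg.norm(updates - provisional, axis=1)
    n_keep = max(1, int(len(updates) * (1 - mask_fraction)))
    keep = np.argsort(dists)[:n_keep]             # clients closest to consensus
    return updates[keep].mean(axis=0)
```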
arXiv Detail & Related papers (2025-06-02T09:35:42Z)
- Pseudo-label Refinement for Improving Self-Supervised Learning Systems [22.276126184466207]
Self-supervised learning systems use clustering-based pseudo-labels to provide supervision without the need for human annotations.
The noise that the clustering methods introduce into these pseudo-labels poses a challenge to the learning process, leading to degraded performance.
We propose a pseudo-label refinement algorithm to address this issue.
arXiv Detail & Related papers (2024-10-18T07:47:59Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
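A hedged sketch of this mixing idea appears below: each synthetic sample interpolates a minority example with a majority example, with the mixing weight biased toward the minority side so the synthetic point keeps its label. The Beta parameter and the 0.5 floor on the weight are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def mix_minority_majority(X_min, X_maj, n_new, alpha=0.75, rng=None):
    """Generate n_new synthetic minority samples by cross-class interpolation."""
    rng = rng or np.random.default_rng()
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_maj), size=n_new)
    # Bias lambda toward the minority sample so synthetic points keep its label.
    lam = np.maximum(rng.beta(alpha, alpha, size=(n_new, 1)), 0.5)
    return lam * X_min[i] + (1 - lam) * X_maj[j]
```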
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Enhancing Label Sharing Efficiency in Complementary-Label Learning with Label Augmentation [92.4959898591397]
We analyze the implicit sharing of complementary labels on nearby instances during training.
We propose a novel technique that enhances the sharing efficiency via complementary-label augmentation.
Our results confirm that complementary-label augmentation can systematically improve empirical performance over state-of-the-art CLL models.
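One way to picture such augmentation is the sketch below, which unions each instance's complementary labels with those of its k nearest neighbors; k, the distance metric, and the union rule are assumptions for illustration, not the paper's exact technique.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def share_complementary_labels(X, Y_comp, k=5):
    """Union each row of (n, c) binary complementary labels with its neighbors'."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)          # idx[:, 0] is each point itself
    Y_aug = Y_comp.copy()
    for neighbors in idx.T[1:]:        # skip the self column
        Y_aug = np.maximum(Y_aug, Y_comp[neighbors])
    return Y_aug
```

The union is sound for complementary labels because a label a nearby instance is known not to have is likely also absent from the instance itself; the more complementary labels an instance accumulates, the more its true label is pinned down.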
arXiv Detail & Related papers (2023-05-15T04:43:14Z)
- CLImage: Human-Annotated Datasets for Complementary-Label Learning [8.335164415521838]
Complementary-label learning (CLL) is a weakly-supervised learning paradigm that aims to train a multi-class classifier using only complementary labels. Despite numerous algorithmic proposals for CLL, their practical applicability remains unverified for two reasons. To gain insights into the real-world performance of CLL algorithms, we developed a protocol to collect complementary labels from human annotators.
arXiv Detail & Related papers (2023-05-15T01:48:53Z)
- Class-Aware Contrastive Semi-Supervised Learning [51.205844705156046]
We propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL) to improve pseudo-label quality and enhance the model's robustness in the real-world setting.
Our proposed CCSSL achieves significant performance improvements over state-of-the-art SSL methods on the standard datasets CIFAR100 and STL10.
arXiv Detail & Related papers (2022-03-04T12:18:23Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
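As a toy illustration of multi-scale feature alignment, the sketch below penalizes the gap between mean feature statistics of real and synthetic batches at each scale; matching only first moments is a simplifying assumption, not the paper's full strategy.

```python
import numpy as np

def feature_alignment_loss(real_feats, syn_feats):
    """Each input: list of (batch, dim) feature arrays, one per network scale."""
    return sum(np.mean((fr.mean(axis=0) - fs.mean(axis=0)) ** 2)
               for fr, fs in zip(real_feats, syn_feats))
```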
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not require domain-specific data augmentations but performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework, which improves pseudo-labeling accuracy by drastically reducing the amount of noise encountered in the training process.
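The selection rule can be sketched as follows: a pseudo-label is kept only when its mean confidence across stochastic (e.g., MC-dropout) forward passes is high and its variability is low. The thresholds and the use of the per-class standard deviation as the uncertainty measure are illustrative assumptions.

```python
import numpy as np

def select_pseudo_labels(prob_samples, conf_thresh=0.9, unc_thresh=0.05):
    """prob_samples: (T, n, k) softmax outputs from T stochastic forward passes."""
    mean_probs = prob_samples.mean(axis=0)                 # (n, k)
    labels = mean_probs.argmax(axis=1)
    confidence = mean_probs.max(axis=1)
    # Uncertainty: std of the chosen class's probability across the T passes.
    unc = prob_samples.std(axis=0)[np.arange(len(labels)), labels]
    keep = (confidence >= conf_thresh) & (unc <= unc_thresh)
    return labels, keep
```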
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- Suppressing Mislabeled Data via Grouping and Self-Attention [60.14212694011875]
Deep networks achieve excellent results on large-scale clean data but degrade significantly when learning from noisy labels.
This paper proposes a conceptually simple yet efficient training block, termed Attentive Feature Mixup (AFM).
It allows paying more attention to clean samples and less to mislabeled ones via sample interactions in small groups.
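A minimal sketch of group-wise attentive mixup, assuming attention scores are produced elsewhere (e.g., by a small scoring network): features and soft labels in a small group are mixed with softmax weights, so low-scoring, likely-mislabeled samples contribute less. Names and shapes are illustrative, not the paper's exact block.

```python
import numpy as np

def attentive_feature_mixup(feats, labels, attn_scores):
    """Mix a small group of (g, d) features and (g, k) soft labels.

    attn_scores: (g,) raw scores; higher means more likely a clean sample.
    """
    w = np.exp(attn_scores - attn_scores.max())
    w = w / w.sum()                                   # softmax over the group
    mixed_feat = (w[:, None] * feats).sum(axis=0)     # attention-weighted mix
    mixed_label = (w[:, None] * labels).sum(axis=0)   # soft label follows weights
    return mixed_feat, mixed_label
```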
arXiv Detail & Related papers (2020-10-29T13:54:16Z)