Ensemble Learning with Manifold-Based Data Splitting for Noisy Label
Correction
- URL: http://arxiv.org/abs/2103.07641v1
- Date: Sat, 13 Mar 2021 07:24:58 GMT
- Title: Ensemble Learning with Manifold-Based Data Splitting for Noisy Label
Correction
- Authors: Hao-Chiang Shao, Hsin-Chieh Wang, Weng-Tai Su, and Chia-Wen Lin
- Abstract summary: Noisy labels in training data can significantly degrade a model's generalization performance.
We propose an ensemble learning method that corrects noisy labels by exploiting the local structures of the feature manifold.
Our experiments on real-world noisy-label datasets demonstrate the superiority of the proposed method over existing state-of-the-art methods.
- Score: 20.401661156102897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Label noise in training data can significantly degrade a model's
generalization performance for supervised learning tasks. Here we focus on the
setting in which noisy labels are primarily mislabeled samples that tend to be
concentrated near decision boundaries, rather than uniformly distributed, and
whose features are inherently equivocal. To address this problem, we propose an
ensemble learning method to correct noisy labels by exploiting the local
structures of feature manifolds. Different from typical ensemble strategies
that increase the prediction diversity among sub-models via certain loss terms,
our method trains sub-models on disjoint subsets, each being a union of the
nearest-neighbors of randomly selected seed samples on the data manifold. As a
result, each sub-model can learn a coarse representation of the data manifold
along with a corresponding graph. Moreover, only a limited number of sub-models
will be affected by locally-concentrated noisy labels. The constructed graphs
are used to suggest a series of label correction candidates, and accordingly,
our method derives label correction results by voting down inconsistent
suggestions. Our experiments on real-world noisy-label datasets demonstrate the
superiority of the proposed method over existing state-of-the-art methods.
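The split-train-vote procedure described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's actual algorithm: the seed-and-grow subset rule, the k-NN sub-models standing in for trained networks, and the `agree` threshold are all simplifying assumptions.

```python
import random
from collections import Counter


def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manifold_split(X, n_subsets, seed=0):
    """Partition sample indices into disjoint subsets, each grown from a
    randomly chosen seed sample by absorbing its nearest unassigned
    neighbors (a crude stand-in for nearest neighbors on the manifold)."""
    rng = random.Random(seed)
    unassigned = set(range(len(X)))
    target = len(X) // n_subsets
    subsets = []
    for _ in range(n_subsets):
        s = rng.choice(sorted(unassigned))
        members = [s]
        unassigned.discard(s)
        while len(members) < target and unassigned:
            nxt = min(unassigned, key=lambda i: euclid(X[i], X[s]))
            members.append(nxt)
            unassigned.discard(nxt)
        subsets.append(members)
    subsets[-1].extend(unassigned)  # sweep up any leftovers
    return subsets


def knn_predict(train_idx, X, y, q, k=3):
    """One sub-model's suggestion: a k-NN vote within its own subset."""
    neigh = sorted(train_idx, key=lambda i: euclid(X[i], q))[:k]
    return Counter(y[i] for i in neigh).most_common(1)[0][0]


def correct_labels(X, y, n_subsets=3, agree=2):
    """Each sub-model suggests a label per sample; a correction is
    accepted only when at least `agree` sub-models concur on a label
    different from the given one, so inconsistent suggestions are
    voted down."""
    subsets = manifold_split(X, n_subsets)
    corrected = list(y)
    for i in range(len(X)):
        votes = Counter(knn_predict(s, X, y, X[i]) for s in subsets)
        label, count = votes.most_common(1)[0]
        if count >= agree and label != y[i]:
            corrected[i] = label
    return corrected
```

Because each sub-model only sees its own local subset, a pocket of locally concentrated noisy labels contaminates few sub-models, and the agreement threshold then filters out their suggestions.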
Related papers
- Label Noise Robustness for Domain-Agnostic Fair Corrections via Nearest Neighbors Label Spreading [28.69917037694153]
We propose a drop-in correction for label noise in last-layer retraining.
Our proposed approach uses label spreading on a latent nearest neighbors graph and has minimal computational overhead.
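Label spreading on a nearest-neighbors graph can be sketched with a minimal, dependency-free implementation; the graph construction, `alpha`, and iteration count below are illustrative assumptions, not the paper's exact formulation.

```python
def knn_graph(X, k=2):
    """Symmetrized k-nearest-neighbor adjacency as sorted index lists."""
    n = len(X)

    def d(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nbrs = [sorted(range(n), key=lambda j: d(X[i], X[j]))[1:k + 1]
            for i in range(n)]
    adj = [set(ns) for ns in nbrs]
    for i, ns in enumerate(nbrs):
        for j in ns:
            adj[j].add(i)  # make the graph undirected
    return [sorted(s) for s in adj]


def label_spread(X, y, n_classes, alpha=0.8, iters=50, k=2):
    """Iterative label spreading: each node's label distribution is a
    mix of its neighbors' average and its original (possibly noisy)
    one-hot label; alpha controls how much the graph overrides it."""
    adj = knn_graph(X, k)
    y0 = [[1.0 if c == yi else 0.0 for c in range(n_classes)] for yi in y]
    f = [row[:] for row in y0]
    for _ in range(iters):
        new = []
        for i, ns in enumerate(adj):
            avg = [sum(f[j][c] for j in ns) / max(len(ns), 1)
                   for c in range(n_classes)]
            new.append([alpha * avg[c] + (1 - alpha) * y0[i][c]
                        for c in range(n_classes)])
        f = new
    return [max(range(n_classes), key=lambda c: fi[c]) for fi in f]
```

A mislabeled point surrounded by consistently labeled neighbors is pulled toward the neighborhood's label, which is what makes the scheme a cheap drop-in noise correction.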
arXiv Detail & Related papers (2024-06-13T20:00:06Z)
- Inaccurate Label Distribution Learning with Dependency Noise [52.08553913094809]
We introduce the Dependent Noise-based Inaccurate Label Distribution Learning (DN-ILDL) framework to tackle the challenges posed by noise in label distribution learning.
We show that DN-ILDL effectively addresses the ILDL problem and outperforms existing LDL methods.
arXiv Detail & Related papers (2024-05-26T07:58:07Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical [66.57396042747706]
Complementary-label learning is a weakly supervised learning problem.
We propose a consistent approach that does not rely on the uniform distribution assumption.
We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems.
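The reduction to negative-unlabeled binary problems can be illustrated with a small data-preparation sketch; the function name and dictionary layout are hypothetical, and a real method would then fit a negative-unlabeled classifier per class.

```python
def to_negative_unlabeled(comp_labels, n_classes):
    """Reduce complementary-label data to one negative-unlabeled binary
    problem per class: a sample whose complementary label is c is a
    known negative for class c (it is certainly not c), while every
    other sample is unlabeled with respect to c."""
    problems = {}
    for c in range(n_classes):
        problems[c] = {
            "negative": [i for i, cl in enumerate(comp_labels) if cl == c],
            "unlabeled": [i for i, cl in enumerate(comp_labels) if cl != c],
        }
    return problems
```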
arXiv Detail & Related papers (2023-11-27T02:59:17Z)
- Manifold DivideMix: A Semi-Supervised Contrastive Learning Framework for Severe Label Noise [4.90148689564172]
Real-world datasets contain noisy label samples that have no semantic relevance to any class in the dataset.
Most state-of-the-art methods leverage in-distribution (ID) noisy-labeled samples as unlabeled data for semi-supervised learning.
We propose incorporating the information from all the training data by leveraging the benefits of self-supervised training.
arXiv Detail & Related papers (2023-08-13T23:33:33Z)
- Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition [70.00984078351927]
This paper focuses on reducing noise based on some inherent properties of multi-label classification and long-tailed learning under noisy cases.
We propose a Stitch-Up augmentation to synthesize a cleaner sample, which directly reduces multi-label noise.
A Heterogeneous Co-Learning framework is further designed to leverage the inconsistency between long-tailed and balanced distributions.
arXiv Detail & Related papers (2023-07-03T09:20:28Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
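A toy sketch of the neighborhood idea: score each sample by how well its predicted class distribution agrees with those of its feature-space nearest neighbors. The dot-product similarity and the choice of k here are assumptions, not the paper's actual estimator.

```python
def neighborhood_reliability(probs, X, k=2):
    """Re-estimate each sample's reliability as the average dot-product
    similarity between its predicted class distribution and those of
    its k nearest feature-space neighbors, instead of trusting the
    sample's own prediction (which invites confirmation bias)."""
    n = len(X)

    def d(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    rel = []
    for i in range(n):
        nbrs = sorted(range(n), key=lambda j: d(X[i], X[j]))[1:k + 1]
        sims = [sum(p * q for p, q in zip(probs[i], probs[j])) for j in nbrs]
        rel.append(sum(sims) / len(sims))
    return rel
```

Samples whose predictions disagree with their neighborhood get a low reliability score and would be treated as candidates for label correction.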
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples.
By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z)
- CrowdTeacher: Robust Co-teaching with Noisy Answers & Sample-specific Perturbations for Tabular Data [8.276156981100364]
Co-teaching methods have shown promising improvements for computer vision problems with noisy labels.
Our model, CrowdTeacher, builds on the idea that robustness-inducing perturbations in the input space can improve the classifier's tolerance to noisy labels.
We showcase the boost in predictive power attained using CrowdTeacher for both synthetic and real datasets.
arXiv Detail & Related papers (2021-03-31T15:09:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.