Coordinated Sparse Recovery of Label Noise
- URL: http://arxiv.org/abs/2404.04800v1
- Date: Sun, 7 Apr 2024 03:41:45 GMT
- Title: Coordinated Sparse Recovery of Label Noise
- Authors: Yukun Yang, Naihao Wang, Haixin Yang, Ruirui Li
- Abstract summary: This study focuses on robust classification tasks where the label noise is instance-dependent.
We propose a method called Coordinated Sparse Recovery (CSR).
CSR introduces a collaboration matrix and confidence weights to coordinate model predictions and noise recovery, reducing error leakage.
Based on CSR, this study designs a joint sample selection strategy and constructs a comprehensive and powerful learning framework called CSR+.
- Score: 2.9495895055806804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Label noise is a common issue in real-world datasets that inevitably impacts the generalization of models. This study focuses on robust classification tasks where the label noise is instance-dependent. Estimating the transition matrix accurately in this task is challenging, and methods based on sample selection often exhibit confirmation bias to varying degrees. Sparse over-parameterized training (SOP) has been shown to be theoretically effective in estimating and recovering label noise, offering a novel solution for noisy-label learning. However, this study empirically observes and verifies a technical flaw of SOP: the lack of coordination between model predictions and noise recovery leads to increased generalization error. To address this, we propose a method called Coordinated Sparse Recovery (CSR). CSR introduces a collaboration matrix and confidence weights to coordinate model predictions and noise recovery, reducing error leakage. Based on CSR, this study designs a joint sample selection strategy and constructs a comprehensive and powerful learning framework called CSR+. CSR+ significantly reduces confirmation bias, especially for datasets with more classes and a high proportion of instance-specific noise. Experimental results on simulated and real-world noisy datasets demonstrate that both CSR and CSR+ achieve outstanding performance compared with comparable methods.
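For intuition, here is a minimal PyTorch sketch of the SOP-style setup that CSR builds on: per-sample over-parameterized variables u and v recover sparse label noise on top of the model's predictions. The confidence weight below is only a stand-in for CSR's confidence weights, and the collaboration matrix is not reproduced; everything beyond the SOP skeleton is an illustrative assumption, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

# Sketch of SOP-style sparse noise recovery (the basis of CSR).
# u and v are per-sample over-parameterized variables; their squares
# keep the recovered noise s_i = u_i^2 * y_i - v_i^2 * (1 - y_i)
# sparse under gradient descent.

n_samples, n_classes = 50000, 10
u = torch.zeros(n_samples, n_classes, requires_grad=True)
v = torch.zeros(n_samples, n_classes, requires_grad=True)

def csr_style_loss(logits, noisy_targets, idx):
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(noisy_targets, n_classes).float()
    # Recovered sparse noise for this mini-batch of sample indices.
    s = u[idx] ** 2 * onehot - v[idx] ** 2 * (1.0 - onehot)
    corrected = (probs + s).clamp(min=1e-8)
    per_sample = -(onehot * corrected.log()).sum(dim=1)
    # Illustrative confidence weight (an assumption, not CSR's definition):
    # trust samples the model is currently confident about.
    with torch.no_grad():
        conf = probs.max(dim=1).values
    return (conf * per_sample).mean()
```

In SOP-style training, u and v typically get their own optimizer with a higher learning rate than the backbone; CSR's contribution is coordinating these noise variables with the model's predictions instead of letting them drift independently.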
Related papers
- Conformal-in-the-Loop for Learning with Imbalanced Noisy Data [5.69777817429044]
Class imbalance and label noise are pervasive in large-scale datasets.
Much of machine learning research assumes well-labeled, balanced data, which rarely reflects real-world conditions.
We propose Conformal-in-the-Loop (CitL), a novel training framework that addresses both challenges with a conformal prediction-based approach; a generic sketch of the conformal ingredient follows this entry.
arXiv Detail & Related papers (2024-11-04T17:09:58Z)
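As background, the sketch below shows generic split conformal prediction used to flag training labels that fall outside the conformal prediction set. The function name, the use of a clean calibration set, the nonconformity score, and the keep/discard rule are illustrative assumptions, not CitL's actual procedure.

```python
import numpy as np

def conformal_label_filter(cal_probs, cal_labels, train_probs, train_labels, alpha=0.1):
    """Split conformal prediction used to flag suspect training labels.

    Illustrative sketch: cal_probs/train_probs are (n, k) softmax outputs
    on a calibration set and on the (noisy) training set, labels are ints.
    """
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the given label.
    cal_scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected (1 - alpha) quantile of calibration scores.
    q_level = min(1.0, np.ceil((n + 1) * (1.0 - alpha)) / n)
    q = np.quantile(cal_scores, q_level, method="higher")
    train_scores = 1.0 - train_probs[np.arange(len(train_labels)), train_labels]
    # Keep a sample iff its given label would enter the prediction set.
    return train_scores <= q
```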
- Robust Learning under Hybrid Noise [24.36707245704713]
We propose a novel unified learning framework called "Feature and Label Recovery" (FLR) to combat hybrid noise from the perspective of data recovery.
arXiv Detail & Related papers (2024-07-04T16:13:25Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Analyze the Robustness of Classifiers under Label Noise [5.708964539699851]
Label noise in supervised learning, characterized by erroneous or imprecise labels, significantly impairs model performance.
This research focuses on the increasingly pertinent issue of label noise's impact on practical applications.
arXiv Detail & Related papers (2023-12-12T13:51:25Z)
- Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets; an illustrative CLIP-based filter is sketched after this entry.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
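The paper's actual selection rule is its own, but the general recipe is easy to picture: score each image against text prompts for all class names and distrust samples whose given label disagrees with CLIP's zero-shot ranking. The sketch below uses the Hugging Face transformers CLIP API; the checkpoint, prompt template, and top-k rule are illustrative choices.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Illustrative zero-shot filter; the cited paper's selection rule differs.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_filter(images, given_labels, class_names, top_k=1):
    """Keep a sample if its given label is among CLIP's top-k zero-shot classes."""
    prompts = [f"a photo of a {c}" for c in class_names]
    inputs = processor(text=prompts, images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        sims = model(**inputs).logits_per_image  # (n_images, n_classes)
    topk = sims.topk(top_k, dim=1).indices
    keep = (topk == torch.tensor(given_labels).unsqueeze(1)).any(dim=1)
    return keep  # boolean mask over the batch
```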
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are common in real-world applications and present significant challenges for training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments on seven publicly available benchmark datasets; a minimal mixing sketch follows this entry.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
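As a rough picture of the mixing idea (not the paper's iterative schedule), the sketch below synthesizes minority-leaning samples by convexly combining minority and majority points, with the mixing weight skewed toward the minority side. The Beta parameter and the convention of inheriting the minority label are assumptions.

```python
import numpy as np

def mix_minority_majority(x_min, x_maj, alpha=0.75, rng=None):
    """Synthesize samples by convexly mixing minority and majority points.

    Illustrative sketch only: lam is kept >= 0.5 so the synthetic point
    (labeled here as the minority class) stays near the minority manifold.
    """
    rng = rng if rng is not None else np.random.default_rng()
    n = min(len(x_min), len(x_maj))
    lam = rng.beta(alpha, alpha, size=(n, 1))
    lam = np.maximum(lam, 1.0 - lam)  # larger weight on the minority side
    partners = x_maj[rng.permutation(len(x_maj))[:n]]
    return lam * x_min[:n] + (1.0 - lam) * partners
```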
- Dynamic Residual Classifier for Class Incremental Learning [4.02487511510606]
The imbalance in sample numbers between old and new classes biases class-incremental learning.
Existing CIL methods adopt long-tailed (LT) recognition techniques, e.g., adjusted losses and data re-sampling.
A novel Dynamic Residual Classifier (DRC) is proposed to handle this challenging scenario.
arXiv Detail & Related papers (2023-08-25T11:07:11Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework that boosts causal discovery performance by dynamically learning adaptive sample weights for a reweighted score function, ReScore; a toy reweighting sketch follows this entry.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
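ReScore's bilevel formulation is specific to that paper; the toy sketch below only illustrates the general idea of adaptive sample reweighting inside a differentiable score, here a NOTEARS-style least-squares score for a linear SEM where poorly fit samples receive larger weights. The softmax weighting rule is an assumption.

```python
import torch

def reweighted_score(X, W, temperature=1.0):
    """Toy reweighted least-squares score for a linear SEM (NOTEARS-style).

    Illustrates adaptive reweighting only, not ReScore's bilevel scheme:
    samples with larger residuals get larger weights, so hard-to-fit
    samples influence the score more.
    """
    residual = X - X @ W                      # (n, d) reconstruction error
    per_sample = residual.pow(2).sum(dim=1)   # per-sample squared error
    with torch.no_grad():
        weights = torch.softmax(per_sample / temperature, dim=0) * len(X)
    return (weights * per_sample).mean()

def acyclicity(W):
    """NOTEARS penalty: h(W) = tr(exp(W*W)) - d, zero iff W encodes a DAG."""
    d = W.shape[0]
    return torch.matrix_exp(W * W).trace() - d
```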
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements in robustness; a generic sketch of an instance-dependent noise model follows this entry.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
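That paper defines its own probabilistic model; as a generic sketch of "explicitly relating noisy labels to their instances", the module below predicts an instance-dependent transition matrix T(x) and composes it with the clean-label posterior, so the noisy-label likelihood is p(noisy y | x) = p(clean y | x)^T T(x). The architecture and heads are illustrative assumptions.

```python
import torch
import torch.nn as nn

class InstanceDependentNoiseModel(nn.Module):
    """Generic sketch: p(noisy y | x) = p(clean y | x)^T T(x).

    The backbone and both heads are illustrative assumptions, not the
    cited paper's exact architecture.
    """
    def __init__(self, backbone, feat_dim, n_classes):
        super().__init__()
        self.backbone = backbone                  # x -> (n, feat_dim) features
        self.clean_head = nn.Linear(feat_dim, n_classes)
        self.trans_head = nn.Linear(feat_dim, n_classes * n_classes)
        self.n_classes = n_classes

    def forward(self, x):
        feat = self.backbone(x)
        clean = torch.softmax(self.clean_head(feat), dim=1)   # p(clean y | x)
        T = self.trans_head(feat).view(-1, self.n_classes, self.n_classes)
        T = torch.softmax(T, dim=2)               # rows: p(noisy | clean, x)
        noisy = torch.bmm(clean.unsqueeze(1), T).squeeze(1)   # p(noisy y | x)
        return noisy, clean
```

Training such a model would minimize the negative log-likelihood of the observed noisy labels under the composed distribution, while the clean head provides the denoised predictions.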
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.