Related papers: Robust Learning under Hybrid Noise

Robust Learning under Hybrid Noise

URL: http://arxiv.org/abs/2407.04029v1
Date: Thu, 4 Jul 2024 16:13:25 GMT
Title: Robust Learning under Hybrid Noise
Authors: Yang Wei, Shuo Chen, Shanshan Ye, Bo Han, Chen Gong
Abstract summary: We propose a novel unified learning framework called "Feature and Label Recovery" (FLR) to combat the hybrid noise from the perspective of data recovery.
Score: 24.36707245704713
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Feature noise and label noise are ubiquitous in practical scenarios, which pose great challenges for training a robust machine learning model. Most previous approaches usually deal with only a single problem of either feature noise or label noise. However, in real-world applications, hybrid noise, which contains both feature noise and label noise, is very common due to the unreliable data collection and annotation processes. Although some results have been achieved by a few representation learning based attempts, this issue is still far from being addressed with promising performance and guaranteed theoretical analyses. To address the challenge, we propose a novel unified learning framework called "Feature and Label Recovery" (FLR) to combat the hybrid noise from the perspective of data recovery, where we concurrently reconstruct both the feature matrix and the label matrix of input data. Specifically, the clean feature matrix is discovered by the low-rank approximation, and the ground-truth label matrix is embedded based on the recovered features with a nuclear norm regularization. Meanwhile, the feature noise and label noise are characterized by their respective adaptive matrix norms to satisfy the corresponding maximum likelihood. As this framework leads to a non-convex optimization problem, we develop the non-convex Alternating Direction Method of Multipliers (ADMM) with the convergence guarantee to solve our learning objective. We also provide the theoretical analysis to show that the generalization error of FLR can be upper-bounded in the presence of hybrid noise. Experimental results on several typical benchmark datasets clearly demonstrate the superiority of our proposed method over the state-of-the-art robust learning approaches for various noises.

Related papers

Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios [76.02688769599686]
We propose a novel multi-view clustering framework for the automatic identification and rectification of noisy data, termed AIRMVC. Specifically, we reformulate noisy identification as an anomaly identification problem using GMM. We then design a hybrid rectification strategy to mitigate the adverse effects of noisy data based on the identification results.
arXiv Detail & Related papers (2025-05-27T16:16:54Z)
Mitigating Instance-Dependent Label Noise: Integrating Self-Supervised Pretraining with Pseudo-Label Refinement [3.272177633069322]
Real-world datasets often contain noisy labels due to human error, ambiguity, or resource constraints during the annotation process. We propose a novel framework that combines self-supervised learning using SimCLR with iterative pseudo-label refinement. Our approach significantly outperforms several state-of-the-art methods, particularly under high noise conditions.
arXiv Detail & Related papers (2024-12-06T09:56:49Z)
Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise [19.496063739638924]
We consider a prototypical problem of Bayesian inference for a structured spiked model. We show how to predict the statistical limits using an efficient algorithm inspired by the theory of adaptive Thouless-Anderson-Palmer equations.
arXiv Detail & Related papers (2024-05-31T16:38:35Z)
Trusted Multi-view Learning with Label Noise [17.458306450909316]
Multi-view learning methods often focus on improving decision accuracy while neglecting the decision uncertainty. We propose a trusted multi-view noise refining method (TMNR) to solve this problem. We empirically compare TMNR with state-of-the-art trusted multi-view learning and label noise learning baselines on 5 publicly available datasets.
arXiv Detail & Related papers (2024-04-18T06:47:30Z)
SoftPatch: Unsupervised Anomaly Detection with Noisy Data [67.38948127630644]
This paper considers label-level noise in image sensory anomaly detection for the first time. We propose a memory-based unsupervised AD method, SoftPatch, which efficiently denoises the data at the patch level. Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in coreset.
arXiv Detail & Related papers (2024-03-21T08:49:34Z)
Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC). NPC consists of a detection module and a correction module. We conduct experiments on the text-retrieval benchmarks Natural Question and TriviaQA and the code-search benchmarks StaQC and SO-DS.
arXiv Detail & Related papers (2023-11-07T08:27:14Z)
Learning from Noisy Labels with Coarse-to-Fine Sample Credibility Modeling [22.62790706276081]
Training deep neural networks (DNNs) with noisy labels is practically challenging. Previous efforts tend to handle part or full data in a unified denoising flow. We propose a coarse-to-fine robust learning method called CREMA to handle noisy data in a divide-and-conquer manner.
arXiv Detail & Related papers (2022-08-23T02:06:38Z)
Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile [78.1212767880785]
The meta-learner is prone to overfitting since there are only a few available samples. When handling data with noisy labels, the meta-learner could be extremely sensitive to label noise. We present Eigen-Reptile (ER), which updates the meta-parameters with the main direction of historical task-specific parameters.
arXiv Detail & Related papers (2022-06-04T08:48:02Z)
Open-set Label Noise Can Improve Robustness Against Inherent Label Noise [27.885927200376386]
We show that open-set noisy labels can be non-toxic and even benefit the robustness against inherent noisy labels. We propose a simple yet effective regularization by introducing Open-set samples with Dynamic Noisy Labels (ODNL) into training.
arXiv Detail & Related papers (2021-06-21T07:15:50Z)
Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise. We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning. Our framework generally outperforms at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z)
Learning Noise Transition Matrix from Only Noisy Labels via Total Variation Regularization [88.91872713134342]
We propose a theoretically grounded method that can estimate the noise transition matrix and learn a classifier simultaneously. We show the effectiveness of the proposed method through experiments on benchmark and real-world datasets.
arXiv Detail & Related papers (2021-02-04T05:09:18Z)
Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances. Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.