Knockoffs-SPR: Clean Sample Selection in Learning with Noisy Labels
- URL: http://arxiv.org/abs/2301.00545v4
- Date: Wed, 29 Nov 2023 10:10:04 GMT
- Title: Knockoffs-SPR: Clean Sample Selection in Learning with Noisy Labels
- Authors: Yikai Wang, Yanwei Fu, and Xinwei Sun
- Abstract summary: We propose a novel theoretically guaranteed clean sample selection framework for learning with noisy labels.
Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline.
We further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data.
- Score: 56.81761908354718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A noisy training set usually degrades the generalization and robustness of neural networks. In this paper, we propose a novel theoretically guaranteed clean sample selection framework for learning with noisy labels. Specifically, we first present a Scalable Penalized Regression (SPR) method to model the linear relation between network features and one-hot labels. In SPR, the clean data are identified as the samples whose mean-shift parameters are estimated to be zero in the regression model. We theoretically show that SPR can recover the clean data under certain conditions. In general scenarios, however, these conditions may no longer hold, and some noisy data are falsely selected as clean. To solve this problem, we propose a data-adaptive method, Scalable Penalized Regression with Knockoff filters (Knockoffs-SPR), which provably controls the False-Selection-Rate (FSR) of the selected clean data. To improve efficiency, we further present a split algorithm that divides the whole training set into small pieces that can be solved in parallel, making the framework scalable to large datasets. While Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline, we further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data. Experimental results on several benchmark datasets and real-world noisy datasets show the effectiveness of our framework and validate the theoretical results of Knockoffs-SPR. Our code and pre-trained models are available at https://github.com/Yikai-Wang/Knockoffs-SPR.
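The SPR step above can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): it assumes the mean-shift model Y ≈ Xβ + γ is solved by alternating a ridge update for β with row-wise group soft-thresholding of γ, and the hyper-parameters `lam` and `tau` are illustrative placeholders.

```python
import numpy as np

def spr_clean_selection(X, Y, lam=1e-2, tau=0.5, n_iters=50):
    """Sketch of SPR-style clean-sample selection.

    Model: Y ~ X @ beta + gamma, where each row gamma_i is a per-sample
    mean-shift parameter. Rows driven exactly to zero by the group-sparse
    penalty are flagged as clean.

    X: (n, d) network features; Y: (n, c) one-hot labels.
    lam (ridge strength) and tau (threshold) are illustrative values,
    not taken from the paper. Returns a boolean clean mask of shape (n,).
    """
    n, d = X.shape
    gamma = np.zeros_like(Y, dtype=float)
    # The ridge system matrix is formed once; it is re-solved each iteration.
    G = X.T @ X + lam * np.eye(d)
    for _ in range(n_iters):
        beta = np.linalg.solve(G, X.T @ (Y - gamma))  # ridge step for beta
        R = Y - X @ beta                              # per-sample residuals
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        # Group soft-thresholding: rows with small residual norm get gamma_i = 0.
        gamma = R * np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return np.linalg.norm(gamma, axis=1) == 0  # True = selected as clean
```

A single fixed threshold like `tau` gives no error control; Knockoffs-SPR replaces it with a data-adaptive, knockoff-style comparison that provably bounds the FSR, and the split algorithm applies the selection to small subsets of the training data in parallel. Both parts are omitted from this sketch.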
Related papers
- Granular-ball Representation Learning for Deep CNN on Learning with Label Noise [14.082510085545582]
We propose a general granular-ball computing (GBC) module that can be embedded into a CNN model.
In this study, we split the input samples into $gb$ samples at the feature level, each of which can correspond to multiple samples with varying numbers and shares a single label.
Experiments demonstrate that the proposed method can improve the robustness of CNN models with no additional data or optimization.
arXiv Detail & Related papers (2024-09-05T05:18:31Z)
- Foster Adaptivity and Balance in Learning with Noisy Labels [26.309508654960354]
We propose a novel approach named SED to deal with label noise in a Self-adaptivE and class-balanceD manner.
A mean-teacher model is then employed to correct labels of noisy samples.
We additionally propose a self-adaptive and class-balanced sample re-weighting mechanism to assign different weights to detected noisy samples.
arXiv Detail & Related papers (2024-07-03T03:10:24Z)
- Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- Class Prototype-based Cleaner for Label Noise Learning [73.007001454085]
Semi-supervised learning methods are current SOTA solutions to the noisy-label learning problem.
We propose a simple yet effective solution, named Class Prototype-based label noise Cleaner.
arXiv Detail & Related papers (2022-12-21T04:56:41Z)
- UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method which is robust to high label noise.
We obtain an 11.4% improvement over the current state-of-the-art on CIFAR100 dataset with a 90% noise rate.
arXiv Detail & Related papers (2022-03-28T07:36:36Z)
- Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels [44.79124350922491]
We propose a theoretically guaranteed noisy-label detection framework to detect and remove noisy data for Learning with Noisy Labels (LNL).
Specifically, we design a penalized regression to model the linear relation between network features and one-hot labels.
To make the framework scalable to datasets that contain a large number of categories and training data, we propose a split algorithm to divide the whole training set into small pieces.
arXiv Detail & Related papers (2022-03-15T11:09:58Z)
- Robust Training under Label Noise by Over-parameterization [41.03008228953627]
We propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted.
The main idea is very simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data (a toy sketch follows this list).
Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets.
arXiv Detail & Related papers (2022-02-28T18:50:10Z)
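The noise-separation idea in the last entry can be shown with a toy sketch: an over-parameterized per-sample term absorbs the sparse label noise while the shared weights fit the clean structure. This is only an illustration under simplified assumptions; a linear model and an explicit l1 penalty stand in for the paper's actual parameterization and its implicit regularization.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d, c = 512, 32, 10
X = torch.randn(n, d)
y = torch.randint(0, c, (n,))
y[:100] = torch.randint(0, c, (100,))      # corrupt the first 100 labels

W = torch.zeros(d, c, requires_grad=True)  # shared linear "network"
S = torch.zeros(n, c, requires_grad=True)  # over-parameterized per-sample noise term
opt = torch.optim.SGD([W, S], lr=0.1)

for _ in range(500):
    opt.zero_grad()
    logits = X @ W + S                     # S can absorb what W cannot fit
    # l1 keeps S sparse, so only genuinely mislabeled samples pay for a large S_i.
    loss = F.cross_entropy(logits, y) + 1e-3 * S.abs().sum()
    loss.backward()
    opt.step()

noisy_score = S.detach().norm(dim=1)       # large norm -> likely noisy label
```

Samples whose rows of S stay near zero are the ones the shared model explains, mirroring the claim that sparse, incoherent noise can be modeled and separated from the clean data.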
This list is automatically generated from the titles and abstracts of the papers on this site.