Combating Label Noise With A General Surrogate Model For Sample Selection
- URL: http://arxiv.org/abs/2310.10463v1
- Date: Mon, 16 Oct 2023 14:43:27 GMT
- Title: Combating Label Noise With A General Surrogate Model For Sample Selection
- Authors: Chao Liang, Linchao Zhu, Humphrey Shi, Yi Yang
- Abstract summary: We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
- Score: 84.61367781175984
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Modern deep learning systems are data-hungry. Learning with web data
is a feasible solution, but it inevitably introduces label noise, which can
hinder the performance of deep neural networks. Sample selection is an
effective way to deal with label noise; the key is to separate clean samples
from noisy ones according to some criterion. Previous methods mostly rely on
the small-loss criterion, which regards small-loss samples as clean.
Nevertheless, such a strategy depends on the learning dynamics of each data
instance, and some noisy samples are still memorized because their corrupted
patterns occur frequently. To tackle this problem, a training-free surrogate
model is preferable, as it is free from the effects of memorization. In this
work, we propose to leverage the vision-language surrogate model CLIP to
filter noisy samples automatically. CLIP brings external knowledge that
facilitates the selection of clean samples through its text-image alignment
capability. Furthermore, a margin adaptive loss is designed to regularize the
selection bias introduced by CLIP, providing robustness to label noise. We
validate the effectiveness of our proposed method on both real-world and
synthetic noisy datasets. Our method achieves significant improvements without
involving CLIP during inference.
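
The abstract only names the mechanism, so the following is a minimal sketch of
how CLIP's zero-shot text-image alignment could score annotated labels and flag
likely-noisy samples. The checkpoint name, prompt template, threshold `tau`,
and the `select_clean` helper are illustrative assumptions, not the paper's
actual selection rule or its margin adaptive loss.

```python
# Minimal sketch: zero-shot CLIP scoring to flag likely-noisy labels.
# Assumptions (not from the paper): checkpoint, prompt template, and the
# agreement threshold are all illustrative choices.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def clip_label_scores(images, class_names):
    """Return a (num_images, num_classes) matrix of CLIP label probabilities."""
    prompts = [f"a photo of a {c}" for c in class_names]  # assumed template
    inputs = processor(text=prompts, images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # image-text similarities
    return logits.softmax(dim=-1)

def select_clean(images, given_labels, class_names, tau=0.3):
    """Keep a sample if CLIP assigns its annotated label probability >= tau.

    `tau` is a hypothetical threshold; the paper's actual selection rule
    (and its margin adaptive loss) may differ.
    """
    probs = clip_label_scores(images, class_names)
    agree = probs[torch.arange(len(given_labels)), torch.tensor(given_labels)]
    return (agree >= tau).tolist()

# Usage: images is a list of PIL.Image, labels are indices into class_names.
# clean_mask = select_clean(images, labels, ["cat", "dog", "truck"])
```

Note that in the paper's pipeline CLIP serves only for selection: the abstract
stresses that CLIP is not involved at inference time, so the trained classifier
runs without it.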
Related papers
- CLIPCleaner: Cleaning Noisy Labels with CLIP [36.434849361479316]
CLIPCleaner is a zero-shot classifier for efficient, offline, clean sample selection.
CLIPCleaner offers a simple, single-step approach that achieves competitive or superior performance on benchmark datasets.
arXiv Detail & Related papers (2024-08-19T14:05:58Z)
- Foster Adaptivity and Balance in Learning with Noisy Labels [26.309508654960354]
We propose a novel approach named SED to deal with label noise in a Self-adaptivE and class-balanceD manner.
A mean-teacher model is then employed to correct the labels of noisy samples.
We additionally propose a self-adaptive and class-balanced sample re-weighting mechanism to assign different weights to detected noisy samples.
arXiv Detail & Related papers (2024-07-03T03:10:24Z)
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- Learning with Noisy Labels Using Collaborative Sample Selection and Contrastive Semi-Supervised Learning [76.00798972439004]
Collaborative Sample Selection (CSS) removes noisy samples from the identified clean set.
We introduce a co-training mechanism with a contrastive loss in semi-supervised learning.
arXiv Detail & Related papers (2023-10-24T05:37:20Z)
- PASS: Peer-Agreement based Sample Selection for training with Noisy Labels [16.283722126438125]
The prevalence of noisy-label samples poses a significant challenge in deep learning, inducing overfitting.
Current methodologies often rely on the small-loss hypothesis (illustrated after this list) or feature-based selection to separate noisy-label from clean-label samples.
We propose a new noisy-label detection method, termed Peer-Agreement based Sample Selection (PASS), to address this problem.
arXiv Detail & Related papers (2023-03-20T00:35:33Z)
- Learning to Detect Noisy Labels Using Model-Based Features [16.681748918518075]
We propose Selection-Enhanced Noisy label Training (SENT).
SENT does not rely on meta learning while retaining the flexibility of being data-driven.
It improves performance over strong baselines under self-training and label-corruption settings.
arXiv Detail & Related papers (2022-12-28T10:12:13Z)
- Learning from Noisy Labels with Coarse-to-Fine Sample Credibility Modeling [22.62790706276081]
Training deep neural networks (DNNs) with noisy labels is practically challenging.
Previous efforts tend to handle part or all of the data in a unified denoising flow.
We propose a coarse-to-fine robust learning method called CREMA to handle noisy data in a divide-and-conquer manner.
arXiv Detail & Related papers (2022-08-23T02:06:38Z)
- UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method that is robust to high label noise.
We obtain an 11.4% improvement over the current state-of-the-art on the CIFAR100 dataset with a 90% noise rate.
arXiv Detail & Related papers (2022-03-28T07:36:36Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
At the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of labels in its feature-space neighborhood.
Our method significantly surpasses previous methods on both CIFAR10/CIFAR100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Salvage Reusable Samples from Noisy Data for Robust Learning [70.48919625304]
We propose a reusable sample selection and correction approach, termed CRSSC, for coping with label noise when training deep FG models with web images.
Our key idea is to additionally identify and correct reusable samples, and then leverage them together with clean examples to update the networks.
arXiv Detail & Related papers (2020-08-06T02:07:21Z)
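
For context on the small-loss criterion that the abstract critiques and that
several papers above (e.g., PASS) build on or replace: a common instantiation,
popularized by DivideMix rather than by any single paper listed here, fits a
two-component Gaussian mixture to per-sample losses and treats the low-loss
component as clean. The threshold `p_clean` and the normalization step are
illustrative assumptions.

```python
# Sketch of the classic small-loss criterion: fit a 2-component GMM to
# per-sample cross-entropy losses and treat the low-mean component as "clean".
import numpy as np
from sklearn.mixture import GaussianMixture

def small_loss_clean_mask(losses, p_clean=0.5):
    """losses: 1-D array of per-sample training losses.

    Returns a boolean mask of samples whose posterior probability of
    belonging to the low-loss mixture component is at least `p_clean`
    (a hypothetical threshold).
    """
    losses = np.asarray(losses, dtype=np.float64).reshape(-1, 1)
    # Normalize to [0, 1] for numerical stability before fitting.
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, reg_covar=5e-4, max_iter=100)
    gmm.fit(losses)
    clean_component = int(np.argmin(gmm.means_.ravel()))  # low loss = clean
    post = gmm.predict_proba(losses)[:, clean_component]
    return post >= p_clean

# Usage: mask = small_loss_clean_mask(per_sample_losses); train on mask==True.
```

Because the losses come from the model being trained, this criterion inherits
the memorization problem described in the abstract, which is precisely what a
training-free surrogate such as CLIP sidesteps.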