No Regret Sample Selection with Noisy Labels
- URL: http://arxiv.org/abs/2003.03179v5
- Date: Sun, 4 Apr 2021 15:12:40 GMT
- Title: No Regret Sample Selection with Noisy Labels
- Authors: H. Song, N. Mitsuo, S. Uchida, D. Suehiro
- Abstract summary: Experimental results on multiple noisy-labeled datasets demonstrate that our sample selection strategy works effectively in DNN training.
The proposed method achieves the best or the second-best performance among state-of-the-art methods, while requiring a significantly lower computational cost.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) suffer from noisy-labeled data because of the
risk of overfitting. To avoid the risk, in this paper, we propose a novel DNN
training method with sample selection based on adaptive k-set selection, which
selects k (< n) clean sample candidates from the whole n noisy training samples
at each epoch. It has a strong advantage of guaranteeing the performance of the
selection theoretically. Roughly speaking, a regret, which is defined by the
difference between the actual selection and the best selection, of the proposed
method is theoretically bounded, even though the best selection is unknown
until the end of all epochs. The experimental results on multiple noisy-labeled
datasets demonstrate that our sample selection strategy works effectively in
the DNN training; in fact, the proposed method achieved the best or the
second-best performance among state-of-the-art methods, while requiring a
significantly lower computational cost. The code is available at
https://github.com/songheony/TAkS.
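The selection-and-regret idea in the abstract can be illustrated with a small sketch. This is not the authors' TAkS implementation. It assumes a follow-the-perturbed-leader style rule (pick the k samples with the smallest noise-perturbed cumulative loss) and assumes that regret is measured against the best fixed k-set in hindsight. The names select_k_set, run_selection, and noise_scale are hypothetical, and the per-epoch losses are supplied as a fixed array rather than produced by a DNN.

```python
import numpy as np


def select_k_set(cum_loss, k, rng, noise_scale=1.0):
    """Return indices of the k samples with the smallest perturbed cumulative loss.

    Assumption: Gaussian perturbation, chosen only for illustration.
    """
    perturbed = cum_loss + noise_scale * rng.standard_normal(cum_loss.shape)
    # argpartition places the k smallest entries in the first k positions.
    return np.argpartition(perturbed, k - 1)[:k]


def run_selection(loss_per_epoch, k, seed=0, noise_scale=1.0):
    """Simulate per-epoch k-set selection over a (T, n) array of sample losses.

    Returns the list of selected index sets and the regret, i.e. the loss
    incurred by the actual selections minus the loss of the best fixed
    k-set, which is only known after the last epoch.
    """
    rng = np.random.default_rng(seed)
    num_epochs, n = loss_per_epoch.shape
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")

    cum_loss = np.zeros(n)   # losses observed in earlier epochs
    incurred = 0.0           # loss paid by the actual selections
    selections = []
    for t in range(num_epochs):
        chosen = select_k_set(cum_loss, k, rng, noise_scale)
        selections.append(np.sort(chosen))
        incurred += loss_per_epoch[t, chosen].sum()
        # Update the history only after the selection has been made.
        cum_loss += loss_per_epoch[t]

    best_fixed = np.sort(cum_loss)[:k].sum()
    return selections, incurred - best_fixed
```

In an actual training loop, the losses at each epoch would come from the network trained on the previously selected samples, so they would depend on earlier selections. The sketch ignores that feedback to keep the regret computation easy to follow.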
Related papers
- BOND: Aligning LLMs with Best-of-N Distillation [63.254031574394965]
We propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its significant computational overhead at inference time.
Specifically, BOND is a distribution matching algorithm that forces the distribution of generations from the policy to get closer to the Best-of-N distribution.
We demonstrate the effectiveness of our approach and several design choices through experiments on abstractive summarization and Gemma models.
arXiv Detail & Related papers (2024-07-19T18:38:25Z)
- BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges [12.248397169100784]
Data subset selection aims to find a smaller yet informative subset of a large dataset that can approximate the full-dataset training.
We introduce a universal and efficient data subset selection method, Best Window Selection (BWS), by proposing a method to choose the best window subset from samples ordered based on their difficulty scores.
arXiv Detail & Related papers (2024-06-05T08:33:09Z)
- Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
- Knockoffs-SPR: Clean Sample Selection in Learning with Noisy Labels [56.81761908354718]
We propose a novel theoretically guaranteed clean sample selection framework for learning with noisy labels.
Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline.
We further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data.
arXiv Detail & Related papers (2023-01-02T07:13:28Z)
- UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method which is robust to high label noise.
We obtain an 11.4% improvement over the current state-of-the-art on CIFAR100 dataset with a 90% noise rate.
arXiv Detail & Related papers (2022-03-28T07:36:36Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, enjoys its adaptivity in terms of unlabeled data selection.
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- Consistent Relative Confidence and Label-Free Model Selection for Convolutional Neural Networks [4.497097230665825]
This paper presents an approach to CNN model selection using only unlabeled data.
The effectiveness and efficiency of the presented method are demonstrated by extensive experimental studies on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2021-08-26T15:14:38Z)
- Adaptive Sample Selection for Robust Learning under Label Noise [1.71982924656402]
Deep Neural Networks (DNNs) have been shown to be susceptible to memorization or overfitting in the presence of noisily labelled data.
A prominent class of algorithms relies on sample selection strategies, motivated by curriculum learning.
We propose a data-dependent, adaptive sample selection strategy that relies only on batch statistics.
arXiv Detail & Related papers (2021-06-29T12:10:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.