Late Stopping: Avoiding Confidently Learning from Mislabeled Examples
- URL: http://arxiv.org/abs/2308.13862v1
- Date: Sat, 26 Aug 2023 12:43:25 GMT
- Title: Late Stopping: Avoiding Confidently Learning from Mislabeled Examples
- Authors: Suqin Yuan, Lei Feng, Tongliang Liu
- Abstract summary: We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
- Score: 61.00103151680946
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sample selection is a prevalent method in learning with noisy labels, where
small-loss data are typically considered as correctly labeled data. However,
this method may not effectively identify clean hard examples with large losses,
which are critical for achieving the model's close-to-optimal generalization
performance. In this paper, we propose a new framework, Late Stopping, which
leverages the intrinsic robust learning ability of DNNs through a prolonged
training process. Specifically, Late Stopping gradually shrinks the noisy
dataset by removing high-probability mislabeled examples while retaining the
majority of clean hard examples in the training set throughout the learning
process. We empirically observe that mislabeled and clean examples exhibit
differences in the number of epochs required for them to be consistently and
correctly classified, and thus high-probability mislabeled examples can be
removed. Experimental results on benchmark-simulated and real-world noisy
datasets demonstrate that the proposed method outperforms state-of-the-art
counterparts.
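Below is a minimal sketch, not the authors' implementation, of the core idea described in the abstract, under illustrative assumptions: a small PyTorch model trained on synthetic data, where for each example we track the first epoch at which it has been correctly classified for K consecutive epochs, and we periodically remove the examples that reach consistent correctness latest (treated as high-probability mislabeled). The model, data, thresholds, and removal schedule are all assumptions for illustration, not values from the paper.

```python
# Minimal sketch of the Late Stopping idea (illustrative, not the authors' code).
# For every example we track how many consecutive epochs it has been classified
# correctly, record the first epoch at which that streak reaches K, and
# periodically drop the examples that become consistently correct latest
# (treated here as high-probability mislabeled).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
N, D, C, K = 1000, 20, 5, 3                      # examples, features, classes, streak length
X = torch.randn(N, D)
y = torch.randint(0, C, (N,))                    # stand-in for (possibly noisy) labels

model = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

active = torch.arange(N)                         # indices still in the training set
streak = torch.zeros(N, dtype=torch.long)        # consecutive correct epochs per example
first_consistent = torch.full((N,), -1)          # epoch when streak first reached K (-1 = never)

for epoch in range(100):
    loader = DataLoader(TensorDataset(X[active], y[active], active),
                        batch_size=64, shuffle=True)
    for xb, yb, idx in loader:
        opt.zero_grad()
        logits = model(xb)
        loss_fn(logits, yb).backward()
        opt.step()
        correct = logits.argmax(dim=1).eq(yb)
        streak[idx] = torch.where(correct, streak[idx] + 1, torch.zeros_like(streak[idx]))
        newly = (streak[idx] >= K) & (first_consistent[idx] < 0)
        first_consistent[idx[newly]] = epoch

    # Every 10 epochs, shrink the noisy set: remove a small fraction of the
    # examples that reach consistent correctness latest (or not at all yet).
    if epoch % 10 == 9:
        fc = first_consistent[active].float()
        fc[fc < 0] = float("inf")                # never consistently correct -> treated as latest
        n_drop = max(1, int(0.02 * len(active)))
        drop_pos = torch.argsort(fc, descending=True)[:n_drop]
        keep = torch.ones(len(active), dtype=torch.bool)
        keep[drop_pos] = False
        active = active[keep]
```

The sketch only illustrates the signal the abstract describes (the number of epochs until an example is consistently and correctly classified); the paper's actual removal criterion and schedule may differ.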
Related papers
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction [48.929877651182885]
Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in the literature.
We propose a new robust PU learning method with a training strategy motivated by the nature of human learning.
arXiv Detail & Related papers (2023-08-01T04:34:52Z)
- Learning with Noisy Labels by Adaptive Gradient-Based Outlier Removal [4.71154003227418]
We propose AGRA: a new method for learning with noisy labels by using Adaptive GRAdient-based outlier removal.
By comparing the aggregated gradient of a batch of samples with the gradient of an individual example, our method dynamically decides whether the corresponding example is helpful for the model.
Extensive evaluation on several datasets demonstrates AGRA's effectiveness.
arXiv Detail & Related papers (2023-06-07T15:10:01Z)
- Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for data annotation when an unlabeled sample is believed to incur a high loss.
Our approach achieves superior performance to state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z)
- Split-PU: Hardness-aware Training Strategy for Positive-Unlabeled Learning [42.26185670834855]
Positive-Unlabeled (PU) learning aims to learn a model with rare positive samples and abundant unlabeled samples.
This paper focuses on improving the commonly-used nnPU with a novel training pipeline.
arXiv Detail & Related papers (2022-11-30T05:48:31Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, is adaptive in how it selects unlabeled data.
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- Robust and On-the-fly Dataset Denoising for Image Classification [72.10311040730815]
On-the-fly Data Denoising (ODD) is robust to mislabeled examples while introducing almost zero computational overhead compared to standard training.
ODD achieves state-of-the-art results on a wide range of datasets, including real-world ones such as WebVision and Clothing1M.
arXiv Detail & Related papers (2020-03-24T03:59:26Z)