Data Stream Sampling with Fuzzy Task Boundaries and Noisy Labels
- URL: http://arxiv.org/abs/2404.04871v1
- Date: Sun, 7 Apr 2024 08:32:16 GMT
- Title: Data Stream Sampling with Fuzzy Task Boundaries and Noisy Labels
- Authors: Yu-Hsi Chen
- Abstract summary: We introduce a novel sampling method called Noisy Test Debiasing (NTD) to mitigate noisy labels in evolving data streams.
NTD is straightforward to implement, making it feasible across various scenarios.
The results validate the efficacy of NTD for online continual learning in scenarios with noisy labels in data streams.
- Score: 0.03464344220266879
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of continual learning, the presence of noisy labels within data streams represents a notable obstacle to model reliability and fairness. We focus on the data stream scenario outlined in pertinent literature, characterized by fuzzy task boundaries and noisy labels. To address this challenge, we introduce a novel and intuitive sampling method called Noisy Test Debiasing (NTD) to mitigate noisy labels in evolving data streams and establish a fair and robust continual learning algorithm. NTD is straightforward to implement, making it feasible across various scenarios. Our experiments benchmark four datasets: two with synthetic noise (CIFAR10 and CIFAR100) and two with real-world noise (mini-WebVision and Food-101N). The results validate the efficacy of NTD for online continual learning in scenarios with noisy labels in data streams. Compared to the previous leading approach, NTD achieves a training speedup of more than two times while maintaining or surpassing accuracy levels. Moreover, NTD uses less than one-fifth of the GPU memory of previous leading methods.
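The abstract does not spell out the NTD procedure, so the following is only a minimal sketch of loss-aware sampling from a noisy stream into a replay buffer; the small-loss heuristic, the `buffer_capacity` parameter, and the random-replacement policy are illustrative assumptions, not the paper's method.
```python
# Minimal sketch of loss-aware sampling from a noisy stream into a replay
# buffer. This is NOT the paper's NTD procedure: the small-loss heuristic,
# `buffer_capacity`, and random replacement are illustrative assumptions.
import random
import torch
import torch.nn.functional as F

def sample_stream_batch(model, images, labels, buffer, buffer_capacity=500):
    """Keep only the low-loss (likely clean) half of a stream batch."""
    model.eval()
    with torch.no_grad():
        losses = F.cross_entropy(model(images), labels, reduction="none")
    keep = torch.argsort(losses)[: len(losses) // 2]   # small-loss heuristic
    for i in keep.tolist():
        item = (images[i], labels[i])
        if len(buffer) < buffer_capacity:
            buffer.append(item)
        else:
            buffer[random.randrange(buffer_capacity)] = item  # random eviction
    return buffer
```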
Related papers
- NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation [42.435293471992274]
Data-Free Knowledge Distillation (DFKD) has made significant recent strides by transferring knowledge from a teacher neural network to a student neural network without accessing the original data.
Existing approaches encounter a significant challenge when attempting to generate samples from random noise inputs, which inherently lack meaningful information.
We propose a novel Noisy Layer Generation method (NAYER) which relocates the random source from the input to a noisy layer and utilizes the meaningful constant label-text embedding (LTE) as the input.
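As a rough illustration of the idea summarized above, the sketch below moves the source of randomness from the generator's input to a small noisy layer that is re-initialized per batch, while a constant label-text embedding (LTE) serves as the input. The layer sizes and the way the LTE is produced are assumptions, not the paper's exact architecture.
```python
# Rough illustration only: randomness comes from a small "noisy layer"
# re-initialized per batch, and the input is a constant label-text
# embedding (LTE). Layer sizes and the LTE source are assumptions.
import math
import torch
import torch.nn as nn

class NoisyLayerGenerator(nn.Module):
    def __init__(self, lte_dim=512, img_shape=(3, 32, 32)):
        super().__init__()
        self.noisy = nn.Linear(lte_dim, lte_dim)  # the re-initialized layer
        self.body = nn.Sequential(
            nn.Linear(lte_dim, 1024), nn.ReLU(),
            nn.Linear(1024, math.prod(img_shape)), nn.Tanh(),
        )
        self.img_shape = img_shape

    def reset_noise(self):
        # Fresh randomness without discarding what `body` has learned.
        nn.init.normal_(self.noisy.weight, std=0.02)
        nn.init.zeros_(self.noisy.bias)

    def forward(self, lte):  # lte: (B, lte_dim), constant per class
        return self.body(self.noisy(lte)).view(-1, *self.img_shape)
```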
arXiv Detail & Related papers (2023-09-30T05:19:10Z)
- Towards Harnessing Feature Embedding for Robust Learning with Noisy Labels [44.133307197696446]
The memorization effect of deep neural networks (DNNs) plays a pivotal role in recent label noise learning methods.
We propose a novel feature embedding-based method for deep learning with label noise, termed LabEl NoiseDilution (LEND).
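The summary does not detail LEND's mechanism; the sketch below shows one generic feature-embedding heuristic in the same spirit: down-weight samples whose labels disagree with most of their nearest neighbors in embedding space. The neighborhood size `k` and the 50% agreement cutoff are illustrative, not the paper's choices.
```python
# Generic feature-embedding check for suspect labels, in the spirit of
# (but not identical to) LEND. `k` and the agreement cutoff are assumptions.
import torch
import torch.nn.functional as F

def label_agreement_weights(features, labels, k=10):
    feats = F.normalize(features, dim=1)          # cosine similarity space
    sims = feats @ feats.t()
    sims.fill_diagonal_(-1.0)                     # exclude self-matches
    nn_idx = sims.topk(k, dim=1).indices          # k nearest neighbors
    agree = (labels[nn_idx] == labels.unsqueeze(1)).float().mean(dim=1)
    return (agree >= 0.5).float()                 # 1 = likely clean, 0 = suspect
```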
arXiv Detail & Related papers (2022-06-27T02:45:09Z)
- Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile [78.1212767880785]
The meta-learner is prone to overfitting since only a few samples are available.
When handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise.
We present Eigen-Reptile (ER), which updates the meta-parameters with the main direction of historical task-specific parameters.
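A minimal sketch of that update, assuming the inner loop yields a matrix of flattened parameter snapshots: take the principal direction of the snapshots and move the meta-parameters along it. The step size `beta` and the sign-orientation rule are illustrative assumptions.
```python
# Sketch of the Eigen-Reptile idea as summarized above; `beta` and the
# orientation heuristic are illustrative, not the paper's exact update.
import torch

def eigen_reptile_update(meta_params, param_history, beta=0.1):
    """meta_params: (D,) flat vector; param_history: (T, D) inner-loop snapshots."""
    centered = param_history - param_history.mean(dim=0, keepdim=True)
    # Leading right-singular vector = main direction of the trajectory.
    _, _, v = torch.pca_lowrank(centered, q=1)
    direction = v[:, 0]
    # Orient the direction to agree with the overall drift of the trajectory.
    drift = param_history[-1] - param_history[0]
    if torch.dot(direction, drift) < 0:
        direction = -direction
    return meta_params + beta * direction * drift.norm()
```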
arXiv Detail & Related papers (2022-06-04T08:48:02Z)
- Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries [17.43350151320054]
A large body of continual learning (CL) methods assumes data streams with clean labels, and online learning scenarios under noisy data streams are yet underexplored.
We consider a more practical CL setup of online learning from a blurry data stream with corrupted labels, where existing CL methods struggle.
We propose a novel strategy to manage and use the memory through a unified approach of label-noise-aware diverse sampling and robust semi-supervised learning.
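The summary leaves the sampling policy abstract; below is only a hedged sketch of one way to combine diversity and noise awareness in a replay memory: a fixed per-class budget filled with the lowest-loss (likely clean) samples. Both the budget rule and the small-loss criterion are assumptions, not the paper's policy.
```python
# Hedged sketch of noise-aware diverse memory sampling: per-class budget
# (diversity) filled with low-loss samples (noise awareness). Both rules
# are illustrative assumptions.
import torch
import torch.nn.functional as F

def update_memory(model, memory, images, labels, per_class_budget=20):
    model.eval()
    with torch.no_grad():
        losses = F.cross_entropy(model(images), labels, reduction="none")
    for img, y, loss in zip(images, labels, losses):
        slot = memory.setdefault(int(y), [])   # dict: class -> [(loss, img)]
        slot.append((float(loss), img))
        slot.sort(key=lambda t: t[0])          # low loss first
        del slot[per_class_budget:]            # evict likely-noisy extras
    return memory
```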
arXiv Detail & Related papers (2022-03-29T08:52:45Z)
- Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
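A minimal sketch of a neighbor-consistency regularizer in the spirit of the summary: each sample's prediction is pulled toward a similarity-weighted average of its in-batch neighbors' predictions. The temperature and the KL formulation are illustrative choices, not necessarily the paper's loss.
```python
# Sketch of a neighbor-consistency regularizer; temperature and the KL
# form are illustrative assumptions, not the paper's exact objective.
import torch
import torch.nn.functional as F

def neighbor_consistency_loss(features, logits, temperature=0.1):
    feats = F.normalize(features, dim=1)
    sims = feats @ feats.t() / temperature
    sims.fill_diagonal_(float("-inf"))        # no self-neighbors
    weights = sims.softmax(dim=1)             # (B, B) neighbor weights
    probs = logits.softmax(dim=1)
    neighbor_probs = weights @ probs          # aggregated neighbor prediction
    log_probs = logits.log_softmax(dim=1)
    # KL divergence pulls each prediction toward its neighbors' average.
    return F.kl_div(log_probs, neighbor_probs.detach(), reduction="batchmean")
```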
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N and CIFAR-100N) with human-annotated real-world label noise.
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- Learning from Noisy Labels via Dynamic Loss Thresholding [69.61904305229446]
We propose a novel method named Dynamic Loss Thresholding (DLT).
During the training process, DLT records the loss value of each sample and calculates dynamic loss thresholds.
Experiments on CIFAR-10/100 and Clothing1M demonstrate substantial improvements over recent state-of-the-art methods.
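Following the summary, a hedged sketch of dynamic loss thresholding: record recent per-sample losses and treat samples whose current loss exceeds a percentile of that history as noisy. The percentile and history size are illustrative, not the paper's schedule.
```python
# Sketch of dynamic loss thresholding in the spirit of DLT; the percentile
# threshold and history size are illustrative assumptions.
import torch
import torch.nn.functional as F

class DynamicLossThreshold:
    def __init__(self, percentile=70.0, history_size=10_000):
        self.percentile = percentile
        self.history = []                      # recent per-sample losses
        self.history_size = history_size

    def clean_mask(self, logits, labels):
        losses = F.cross_entropy(logits, labels, reduction="none")
        self.history.extend(losses.detach().tolist())
        self.history = self.history[-self.history_size:]
        threshold = (torch.tensor(self.history)
                     .quantile(self.percentile / 100)
                     .to(losses.device))
        return losses <= threshold             # True = treated as clean
```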
arXiv Detail & Related papers (2021-04-01T07:59:03Z)
- Noise-resistant Deep Metric Learning with Ranking-based Instance Selection [59.286567680389766]
We propose a noise-resistant training technique for DML, which we name Probabilistic Ranking-based Instance Selection with Memory (PRISM).
PRISM identifies noisy data in a minibatch using average similarity against image features extracted from several previous versions of the neural network.
To alleviate the high computational cost brought by the memory bank, we introduce an acceleration method that replaces individual data points with the class centers.
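A hedged sketch of the PRISM idea as summarized, using the class-center acceleration: a sample is flagged as noisy when its feature is unusually dissimilar to its class center computed from earlier model versions. The cosine-similarity threshold is an assumption.
```python
# Sketch of PRISM-style filtering with the class-center acceleration;
# the similarity threshold is an illustrative assumption.
import torch
import torch.nn.functional as F

def prism_clean_mask(features, labels, class_centers, threshold=0.3):
    """class_centers: (C, D) running means of features from earlier model versions."""
    feats = F.normalize(features, dim=1)
    centers = F.normalize(class_centers, dim=1)
    sim_to_own_class = (feats * centers[labels]).sum(dim=1)  # cosine similarity
    return sim_to_own_class >= threshold       # True = kept for the DML loss
```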
arXiv Detail & Related papers (2021-03-30T03:22:17Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- Audio Tagging by Cross Filtering Noisy Labels [26.14064793686316]
We present a novel framework, named CrossFilter, to combat the noisy label problem in audio tagging.
Our method achieves state-of-the-art performance and even surpasses the ensemble models.
arXiv Detail & Related papers (2020-07-16T07:55:04Z)