May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels
- URL: http://arxiv.org/abs/2408.14284v1
- Date: Mon, 26 Aug 2024 14:09:40 GMT
- Title: May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels
- Authors: Monica Millunzi, Lorenzo Bonicelli, Angelo Porrello, Jacopo Credi, Petter N. Kolm, Simone Calderara
- Abstract summary: We introduce Alternate Experience Replay (AER), which takes advantage of forgetting to maintain a clear distinction between clean, complex, and noisy samples in the memory buffer.
We demonstrate the effectiveness of our approach in terms of both the accuracy and purity of the obtained buffer, resulting in an average gain of 4.71 percentage points in accuracy with respect to existing loss-based purification strategies.
- Score: 16.262555459431155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Forgetting presents a significant challenge during incremental training, making it particularly demanding for contemporary AI systems to assimilate new knowledge in streaming data environments. To address this issue, most approaches in Continual Learning (CL) rely on the replay of a restricted buffer of past data. However, the presence of noise in real-world scenarios, where human annotation is constrained by time limitations or where data is automatically gathered from the web, frequently renders these strategies vulnerable. In this study, we address the problem of CL under Noisy Labels (CLN) by introducing Alternate Experience Replay (AER), which takes advantage of forgetting to maintain a clear distinction between clean, complex, and noisy samples in the memory buffer. The idea is that complex or mislabeled examples, which hardly fit the previously learned data distribution, are most likely to be forgotten. To grasp the benefits of such a separation, we equip AER with Asymmetric Balanced Sampling (ABS): a new sample selection strategy that prioritizes purity on the current task while retaining relevant samples from the past. Through extensive computational comparisons, we demonstrate the effectiveness of our approach in terms of both accuracy and purity of the obtained buffer, resulting in a remarkable average gain of 4.71% points in accuracy with respect to existing loss-based purification strategies. Code is available at https://github.com/aimagelab/mammoth.
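The abstract describes the mechanism only at a high level, so the following is a minimal PyTorch-style sketch of how alternating replay and loss-based forgetting scoring might be wired together. The buffer API (`sample`, `get_all`), the even/odd epoch schedule, and the per-sample loss scoring are illustrative assumptions, not the official implementation in the linked mammoth repository.

```python
# Illustrative sketch (not the official mammoth implementation): alternate epochs with
# and without buffer replay, then use the loss after a "no-replay" epoch as a
# forgetting signal to flag likely-noisy buffer samples.
import torch
import torch.nn.functional as F


def alternate_replay_epoch(model, optimizer, stream_loader, buffer, epoch, device="cpu"):
    """One epoch of an alternating scheme: replay the buffer only on even epochs,
    so that complex or mislabeled buffer samples are free to be forgotten on odd epochs."""
    replay = epoch % 2 == 0
    model.train()
    for x, y in stream_loader:
        x, y = x.to(device), y.to(device)
        loss = F.cross_entropy(model(x), y)
        if replay and len(buffer) > 0:
            # hypothetical buffer API: returns a mini-batch of stored examples
            bx, by = buffer.sample(batch_size=x.size(0))
            loss = loss + F.cross_entropy(model(bx.to(device)), by.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


@torch.no_grad()
def score_buffer_by_forgetting(model, buffer, device="cpu"):
    """After a no-replay epoch, buffer samples with the highest loss are the ones the
    model has forgotten most, i.e. the most likely mislabeled candidates."""
    model.eval()
    bx, by = buffer.get_all()  # hypothetical buffer API: all stored examples
    losses = F.cross_entropy(model(bx.to(device)), by.to(device), reduction="none")
    return losses
```

In the paper's terms, a selection rule such as ABS would then consume these per-sample scores to prioritize pure samples from the current task while still retaining relevant samples from past tasks in the buffer.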
Related papers
- STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay [76.06127233986663]
Test-time adaptation (TTA) aims to address the distribution shift between the training and test data with only unlabeled data at test time.
This paper focuses on performing both sample recognition and outlier rejection during inference when outliers are present.
We propose a new approach called STAble Memory rePlay (STAMP), which performs optimization over a stable memory bank instead of the risky mini-batch.
arXiv Detail & Related papers (2024-07-22T16:25:41Z)
- Enabling On-Device Learning via Experience Replay with Efficient Dataset Condensation [15.915388740468815]
We propose an on-device framework that addresses the issue of identifying the most representative data to avoid significant information loss.
Specifically, to effectively handle unlabeled incoming data, we propose a pseudo-labeling technique designed for unlabeled on-device learning environments.
With a buffer capacity of just one sample per class, our method outperforms the best existing baseline in accuracy by 58.4%.
arXiv Detail & Related papers (2024-05-25T07:52:36Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods proposes to replay the data of experienced tasks when learning new tasks.
However, storing such data is often impractical due to memory constraints or data privacy concerns.
As an alternative, data-free replay methods have been proposed, which synthesize samples by inverting the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2023-08-26T12:43:25Z)
- On the Effectiveness of Lipschitz-Driven Rehearsal in Continual Learning [17.179898279925155]
Repeated optimization on a small pool of data inevitably leads to tight and unstable decision boundaries.
We propose Lipschitz-DrivEn Rehearsal (LiDER), a surrogate objective that induces smoothness in the backbone network.
By means of extensive experiments, we show that applying LiDER delivers a stable performance gain to several state-of-the-art rehearsal CL methods.
arXiv Detail & Related papers (2022-10-12T17:45:13Z)
- Few-Shot Class-Incremental Learning via Entropy-Regularized Data-Free Replay [52.251188477192336]
Few-shot class-incremental learning (FSCIL) has been proposed aiming to enable a deep learning system to incrementally learn new classes with limited data.
We show through empirical results that adopting data replay is surprisingly favorable.
We propose using data-free replay that can synthesize data by a generator without accessing real data.
arXiv Detail & Related papers (2022-07-22T17:30:51Z)
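The last entry above is only a one-line description of data-free replay, so the sketch below gives a rough illustration of the general idea: invert a frozen classifier from a previous task to synthesize pseudo-exemplars, with an entropy term on the predictions. The direct input optimization (the cited paper trains a generator instead), the form of the entropy term, and all hyperparameters are assumptions made for illustration, not that paper's exact recipe.

```python
# Illustrative sketch of data-free replay by model inversion; objective and
# hyperparameters are assumed choices, not the cited paper's formulation.
import torch
import torch.nn.functional as F


def synthesize_replay_batch(frozen_model, target_labels, img_shape=(3, 32, 32),
                            steps=200, lr=0.1, entropy_weight=0.1):
    """Invert a frozen classifier to obtain pseudo-exemplars of old classes.

    `target_labels` is a LongTensor of old-class indices, e.g. torch.tensor([0, 1, 2]).
    """
    frozen_model.eval()
    # start from noise and optimize the inputs directly (a generator could be used instead)
    x = torch.randn(len(target_labels), *img_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        logits = frozen_model(x)
        # pull each synthetic input toward its assigned old-class label
        ce = F.cross_entropy(logits, target_labels)
        # entropy of the predictive distribution; its sign and weight are assumed here
        probs = F.softmax(logits, dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
        loss = ce + entropy_weight * entropy
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach(), target_labels
```

Batches produced this way could then be mixed with the few-shot data of new classes during incremental training, standing in for the stored exemplars that ordinary replay would require.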
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.