ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think
- URL: http://arxiv.org/abs/2501.01045v4
- Date: Fri, 06 Jun 2025 07:00:41 GMT
- Title: ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think
- Authors: Tao Feng, Wei Li, Didi Zhu, Hangjie Yuan, Wendi Zheng, Dan Zhang, Jie Tang,
- Abstract summary: We introduce ZeroFlow, the first benchmark designed to evaluate gradient-free optimization algorithms for overcoming forgetting. Our results show that forward passes alone can be sufficient to mitigate forgetting. We propose new enhancements that further improve forgetting resistance using only forward passes.
- Score: 22.333776599402754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backpropagation provides a generalized configuration for overcoming catastrophic forgetting. Optimizers such as SGD and Adam are commonly used for weight updates in continual learning and continual pre-training. However, access to gradient information is not always feasible in practice due to black-box APIs, hardware constraints, or non-differentiable systems, a challenge we refer to as the gradient bans. To bridge this gap, we introduce ZeroFlow, the first benchmark designed to evaluate gradient-free optimization algorithms for overcoming forgetting. ZeroFlow examines a suite of forward pass-based methods across various algorithms, forgetting scenarios, and datasets. Our results show that forward passes alone can be sufficient to mitigate forgetting. We uncover novel optimization principles that highlight the potential of forward pass-based methods in mitigating forgetting, managing task conflicts, and reducing memory demands. Additionally, we propose new enhancements that further improve forgetting resistance using only forward passes. This work provides essential tools and insights to advance the development of forward-pass-based methods for continual learning.
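The forward-pass-only setting described in the abstract can be illustrated with a two-point zeroth-order (SPSA-style) gradient estimate. The sketch below is a minimal illustration of that family of methods, not ZeroFlow's benchmark code; the toy loss, learning rate, and perturbation scale are placeholder assumptions.

```python
import numpy as np

def spsa_step(params, loss_fn, lr=0.01, eps=1e-3, rng=np.random.default_rng(0)):
    """One zeroth-order update: estimate the gradient from two forward passes.

    params : 1-D array of model parameters
    loss_fn: callable mapping params -> scalar loss (forward pass only)
    """
    u = rng.standard_normal(params.shape)          # random perturbation direction
    l_plus = loss_fn(params + eps * u)             # forward pass 1
    l_minus = loss_fn(params - eps * u)            # forward pass 2
    g_hat = (l_plus - l_minus) / (2 * eps) * u     # directional gradient estimate
    return params - lr * g_hat                     # SGD-style update, no backprop

# Toy usage: minimize a quadratic without ever calling a gradient routine.
target = np.ones(10)
loss = lambda w: float(np.sum((w - target) ** 2))
w = np.zeros(10)
for _ in range(500):
    w = spsa_step(w, loss)
print(round(loss(w), 4))
```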
Related papers
- Restoring Forgotten Knowledge in Non-Exemplar Class Incremental Learning through Test-Time Semantic Evolution [11.50324946279326]
In Non-exemplar Class Incremental Learning (NECIL), forgetting arises because old classes are inaccessible.
We propose RoSE, a test-time semantic drift compensation framework.
We evaluate RoSE on the CIFAR-100, TinyImageNet, and ImageNet100 datasets, under both cold-start and warm-start settings.
arXiv Detail & Related papers (2025-03-21T02:02:35Z) - Centaur: Robust End-to-End Autonomous Driving with Test-Time Training [84.78837437133234]
We propose Centaur, which updates a planner's behavior via test-time training without relying on hand-engineered rules or cost functions.
We develop a novel uncertainty measure, called Cluster Entropy, which is simple, interpretable, and compatible with state-of-the-art planning algorithms.
arXiv Detail & Related papers (2025-03-14T17:59:41Z) - Sparser Training for On-Device Recommendation Systems [50.74019319100728]
We propose SparseRec, a lightweight embedding method based on Dynamic Sparse Training (DST)
It avoids dense gradients during backpropagation by sampling a subset of important vectors.
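A rough sketch of the general dynamic-sparse-training idea (updating only a sampled subset of important embedding rows; not SparseRec's actual algorithm) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.standard_normal((10_000, 32)) * 0.01     # embedding table (n_items x dim)

def sparse_update(emb, item_ids, grads, importance, k=64, lr=0.1):
    """Update only the k most 'important' rows touched in this batch.

    item_ids  : rows that received gradients in this batch
    grads     : gradients for those rows (len(item_ids) x dim)
    importance: per-row score used to pick which rows are kept
    """
    keep = item_ids[np.argsort(importance[item_ids])[-k:]]   # top-k rows by importance
    mask = np.isin(item_ids, keep)
    emb[item_ids[mask]] -= lr * grads[mask]                  # sparse write; other rows untouched
    return emb

# Toy usage with a crude importance proxy.
item_ids = rng.integers(0, 10_000, size=256)
grads = rng.standard_normal((256, 32))
importance = np.abs(emb).sum(axis=1)
emb = sparse_update(emb, item_ids, grads, importance)
```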
arXiv Detail & Related papers (2024-11-19T03:48:48Z) - Stepping Forward on the Last Mile [8.756033984943178]
We propose a series of algorithm enhancements that further reduce the memory footprint, and the accuracy gap compared to backpropagation.
Our results demonstrate that on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach.
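The forward-gradient idea can be sketched with exact forward-mode directional derivatives; this is a floating-point illustration of the general technique, not the paper's fixed-point, quantized implementation.

```python
import torch
from torch.func import jvp

def forward_gradient_step(w, loss_fn, lr=0.05):
    """One forward-gradient update: exact directional derivative via forward-mode AD,
    multiplied by the probe direction to give an unbiased gradient estimate."""
    v = torch.randn_like(w)                       # probe direction
    _, dloss_dv = jvp(loss_fn, (w,), (v,))        # <grad, v> from a single forward-mode pass
    return w - lr * dloss_dv * v                  # E_v[<grad, v> v] = grad

loss = lambda w: (w ** 2).sum()
w = torch.full((8,), 3.0)
for _ in range(300):
    w = forward_gradient_step(w, loss)
print(float(loss(w)))                              # approaches 0 without backprop
```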
arXiv Detail & Related papers (2024-11-06T16:33:21Z) - Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement [29.675650285351768]
Machine unlearning (MU) has emerged to enhance the privacy and trustworthiness of deep neural networks.
Approximate MU is a practical method for large-scale models.
We propose a fast-slow parameter update strategy to implicitly approximate the up-to-date salient unlearning direction.
arXiv Detail & Related papers (2024-09-29T15:17:33Z) - An Effective Dynamic Gradient Calibration Method for Continual Learning [11.555822066922508]
Continual learning (CL) is a fundamental topic in machine learning, where the goal is to train a model with continuously incoming data and tasks.
Due to the memory limit, we cannot store all the historical data, and therefore confront the "catastrophic forgetting" problem.
We develop an effective algorithm to calibrate the gradient in each updating step of the model.
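One common way to calibrate a new-task gradient against old knowledge (a projection-style sketch in the spirit of this line of work, not necessarily the paper's exact calibration rule) is:

```python
import numpy as np

def calibrate_gradient(g_new, g_ref):
    """Project the new-task gradient so it no longer points against g_ref.

    If <g_new, g_ref> < 0 the update would hurt old-task performance, so the
    conflicting component is removed; otherwise the gradient is left unchanged.
    """
    dot = g_new @ g_ref
    if dot < 0:
        g_new = g_new - (dot / (g_ref @ g_ref + 1e-12)) * g_ref
    return g_new

g_new = np.array([1.0, -2.0, 0.5])
g_ref = np.array([0.0, 1.0, 0.0])        # e.g. an averaged gradient from old-task data
print(calibrate_gradient(g_new, g_ref))  # [1.0, 0.0, 0.5]
```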
arXiv Detail & Related papers (2024-07-30T16:30:09Z) - Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, storing such data is often infeasible in practice due to memory constraints or data privacy issues.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
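Sample inversion of this kind can be sketched as optimizing a synthetic input until a frozen classifier assigns it to an old class; the placeholder network below is untrained and only shows the mechanics.

```python
import torch
import torch.nn.functional as F

# Frozen "old" classifier (placeholder weights; in practice this is the trained model).
classifier = torch.nn.Sequential(
    torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
for p in classifier.parameters():
    p.requires_grad_(False)

def invert_sample(target_class, steps=200, lr=0.1):
    """Synthesize an input that the frozen classifier labels as `target_class`."""
    x = torch.randn(1, 32, requires_grad=True)      # start from noise
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = classifier(x)
        loss = F.cross_entropy(logits, torch.tensor([target_class])) + 1e-3 * x.pow(2).mean()
        loss.backward()                              # gradients flow only to the input x
        opt.step()
    return x.detach()

fake_old_sample = invert_sample(target_class=3)
print(classifier(fake_old_sample).argmax().item())   # typically prints 3
```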
arXiv Detail & Related papers (2024-01-12T12:51:12Z) - Rethinking PGD Attack: Is Sign Function Necessary? [131.6894310945647]
We present a theoretical analysis of how such sign-based update algorithm influences step-wise attack performance.
We propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign.
The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments.
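The contrast between a sign-based PGD step and a raw-gradient step can be sketched as follows; these are illustrative update rules, not necessarily the paper's exact RGD formulation, and the projection assumes an L-infinity ball.

```python
import torch

def pgd_step(x, grad, alpha=2 / 255):
    # Standard PGD: only the sign of the gradient is used.
    return x + alpha * grad.sign()

def raw_gradient_step(x, grad, alpha=2 / 255):
    # Raw-gradient style: keep the gradient direction itself,
    # normalized so the step size stays comparable across inputs.
    return x + alpha * grad / (grad.abs().max() + 1e-12)

def project(x_adv, x_clean, eps=8 / 255):
    # Keep the perturbation inside an L-infinity ball of radius eps.
    return torch.clamp(x_adv, x_clean - eps, x_clean + eps).clamp(0, 1)

x = torch.rand(1, 3, 8, 8)
g = torch.randn_like(x)
print(project(pgd_step(x, g), x).shape, project(raw_gradient_step(x, g), x).shape)
```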
arXiv Detail & Related papers (2023-12-03T02:26:58Z) - Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z) - Improving information retention in large scale online continual learning [99.73847522194549]
Online continual learning aims to adapt efficiently to new data while retaining existing knowledge.
Recent work suggests that information retention remains a problem in large scale OCL even when the replay buffer is unlimited.
We propose using a moving average family of methods to improve optimization for non-stationary objectives.
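A minimal instance of the moving-average family is to keep an exponential moving average (EMA) of the online weights and evaluate the averaged copy; the decay value below is an arbitrary placeholder.

```python
import numpy as np

def ema_update(avg_params, online_params, decay=0.999):
    """Blend the online weights into the slow-moving average copy."""
    return {k: decay * avg_params[k] + (1 - decay) * online_params[k] for k in avg_params}

# Usage: after every online update on the data stream, refresh the EMA copy
# and evaluate/deploy the EMA copy rather than the raw online weights.
online = {"w": np.zeros(4), "b": np.zeros(1)}
averaged = {k: v.copy() for k, v in online.items()}
for step in range(1000):
    online = {k: v + 0.01 * np.random.randn(*v.shape) for k, v in online.items()}  # stand-in for an SGD step
    averaged = ema_update(averaged, online)
```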
arXiv Detail & Related papers (2022-10-12T16:59:43Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
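For a single linear layer, activation perturbation can be sketched like this: perturb the pre-activations rather than the weights, estimate their gradient from one directional derivative, and map it back to the weights through the layer input. This is a one-layer illustration of the idea, not the paper's local-loss architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)                 # layer input
W = rng.standard_normal((4, 16)) * 0.1      # layer weights
y_target = rng.standard_normal(4)

def loss_from_z(z):                         # loss as a function of the pre-activations
    return float(np.sum((z - y_target) ** 2))

def activity_perturbation_grad(W, x, eps=1e-4):
    z = W @ x
    u = rng.standard_normal(z.shape)                        # perturb activations, not weights
    dz = (loss_from_z(z + eps * u) - loss_from_z(z)) / eps  # ~ <dL/dz, u>
    g_z = dz * u                                            # unbiased estimate of dL/dz (lower dimension, lower variance)
    return np.outer(g_z, x)                                 # chain rule: dL/dW = g_z x^T

W -= 0.01 * activity_perturbation_grad(W, x)                # one estimated update step
```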
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors [59.93972277761501]
We show that we can learn highly informative posteriors from the source task, through supervised or self-supervised approaches.
This simple modular approach enables significant performance gains and more data-efficient learning on a variety of downstream classification and segmentation tasks.
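In its simplest form, re-using a learned source posterior as a prior amounts to penalizing deviation from the source mean weighted by the source precision; the sketch below assumes a diagonal Gaussian and is not the paper's full pipeline.

```python
import torch

def prior_regularized_loss(task_loss, params, prior_mean, prior_var, scale=1.0):
    """Downstream loss plus a Gaussian prior learned on the source task.

    prior_mean / prior_var: per-parameter mean and variance of the source posterior
    (diagonal approximation). Larger variance => weaker pull toward the source value.
    """
    reg = 0.0
    for p, m, v in zip(params, prior_mean, prior_var):
        reg = reg + ((p - m) ** 2 / (2 * v)).sum()
    return task_loss + scale * reg

# Toy usage: the regularizer pulls parameters toward the source mean.
p = [torch.zeros(3, requires_grad=True)]
m, v = [torch.ones(3)], [torch.full((3,), 0.5)]
total = prior_regularized_loss(torch.tensor(1.0), p, m, v)
total.backward()
print(p[0].grad)   # (p - m) / v = -2 for each coordinate
```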
arXiv Detail & Related papers (2022-05-20T16:19:30Z) - Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
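The implicit-layer idea can be sketched with a plain fixed-point iteration: rather than unrolling a recurrent update a fixed number of times, iterate z <- f(z, x) to (near) convergence and treat the fixed point as the output. The contraction below is a toy stand-in for a flow refinement operator.

```python
import numpy as np

def solve_fixed_point(f, x, z0, tol=1e-6, max_iter=500):
    """Return z* such that z* = f(z*, x), by simple fixed-point iteration."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z_next

# Toy contraction standing in for one recurrent flow refinement step.
A = 0.5 * np.eye(2)
f = lambda z, x: A @ z + x
z_star = solve_fixed_point(f, x=np.array([1.0, -1.0]), z0=np.zeros(2))
print(z_star)   # converges to (I - A)^{-1} x = [2, -2]
```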
arXiv Detail & Related papers (2022-04-18T17:53:44Z) - Efficient Neural Network Training via Forward and Backward Propagation Sparsification [26.301103403328312]
We propose an efficient sparse training method with completely sparse forward and backward passes.
Our algorithm is much more effective in accelerating the training process, up to an order of magnitude faster.
arXiv Detail & Related papers (2021-11-10T13:49:47Z) - Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum achieves state-of-the-art results consistently on vision tasks and on other tasks including language processing.
arXiv Detail & Related papers (2021-06-22T03:13:23Z) - GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
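This recipe can be approximated crudely by scoring candidate per-layer scale factors with the loss obtained after a single SGD step; the sketch below uses random search and numerical gradients rather than GradInit's actual optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal((64, 8)), rng.standard_normal((64, 1))
W1, W2 = rng.standard_normal((8, 16)), rng.standard_normal((16, 1))

def loss(W1, W2):
    h = np.tanh(x @ W1)
    return float(np.mean((h @ W2 - y) ** 2))

def loss_after_one_sgd_step(W1, W2, lr=0.01, eps=1e-4):
    # Numerical gradients keep the sketch dependency-free; autograd would be used in practice.
    g1 = np.zeros_like(W1)
    for idx in np.ndindex(*W1.shape):
        Wp = W1.copy(); Wp[idx] += eps
        g1[idx] = (loss(Wp, W2) - loss(W1, W2)) / eps
    g2 = np.zeros_like(W2)
    for idx in np.ndindex(*W2.shape):
        Wp = W2.copy(); Wp[idx] += eps
        g2[idx] = (loss(W1, Wp) - loss(W1, W2)) / eps
    return loss(W1 - lr * g1, W2 - lr * g2)

# Random search over per-layer scales: keep the scaling whose single-step loss is lowest.
best = (1.0, 1.0, loss_after_one_sgd_step(W1, W2))
for _ in range(20):
    s1, s2 = rng.uniform(0.1, 2.0, size=2)
    cand = loss_after_one_sgd_step(s1 * W1, s2 * W2)
    if cand < best[2]:
        best = (s1, s2, cand)
print(best)
```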
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - Incremental Object Detection via Meta-Learning [77.55310507917012]
We propose a meta-learning approach that learns to reshape model gradients, such that information across incremental tasks is optimally shared.
In comparison to existing meta-learning methods, our approach is task-agnostic, allows incremental addition of new-classes and scales to high-capacity models for object detection.
arXiv Detail & Related papers (2020-03-17T13:40:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.