Improving information retention in large scale online continual learning
- URL: http://arxiv.org/abs/2210.06401v1
- Date: Wed, 12 Oct 2022 16:59:43 GMT
- Title: Improving information retention in large scale online continual learning
- Authors: Zhipeng Cai and Vladlen Koltun and Ozan Sener
- Abstract summary: Online continual learning aims to adapt efficiently to new data while retaining existing knowledge.
Recent work suggests that information retention remains a problem in large scale OCL even when the replay buffer is unlimited.
We propose using a moving average family of methods to improve optimization for non-stationary objectives.
- Score: 99.73847522194549
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Given a stream of data sampled from non-stationary distributions, online
continual learning (OCL) aims to adapt efficiently to new data while retaining
existing knowledge. The typical approach to address information retention (the
ability to retain previous knowledge) is keeping a replay buffer of a fixed
size and computing gradients using a mixture of new data and the replay buffer.
Surprisingly, recent work (Cai et al., 2021) suggests that information
retention remains a problem in large scale OCL even when the replay buffer is
unlimited, i.e., the gradients are computed using all past data. This paper
focuses on this peculiarity to understand and address information retention. To
pinpoint the source of this problem, we theoretically show that, given limited
computation budgets at each time step, even without a strict storage limit,
naively applying SGD with constant or constantly decreasing learning rates
fails to optimize information retention in the long term. We propose using a
moving average family of methods to improve optimization for non-stationary
objectives. Specifically, we design an adaptive moving average (AMA) optimizer
and a moving-average-based learning rate schedule (MALR). We demonstrate the
effectiveness of AMA+MALR on large-scale benchmarks, including Continual
Localization (CLOC), Google Landmarks, and ImageNet. Code will be released upon
publication.
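Since the official code is not yet released, the following is a minimal, illustrative sketch of the weight moving-average idea described in the abstract. The adaptation rules, hyperparameters, and the toy regression stream are assumptions made for illustration, not the paper's exact AMA/MALR algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, X, y):
    """Least-squares loss on the current mini-batch and its gradient."""
    err = X @ w - y
    return 0.5 * np.mean(err ** 2), X.T @ err / len(y)

d = 10
w = np.zeros(d)          # online (fast) weights, updated by plain SGD
w_avg = np.zeros(d)      # moving-average (slow) weights, used for evaluation
beta = 0.99              # averaging coefficient, adapted over time (illustrative rule)
lr = 0.1
loss_ema = prev_loss_ema = None

for t in range(1, 2001):
    # Non-stationary stream: the ground-truth regressor drifts slowly over time.
    w_true = np.ones(d) * (1.0 + 0.001 * t)
    X = rng.normal(size=(32, d))
    y = X @ w_true + 0.1 * rng.normal(size=32)

    loss, grad = loss_and_grad(w, X, y)
    w = w - lr * grad                        # SGD step on the newest data
    w_avg = beta * w_avg + (1 - beta) * w    # exponential moving average of weights

    # Moving-average-based schedule (assumed rule): track an EMA of the loss and
    # decay the learning rate when the trend stops improving.
    loss_ema = loss if loss_ema is None else 0.9 * loss_ema + 0.1 * loss
    if t % 200 == 0:
        if prev_loss_ema is not None and loss_ema > 0.99 * prev_loss_ema:
            lr *= 0.5
            beta = min(0.999, 1.0 - (1.0 - beta) * 0.5)  # average over a longer horizon
        prev_loss_ema = loss_ema

print("final lr:", lr, "| distance of averaged model to current target:",
      round(float(np.linalg.norm(w_avg - w_true)), 3))
```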
Related papers
- Buffer-based Gradient Projection for Continual Federated Learning [16.879024856283323]
Fed-A-GEM mitigates catastrophic forgetting by leveraging local buffer samples and aggregated buffer gradients.
Our experiments on standard benchmarks show consistent performance improvements across diverse scenarios.
arXiv Detail & Related papers (2024-09-03T03:50:19Z)
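A hedged sketch of the single-client building block behind this approach, an A-GEM-style projection of the current gradient against a buffer gradient; the federated aggregation of buffer gradients across clients is omitted, and the vectors below are illustrative.

```python
import numpy as np

def project_gradient(g_new, g_buf):
    """A-GEM-style projection: if the current gradient would increase the loss on
    buffer samples (negative dot product), remove the conflicting component."""
    dot = float(g_new @ g_buf)
    if dot >= 0.0:                       # no interference: keep the gradient as-is
        return g_new
    return g_new - (dot / float(g_buf @ g_buf)) * g_buf

g_new = np.array([1.0, -2.0])            # gradient on the current mini-batch
g_buf = np.array([1.0, 1.0])             # (aggregated) gradient on buffer samples
g_proj = project_gradient(g_new, g_buf)
print(g_proj, "| dot with buffer gradient:", g_proj @ g_buf)  # conflicting part removed -> 0.0
```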
- Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC)
ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
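A hypothetical toy illustration of the recency bias named above, using a simple per-class recentering of logits; this is not the ARC procedure from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
logits = rng.normal(size=(1000, 10))     # pretend logits for 1000 held-out samples
logits[:, 8:] += 1.5                     # recency bias: the last task's classes (8, 9) dominate

raw_pred = logits.argmax(axis=1)
print("fraction predicted as recent classes (raw):      ", float(np.mean(raw_pred >= 8)))

# Simple post-hoc recentering of per-class logits (hypothetical correction).
calibrated = logits - logits.mean(axis=0, keepdims=True)
cal_pred = calibrated.argmax(axis=1)
print("fraction predicted as recent classes (recentered):", float(np.mean(cal_pred >= 8)))
```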
- Adaptive Memory Replay for Continual Learning [29.333341368722653]
Updating Foundation Models as new data becomes available can lead to catastrophic forgetting.
We introduce a framework of adaptive memory replay for continual learning, where sampling of past data is phrased as a multi-armed bandit problem.
We demonstrate the effectiveness of our approach, which maintains high performance while reducing forgetting by up to 10% at no training efficiency cost.
arXiv Detail & Related papers (2024-04-18T22:01:56Z)
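A hedged sketch of replay sampling framed as a multi-armed bandit, using UCB1 over hypothetical data buckets; the reward signal and bucket structure are assumptions, not the paper's exact formulation.

```python
import math
import random

random.seed(0)
n_arms = 5                                # e.g. one arm per past task / data bucket
counts = [0] * n_arms
values = [0.0] * n_arms                   # running mean reward per arm

def replay_reward(arm):
    # Stand-in for "how useful replaying this bucket was" (e.g. loss reduction).
    return random.gauss(arm / n_arms, 0.1)

for t in range(1, 501):
    if 0 in counts:                       # play every arm once first
        arm = counts.index(0)
    else:                                 # UCB1: mean reward plus exploration bonus
        arm = max(range(n_arms),
                  key=lambda a: values[a] + math.sqrt(2.0 * math.log(t) / counts[a]))
    reward = replay_reward(arm)
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

print("replay counts per bucket:", counts)  # most replay budget goes to the best bucket
```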
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, storing raw data from previous tasks is often impractical in practice due to memory constraints or data privacy concerns.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
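A minimal sketch of data-free replay by inverting a frozen classifier: noise inputs are optimized so the old model classifies them as a chosen past class, and the resulting synthetic samples can be replayed. The tiny stand-in model and optimization settings are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Frozen classifier from the previous tasks (tiny stand-in model).
old_model = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU(),
                                torch.nn.Linear(32, 10))
for p in old_model.parameters():
    p.requires_grad_(False)

def invert_samples(target_class, n=16, steps=200, lr=0.1):
    """Optimize noise inputs so the frozen model classifies them as target_class."""
    x = torch.randn(n, 64, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    targets = torch.full((n,), target_class)
    for _ in range(steps):
        opt.zero_grad()
        logits = old_model(x)
        # Cross-entropy toward the target class plus a small L2 prior on the inputs.
        loss = F.cross_entropy(logits, targets) + 1e-3 * x.pow(2).mean()
        loss.backward()
        opt.step()
    return x.detach(), targets

x_syn, y_syn = invert_samples(target_class=3)
conf = old_model(x_syn).softmax(dim=-1)[:, 3].mean().item()
print("synthetic replay batch:", tuple(x_syn.shape), "| mean confidence:", round(conf, 3))
```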
- Age-Based Scheduling for Mobile Edge Computing: A Deep Reinforcement Learning Approach [58.911515417156174]
We propose a new definition of Age of Information (AoI) and, based on the redefined AoI, we formulate an online AoI problem for MEC systems.
We introduce Post-Decision States (PDSs) to exploit the partial knowledge of the system's dynamics.
We also combine PDSs with deep RL to further improve the algorithm's applicability, scalability, and robustness.
arXiv Detail & Related papers (2023-12-01T01:30:49Z)
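A toy sketch of the post-decision-state (PDS) idea on a single queue: the action's deterministic effect maps the state to a PDS, only the random arrivals are unknown, and a value function is learned over PDSs. The queueing model and costs are assumptions; the paper combines PDSs with deep RL for MEC scheduling.

```python
import random

random.seed(0)
MAX_Q = 5                        # queue length 0..MAX_Q
V = [0.0] * (MAX_Q + 1)          # learned value of each post-decision state
alpha, gamma = 0.1, 0.95

def cost(q, a):
    return q + 2.0 * a           # holding cost plus cost of serving (a in {0, 1})

def pds(q, a):
    return max(q - a, 0)         # known, deterministic part of the dynamics

for _ in range(20000):
    q = random.randint(0, MAX_Q)
    # Greedy action using the PDS value function.
    a = min((0, 1), key=lambda act: cost(q, act) + gamma * V[pds(q, act)])
    s_tilde = pds(q, a)
    q_next = min(s_tilde + (1 if random.random() < 0.4 else 0), MAX_Q)  # random arrival
    # Temporal-difference update on the post-decision value.
    target = min(cost(q_next, act) + gamma * V[pds(q_next, act)] for act in (0, 1))
    V[s_tilde] += alpha * (target - V[s_tilde])

print("PDS values:", [round(v, 2) for v in V])
```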
- Rapid Adaptation in Online Continual Learning: Are We Evaluating It Right? [135.71855998537347]
We revisit the common practice of evaluating adaptation of Online Continual Learning (OCL) algorithms through the metric of online accuracy.
We show that this metric is unreliable, as even vacuous blind classifiers can achieve unrealistically high online accuracy.
Existing OCL algorithms can also achieve high online accuracy, but perform poorly in retaining useful information.
arXiv Detail & Related papers (2023-05-16T08:29:33Z)
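A small sketch of why online accuracy can mislead: on a temporally correlated label stream, a blind predictor that ignores the input and simply repeats the previous label scores near-perfect online accuracy while learning nothing. The stream statistics below are assumed for illustration.

```python
import random

random.seed(0)
# Stream whose label changes only occasionally (strong temporal correlation).
labels, current = [], 0
for _ in range(10000):
    if random.random() < 0.01:
        current = random.randrange(10)
    labels.append(current)

correct, prev = 0, 0
for y in labels:
    pred = prev          # blind prediction: ignore the input, repeat the last label
    correct += int(pred == y)
    prev = y

print("blind online accuracy: %.1f%%" % (100.0 * correct / len(labels)))  # roughly 99%
```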
- Online Continual Learning Without the Storage Constraint [67.66235695269839]
We contribute a simple algorithm, which updates a kNN classifier continually along with a fixed, pretrained feature extractor.
It can adapt to rapidly changing streams, has zero stability gap, operates within tiny computational budgets, and keeps storage requirements low by storing only features.
It can outperform existing methods by over 20% in accuracy on two large-scale online continual learning datasets.
arXiv Detail & Related papers (2023-05-16T08:03:07Z)
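A minimal sketch of the recipe described above, with assumed details: a frozen feature extractor (here a random projection as a stand-in for a pretrained backbone) plus a growing kNN memory of (feature, label) pairs, updated without any gradient steps.

```python
import numpy as np

rng = np.random.default_rng(0)
PROJ = rng.normal(size=(32, 16))

def extract_features(x):
    # Stand-in for a fixed, pretrained backbone (frozen, never updated).
    return x @ PROJ

feat_bank, label_bank = [], []

def learn(x, y):
    # Continual "update": append the frozen features and the label; no gradients.
    feat_bank.append(extract_features(x))
    label_bank.append(y)

def predict(x, k=5):
    f = extract_features(x)
    dists = np.linalg.norm(np.stack(feat_bank) - f, axis=1)
    votes = np.array(label_bank)[np.argsort(dists)[:k]]
    return np.bincount(votes).argmax()

# Toy stream: two "tasks" seen one after the other, two classes each.
for task, task_mean in enumerate([0.0, 6.0]):
    for _ in range(200):
        y = int(rng.integers(2))
        x = rng.normal(loc=task_mean + 2 * y, size=32)
        learn(x, 2 * task + y)

x_old = rng.normal(loc=0.0 + 2 * 1, size=32)   # sample from the *first* task, class 1
print("predicted label:", predict(x_old))       # expected: 1 (old task still retrievable)
```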
- Efficient Bayesian Updates for Deep Learning via Laplace Approximations [1.5996841879821277]
We propose a novel Bayesian update method for deep neural networks.
We leverage second-order optimization techniques on the Gaussian posterior distribution of a Laplace approximation.
A large-scale evaluation study confirms that our updates are a fast and competitive alternative to costly retraining.
arXiv Detail & Related papers (2022-10-12T12:16:46Z)
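A hedged sketch of an online Laplace-style Bayesian update on logistic regression (not a deep network): the previous posterior acts as the prior for the next batch, so old data never needs to be revisited. The model, batch sizes, and Newton-step details are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_true = rng.normal(size=d)

def make_batch(n=200):
    X = rng.normal(size=(n, d))
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
    return X, y

def laplace_update(mean, precision, X, y, steps=25):
    """Newton steps to the MAP with prior N(mean, precision^-1); the Hessian at the
    mode becomes the new posterior precision (Laplace approximation)."""
    w = mean.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) + precision @ (w - mean)
        H = (X * (p * (1.0 - p))[:, None]).T @ X + precision
        w = w - np.linalg.solve(H, grad)
    p = 1.0 / (1.0 + np.exp(-X @ w))
    H = (X * (p * (1.0 - p))[:, None]).T @ X + precision
    return w, H

# Absorb the stream batch by batch; no past batches are stored or revisited.
mean, precision = np.zeros(d), np.eye(d)
for _ in range(5):
    X, y = make_batch()
    mean, precision = laplace_update(mean, precision, X, y)

print("recovered weights:", np.round(mean, 2))
print("true weights:     ", np.round(w_true, 2))
```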
- Federated Continual Learning through distillation in pervasive computing [0.2519906683279153]
Federated Learning has been introduced as a new machine learning paradigm enhancing the use of local devices.
Current solutions rely on the availability of large amounts of stored data at the client side in order to fine-tune the models sent by the server.
This proposal has been evaluated in the Human Activity Recognition (HAR) domain and has shown to effectively reduce the catastrophic forgetting effect.
arXiv Detail & Related papers (2022-07-17T13:55:20Z)
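A minimal sketch of distillation-based federated continual learning under assumed details (not the paper's exact protocol): each client fine-tunes the model received from the server on its local data while a distillation term keeps its predictions close to the server model, limiting forgetting before the weights are sent back for aggregation.

```python
import copy
import torch
import torch.nn.functional as F

torch.manual_seed(0)
server_model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(),
                                   torch.nn.Linear(32, 5))

def client_update(server_model, X, y, distill_weight=1.0, T=2.0, epochs=5):
    teacher = copy.deepcopy(server_model).eval()   # frozen copy received from the server
    student = copy.deepcopy(server_model)          # locally fine-tuned copy
    opt = torch.optim.SGD(student.parameters(), lr=0.05)
    for _ in range(epochs):
        opt.zero_grad()
        logits = student(X)
        ce = F.cross_entropy(logits, y)            # fit the new local data
        with torch.no_grad():
            teacher_probs = F.softmax(teacher(X) / T, dim=1)
        kd = F.kl_div(F.log_softmax(logits / T, dim=1), teacher_probs,
                      reduction="batchmean") * T * T   # stay close to the server model
        (ce + distill_weight * kd).backward()
        opt.step()
    return student.state_dict()                    # sent back to the server for aggregation

X = torch.randn(64, 20)
y = torch.randint(0, 5, (64,))
new_weights = client_update(server_model, X, y)
print("updated parameter tensors:", len(new_weights))
```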
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantees about the quality of the information presented (including this list) and is not responsible for any consequences arising from its use.