Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
- URL: http://arxiv.org/abs/2410.15143v1
- Date: Sat, 19 Oct 2024 16:00:00 GMT
- Title: Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
- Authors: Minhyuk Seo, Hyunseo Koh, Jonghyun Choi
- Abstract summary: We propose to use floating point operations (FLOPs) and total memory size in bytes as the metrics for computational and memory budgets, respectively.
To improve a CL method under a limited total budget, we propose adaptive layer freezing, which does not update the layers for less informative batches.
In addition, we propose a memory retrieval method that allows the model to learn, in fewer iterations, the same amount of knowledge as with random retrieval.
- Score: 19.447914903112366
- License:
- Abstract: The majority of online continual learning (CL) methods advocate single-epoch training and impose restrictions on the size of the replay memory. However, single-epoch training incurs a different amount of computation per CL algorithm, and the additional storage cost of storing logits or model copies on top of the replay memory is largely ignored when calculating the storage budget. Arguing that these differing computational and storage budgets hinder fair comparison among CL algorithms in practice, we propose to use floating point operations (FLOPs) and total memory size in bytes as the metrics for computational and memory budgets, respectively, to compare and develop CL algorithms under the same 'total resource budget.' To improve a CL method under a limited total budget, we propose adaptive layer freezing, which does not update the layers for less informative batches, reducing computational cost with negligible loss of accuracy. In addition, we propose a memory retrieval method that allows the model to learn, in fewer iterations, the same amount of knowledge as with random retrieval. Empirical validations on the CIFAR-10/100, CLEAR-10/100, and ImageNet-1K datasets demonstrate that the proposed approach outperforms state-of-the-art methods within the same total budget.
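The abstract does not spell out the exact freezing criterion or retrieval rule, so the PyTorch-style sketch below is only an illustration of the two ideas: a replay sampler that favors rarely retrieved examples, and a freezer that skips gradient updates for the earliest blocks when a batch looks uninformative (here approximated by the previous batch's loss falling below a running average). The class names, the thresholding rule, and the fixed freezing depth are all hypothetical stand-ins, not the authors' actual method.

```python
import random


class FrequencySampler:
    """Replay-memory sampler that favors rarely retrieved examples.

    Illustrative stand-in for the paper's frequency-based retrieval,
    not the authors' exact rule.
    """

    def __init__(self):
        self.counts = {}  # stored-sample index -> number of times retrieved

    def add(self, idx):
        self.counts.setdefault(idx, 0)

    def sample(self, batch_size):
        ids = list(self.counts)
        # Weight each stored example by 1 / (1 + retrieval count),
        # so less frequently replayed samples are drawn more often.
        weights = [1.0 / (1 + self.counts[i]) for i in ids]
        chosen = random.choices(ids, weights=weights, k=min(batch_size, len(ids)))
        for i in chosen:
            self.counts[i] += 1
        return chosen


class AdaptiveFreezer:
    """Freezes the first `n_freeze` blocks when a batch looks uninformative.

    'Uninformative' is approximated here by the previous batch's loss being
    below a running mean of recent losses; the paper's actual criterion and
    its FLOP-aware choice of freezing depth may differ.
    """

    def __init__(self, blocks, n_freeze=2, momentum=0.9):
        self.blocks = list(blocks)  # nn.Module blocks, input side first
        self.n_freeze = n_freeze
        self.momentum = momentum
        self.running_loss = None

    def maybe_freeze(self, prev_loss):
        # Call BEFORE the forward pass: requires_grad must be set before the
        # autograd graph is built for the backward FLOPs to actually be saved.
        prev_loss = float(prev_loss)
        if self.running_loss is None:
            self.running_loss = prev_loss
        freeze = prev_loss < self.running_loss
        self.running_loss = (self.momentum * self.running_loss
                             + (1 - self.momentum) * prev_loss)
        for b, block in enumerate(self.blocks):
            trainable = not (freeze and b < self.n_freeze)
            for p in block.parameters():
                p.requires_grad_(trainable)
        return freeze
```

In a training loop one would call `sampler.sample(batch_size)` to pick replay indices and `freezer.maybe_freeze(last_loss)` before the forward pass of each new batch; the real method presumably ties the number of frozen layers to the measured FLOP budget rather than a fixed `n_freeze`.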
Related papers
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss [59.835032408496545]
We propose a tile-based strategy that partitions the contrastive loss calculation into arbitrarily small blocks.
We also introduce a multi-level tiling strategy to leverage the hierarchical structure of distributed systems.
Compared to SOTA memory-efficient solutions, it achieves a two-order-of-magnitude reduction in memory while maintaining comparable speed.
arXiv Detail & Related papers (2024-10-22T17:59:30Z)
- An Efficient Procedure for Computing Bayesian Network Structure Learning [0.9208007322096532]
We propose a globally optimal Bayesian network structure discovery algorithm based on a progressively leveled scoring approach.
Experimental results indicate that our method, when using only memory, not only reduces peak memory usage but also improves computational efficiency.
arXiv Detail & Related papers (2024-07-24T07:59:18Z)
- Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks [21.815661269986425]
We propose a novel KV cache merging approach, called KVMerger, to achieve adaptive KV cache compression for long-context tasks.
Our approach is inspired by the intriguing observation that key states exhibit high similarity at the token level within a single sequence.
We conduct extensive experiments to demonstrate the effectiveness of KVMerger for long-context tasks under constrained memory budgets.
arXiv Detail & Related papers (2024-07-11T12:50:42Z)
- Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation [123.4883806344334]
We study a realistic Continual Learning setting where learning algorithms are granted a restricted computational budget per time step while training.
We apply this setting to large-scale semi-supervised Continual Learning scenarios with sparse label rates.
Our extensive analysis and ablations demonstrate that DietCL remains stable across a full spectrum of label sparsity levels, computational budgets, and various other ablation settings.
arXiv Detail & Related papers (2024-04-19T10:10:39Z)
- Online Continual Learning Without the Storage Constraint [67.66235695269839]
We contribute a simple algorithm that continually updates a kNN classifier on top of a fixed, pretrained feature extractor.
It can adapt to rapidly changing streams, has zero stability gap, operates within tiny computational budgets, and keeps storage requirements low by storing only features.
It can outperform existing methods by over 20% in accuracy on two large-scale online continual learning datasets.
arXiv Detail & Related papers (2023-05-16T08:03:07Z)
- Computationally Budgeted Continual Learning: What Does Matter? [128.0827987414154]
Continual Learning (CL) aims to sequentially train models on streams of incoming data that vary in distribution, preserving previous knowledge while adapting to new data.
Current CL literature focuses on restricted access to previously seen data, while imposing no constraints on the computational budget for training.
We revisit this problem with a large-scale benchmark and analyze the performance of traditional CL approaches in a compute-constrained setting.
arXiv Detail & Related papers (2023-03-20T14:50:27Z)
- Improving information retention in large scale online continual learning [99.73847522194549]
Online continual learning (OCL) aims to adapt efficiently to new data while retaining existing knowledge.
Recent work suggests that information retention remains a problem in large scale OCL even when the replay buffer is unlimited.
We propose using a moving average family of methods to improve optimization for non-stationary objectives.
arXiv Detail & Related papers (2022-10-12T16:59:43Z)
- StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of kernel ridge regression (KRR) require that all the data be stored in main memory.
We propose StreaMRAK, a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z)