Online Prototypes and Class-Wise Hypergradients for Online Continual Learning with Pre-Trained Models
- URL: http://arxiv.org/abs/2502.18762v1
- Date: Wed, 26 Feb 2025 02:43:54 GMT
- Title: Online Prototypes and Class-Wise Hypergradients for Online Continual Learning with Pre-Trained Models
- Authors: Nicolas Michel, Maorong Wang, Jiangpeng He, Toshihiko Yamasaki,
- Abstract summary: Continual Learning (CL) addresses the problem of learning from a data sequence where the distribution changes over time.<n>In this paper, we tackle both problems by leveraging Online Prototypes (OP) and Class-Wise Hypergradients (CWH)<n>OP leverages stable output representations of PTM by updating its value on the fly to act as replay samples without requiring task boundaries.<n>CWH learns class-dependent gradient coefficients during training to improve over sub-optimal learning rates.
- Score: 24.963242232471426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual Learning (CL) addresses the problem of learning from a data sequence where the distribution changes over time. Recently, efficient solutions leveraging Pre-Trained Models (PTM) have been widely explored in the offline CL (offCL) scenario, where the data corresponding to each incremental task is known beforehand and can be seen multiple times. However, such solutions often rely on 1) prior knowledge regarding task changes and 2) hyper-parameter search, particularly regarding the learning rate. Both assumptions remain unavailable in online CL (onCL) scenarios, where incoming data distribution is unknown and the model can observe each datum only once. Therefore, existing offCL strategies fall largely behind performance-wise in onCL, with some proving difficult or impossible to adapt to the online scenario. In this paper, we tackle both problems by leveraging Online Prototypes (OP) and Class-Wise Hypergradients (CWH). OP leverages stable output representations of PTM by updating its value on the fly to act as replay samples without requiring task boundaries or storing past data. CWH learns class-dependent gradient coefficients during training to improve over sub-optimal learning rates. We show through experiments that both introduced strategies allow for a consistent gain in accuracy when integrated with existing approaches. We will make the code fully available upon acceptance.
Related papers
- PROL : Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning [17.230781041043823]
We propose a novel prompt-based method for online continual learning (OCL) that includes 4 main components.<n>Our proposed method achieves significantly higher performance than the current SOTAs in CIFAR100, ImageNet-R, ImageNet-A, and CUB dataset.
arXiv Detail & Related papers (2025-07-16T15:04:46Z) - Advancing Prompt-Based Methods for Replay-Independent General Continual Learning [44.94466949172424]
General continual learning (GCL) is a broad concept to describe real-world continual learning (CL) problems.<n>Such requirements result in poor initial performance, limited generalizability, and severe catastrophic forgetting.<n>We propose an innovative approach named MISA (Mask and Initial Session Adaption) to advance prompt-based methods in GCL.
arXiv Detail & Related papers (2025-03-02T00:58:18Z) - Memory Is Not the Bottleneck: Cost-Efficient Continual Learning via Weight Space Consolidation [55.77835198580209]
Continual learning (CL) has traditionally emphasized minimizing exemplar memory usage, assuming that memory is the primary bottleneck.<n>This paper re-examines CL under a more realistic setting with sufficient exemplar memory, where the system can retain a representative portion of past data.<n>We find that, under this regime, stability improves due to reduced forgetting, but plasticity diminishes as the model becomes biased toward prior tasks and struggles to adapt to new ones.
arXiv Detail & Related papers (2025-02-11T05:40:52Z) - Online Curvature-Aware Replay: Leveraging $\mathbf{2^{nd}}$ Order Information for Online Continual Learning [2.0165668334347187]
We formalize replay-based online joint optimization with explicit KL-divergence constraints on replay data.<n>We show how to adapt the estimation of the FIM to a continual setting second-order optimization for non-iid data.<n>OCAR outperforms state-of-the-art methods in continual metrics achieving higher average accuracy throughout the training process in three different benchmarks.
arXiv Detail & Related papers (2025-02-03T22:31:36Z) - ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially.
Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks.
However, such methods lack theoretical guarantees, making them prone to unexpected failures.
We bridge this gap by integrating an empirically strong approach into a principled framework, designed to prevent forgetting.
arXiv Detail & Related papers (2024-10-01T12:58:37Z) - Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning [22.13331870720021]
We propose a beyond prompt learning approach to the RFCL task, called Continual Adapter (C-ADA)
C-ADA flexibly extends specific weights in CAL to learn new knowledge for each task and freezes old weights to preserve prior knowledge.
Our approach achieves significantly improved performance and training speed, outperforming the current state-of-the-art (SOTA) method.
arXiv Detail & Related papers (2024-07-14T17:40:40Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Adaptive Retention & Correction: Test-Time Training for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.<n>We name our approach Adaptive Retention & Correction (ARC)<n>ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and Imagenet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z) - Detecting Morphing Attacks via Continual Incremental Training [10.796380524798744]
Recent Continual Learning (CL) paradigm may represent an effective solution to enable incremental training, even through multiple sites.
We investigate the performance of different Continual Learning methods in this scenario, simulating a learning model that is updated every time a new chunk of data, even of variable size, is available.
Experimental results reveal that a particular CL method, namely Learning without Forgetting (LwF), is one of the best-performing algorithms.
arXiv Detail & Related papers (2023-07-27T17:48:29Z) - On the Effectiveness of Equivariant Regularization for Robust Online
Continual Learning [17.995662644298974]
Continual Learning (CL) approaches seek to bridge this gap by facilitating the transfer of knowledge to both previous tasks and future ones.
Recent research has shown that self-supervision can produce versatile models that can generalize well to diverse downstream tasks.
We propose Continual Learning via Equivariant Regularization (CLER), an OCL approach that leverages equivariant tasks for self-supervision.
arXiv Detail & Related papers (2023-05-05T16:10:31Z) - CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z) - Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
arXiv Detail & Related papers (2022-10-17T16:34:01Z) - Improving information retention in large scale online continual learning [99.73847522194549]
Online continual learning aims to adapt efficiently to new data while retaining existing knowledge.
Recent work suggests that information retention remains a problem in large scale OCL even when the replay buffer is unlimited.
We propose using a moving average family of methods to improve optimization for non-stationary objectives.
arXiv Detail & Related papers (2022-10-12T16:59:43Z) - CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z) - CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep
Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z) - vCLIMB: A Novel Video Class Incremental Learning Benchmark [53.90485760679411]
We introduce vCLIMB, a novel video continual learning benchmark.
vCLIMB is a standardized test-bed to analyze catastrophic forgetting of deep models in video continual learning.
We propose a temporal consistency regularization that can be applied on top of memory-based continual learning methods.
arXiv Detail & Related papers (2022-01-23T22:14:17Z) - Task-agnostic Continual Learning with Hybrid Probabilistic Models [75.01205414507243]
We propose HCL, a Hybrid generative-discriminative approach to Continual Learning for classification.
The flow is used to learn the data distribution, perform classification, identify task changes, and avoid forgetting.
We demonstrate the strong performance of HCL on a range of continual learning benchmarks such as split-MNIST, split-CIFAR, and SVHN-MNIST.
arXiv Detail & Related papers (2021-06-24T05:19:26Z) - Fast Class-wise Updating for Online Hashing [196.14748396106955]
This paper presents a novel supervised online hashing scheme, termed Fast Class-wise Updating for Online Hashing (FCOH)
A class-wise updating method is developed to decompose the binary code learning and alternatively renew the hash functions in a class-wise fashion, which well addresses the burden on large amounts of training batches.
To further achieve online efficiency, we propose a semi-relaxation optimization, which accelerates the online training by treating different binary constraints independently.
arXiv Detail & Related papers (2020-12-01T07:41:54Z) - Generalized Variational Continual Learning [33.194866396158005]
Two main approaches to continuous learning are Online Elastic Weight Consolidation and Variational Continual Learning.
We show that applying this modification to mitigate Online EWC as a limiting case, allowing baselines between the two approaches.
In order to the observed overpruning effect of VI, we take inspiration from a common multi-task architecture, mitigate neural networks with task-specific FiLM layers.
arXiv Detail & Related papers (2020-11-24T19:07:39Z) - Online Continual Learning under Extreme Memory Constraints [40.80045285324969]
We introduce the novel problem of Memory-Constrained Online Continual Learning (MC-OCL)
MC-OCL imposes strict constraints on the memory overhead that a possible algorithm can use to avoid catastrophic forgetting.
We propose an algorithmic solution to MC-OCL: Batch-level Distillation (BLD), a regularization-based CL approach.
arXiv Detail & Related papers (2020-08-04T13:25:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.