Related papers: Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models

URL: http://arxiv.org/abs/2406.09384v1
Date: Thu, 13 Jun 2024 17:57:10 GMT
Title: Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models
Authors: Lukas Thede, Karsten Roth, Olivier J. Hénaff, Matthias Bethge, Zeynep Akata
Abstract summary: We show how most often, P-RFCL techniques can be matched by a simple and lightweight PEFT baseline.
Score: 63.11967672725459
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the advent and recent ubiquity of foundation models, continual learning (CL) has recently shifted from continual training from scratch to the continual adaptation of pretrained models, seeing particular success on rehearsal-free CL benchmarks (RFCL). To achieve this, most proposed methods adapt and restructure parameter-efficient finetuning techniques (PEFT) to suit the continual nature of the problem. Based most often on input-conditional query-mechanisms or regularizations on top of prompt- or adapter-based PEFT, these PEFT-style RFCL (P-RFCL) approaches report peak performances; often convincingly outperforming existing CL techniques. However, on the other end, critical studies have recently highlighted competitive results by training on just the first task or via simple non-parametric baselines. Consequently, questions arise about the relationship between methodological choices in P-RFCL and their reported high benchmark scores. In this work, we tackle these questions to better understand the true drivers behind strong P-RFCL performances, their placement w.r.t. recent first-task adaptation studies, and their relation to preceding CL standards such as EWC or SI. In particular, we show: (1) P-RFCL techniques relying on input-conditional query mechanisms work not because, but rather despite them by collapsing towards standard PEFT shortcut solutions. (2) Indeed, we show how most often, P-RFCL techniques can be matched by a simple and lightweight PEFT baseline. (3) Using this baseline, we identify the implicit bound on tunable parameters when deriving RFCL approaches from PEFT methods as a potential denominator behind P-RFCL efficacy. Finally, we (4) better disentangle continual versus first-task adaptation, and (5) motivate standard RFCL techniques s.a. EWC or SI in light of recent P-RFCL methods.

Related papers

Revisiting Weight Regularization for Low-Rank Continual Learning [42.550292504567935]
Continual Learning with large-scale pre-trained models (PTMs) has recently gained wide attention. Task interference is typically mitigated by assigning a task-specific module during training, such as low-rank adapters. Weight regularization techniques, such as Elastic Weight Consolidation (EWC), a key strategy in CL, remain underexplored in this new paradigm.
arXiv Detail & Related papers (2026-02-19T17:13:00Z)
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices [61.361819972410046]
We show why and under what conditions the true sequence-level reward can be optimized via a surrogate token-level objective in policy gradient methods such as REINFORCE. This insight provides a principled explanation for the crucial role of several widely adopted techniques in stabilizing RL training.
arXiv Detail & Related papers (2025-12-01T07:45:39Z)
Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training [23.99424961055015]
This paper presents a comparative analysis of two core post-training paradigms: supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT). Our experiments are conducted on a benchmark comprising seven diverse multimodal tasks.
arXiv Detail & Related papers (2025-07-07T18:17:06Z)
Adapt before Continual Learning [9.477667054965782]
Adapting PTMs before the core CL process (ACL) is a novel framework that introduces a plug-and-play adaptation phase prior to learning each new task. ACL significantly improves CL performance across benchmarks and integrated methods.
arXiv Detail & Related papers (2025-06-04T13:46:33Z)
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce [68.99924691391048]
We revisit GRPO from a reinforce-like algorithm perspective and analyze its core components. We find that a simple rejection sampling baseline, RAFT, yields performance competitive with GRPO and PPO. Motivated by this insight, we propose Reinforce-Rej, a minimal extension of policy gradient that filters both entirely incorrect and entirely correct samples.
arXiv Detail & Related papers (2025-04-15T16:15:02Z)
Advancing Prompt-Based Methods for Replay-Independent General Continual Learning [44.94466949172424]
General continual learning (GCL) is a broad concept to describe real-world continual learning (CL) problems. Such requirements result in poor initial performance, limited generalizability, and severe catastrophic forgetting. We propose an innovative approach named MISA (Mask and Initial Session Adaption) to advance prompt-based methods in GCL.
arXiv Detail & Related papers (2025-03-02T00:58:18Z)
Fishing For Cheap And Efficient Pruners At Initialization [4.433137726540548]
Pruning offers a promising solution to mitigate the associated costs and environmental impact of deploying large deep neural networks (DNNs). We introduce Fisher-Taylor Sensitivity (FTS), a computationally cheap and efficient pruning criterion based on the empirical Fisher Information Matrix (FIM) diagonal. Our method achieves competitive performance against state-of-the-art techniques for one-shot PBT, even under extreme sparsity conditions.
arXiv Detail & Related papers (2025-02-17T05:22:23Z)
Replay-Free Continual Low-Rank Adaptation with Dynamic Memory [62.85596937435928]
We revisit continual learning, which enables pre-trained vision transformers (ViTs) to sequentially fine-tune on new downstream tasks over time. Recent studies highlight a crossover between CL techniques and parameter-efficient fine-tuning. We propose a novel PEFT-CL method called Dual Low-Rank Adaptation (DualLoRA).
arXiv Detail & Related papers (2024-11-01T14:28:39Z)
Meta-Learning Adaptable Foundation Models [37.458141335750696]
We introduce a meta-learning framework infused with PEFT in this intermediate retraining stage to learn a model that can be easily adapted to unseen tasks. In this setting, we demonstrate the suboptimality of standard retraining for finding an adaptable set of parameters. We then apply these theoretical insights to retraining the RoBERTa model to predict the continuation of conversations within the ConvAI2 dataset.
arXiv Detail & Related papers (2024-10-29T17:24:18Z)
ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially. Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks. However, such methods lack theoretical guarantees, making them prone to unexpected failures. We bridge this gap by integrating an empirically strong approach into a principled framework, designed to prevent forgetting.
arXiv Detail & Related papers (2024-10-01T12:58:37Z)
SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training [68.7896349660824]
We present an in-depth analysis of the progressive overfitting problem from the lens of Seq FT. Considering that the overly fast representation learning and the biased classification layer constitute this particular problem, we introduce the advanced Slow Learner with Alignment (SLCA++) framework. Our approach involves a Slow Learner to selectively reduce the learning rate of backbone parameters, and an Alignment to align the disjoint classification layers in a post-hoc fashion.
arXiv Detail & Related papers (2024-08-15T17:50:07Z)
HiDe-PET: Continual Learning via Hierarchical Decomposition of Parameter-Efficient Tuning [55.88910947643436]
We propose a unified framework for continual learning (CL) with pre-trained models (PTMs) and parameter-efficient tuning (PET). We present Hierarchical Decomposition PET (HiDe-PET), an innovative approach that explicitly optimizes the objective by incorporating task-specific and task-shared knowledge. Our approach demonstrates remarkably superior performance over a broad spectrum of recent strong baselines.
arXiv Detail & Related papers (2024-07-07T01:50:25Z)
Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need [18.112632827740878]
We find that the choice of prompt tuning as a PEFT method hurts the overall performance of the CL system. We replace prompt tuning with LoRA in two state-of-the-art continual learning methods: Learning to Prompt and S-Prompts.
arXiv Detail & Related papers (2024-06-05T12:53:37Z)
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint [56.74058752955209]
This paper studies the alignment process of generative models with Reinforcement Learning from Human Feedback (RLHF). We first identify the primary challenges of existing popular methods like offline PPO and offline DPO as lacking in strategic exploration of the environment. We propose efficient algorithms with finite-sample theoretical guarantees.
arXiv Detail & Related papers (2023-12-18T18:58:42Z)
Test-Time Training for Semantic Segmentation with Output Contrastive Loss [12.535720010867538]
Deep learning-based segmentation models have achieved impressive performance on public benchmarks, but generalizing well to unseen environments remains a major challenge. This paper introduces Output Contrastive Loss (OCL), known for its capability to learn robust and generalized representations, to stabilize the adaptation process. Our method excels even when applied to models initially pre-trained using domain adaptation methods on test domain data, showcasing its resilience and adaptability.
arXiv Detail & Related papers (2023-11-14T03:13:47Z)
Can Continual Learning Improve Long-Tailed Recognition? Toward a Unified Framework [16.457778420360537]
Long-Tailed Recognition methods aim to accurately learn a dataset comprising both a larger Head set and a smaller Tail set. We show that Continual Learning (CL) methods can effectively update the weights of the learner to learn the Tail without forgetting the Head. We also assess the applicability of CL techniques on real-world data by exploring CL on the naturally imbalanced256 dataset.
arXiv Detail & Related papers (2023-06-23T03:05:33Z)
Strong Baselines for Parameter Efficient Few-Shot Fine-tuning [50.83426196335385]
Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase. Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC. Fine-tuning ViTs, however, is expensive in time, compute and storage. This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters.
arXiv Detail & Related papers (2023-04-04T16:14:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.