PPSEBM: An Energy-Based Model with Progressive Parameter Selection for Continual Learning
- URL: http://arxiv.org/abs/2512.15658v1
- Date: Wed, 17 Dec 2025 18:11:29 GMT
- Title: PPSEBM: An Energy-Based Model with Progressive Parameter Selection for Continual Learning
- Authors: Xiaodi Li, Dingcheng Li, Rujun Gao, Mahmoud Zamani, Feng Mi, Latifur Khan
- Abstract summary: A major obstacle in machine learning is catastrophic forgetting, where performance on earlier tasks degrades as new tasks are learned. In this paper, we introduce PPSEBM, a novel framework that integrates an Energy-Based Model (EBM) with Progressive Parameter Selection (PPS). PPS allocates distinct, task-specific parameters for each new task, while the EBM generates representative pseudo-samples from prior tasks. Experimental results on diverse NLP benchmarks demonstrate that PPSEBM outperforms state-of-the-art continual learning methods.
- Score: 12.099628640050554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning remains a fundamental challenge in machine learning, requiring models to learn from a stream of tasks without forgetting previously acquired knowledge. A major obstacle in this setting is catastrophic forgetting, where performance on earlier tasks degrades as new tasks are learned. In this paper, we introduce PPSEBM, a novel framework that integrates an Energy-Based Model (EBM) with Progressive Parameter Selection (PPS) to effectively address catastrophic forgetting in continual learning for natural language processing tasks. In PPSEBM, progressive parameter selection allocates distinct, task-specific parameters for each new task, while the EBM generates representative pseudo-samples from prior tasks. These generated samples actively inform and guide the parameter selection process, enhancing the model's ability to retain past knowledge while adapting to new tasks. Experimental results on diverse NLP benchmarks demonstrate that PPSEBM outperforms state-of-the-art continual learning methods, offering a promising and robust solution to mitigate catastrophic forgetting.
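The abstract describes the mechanism only at a high level. As a rough illustration, a minimal sketch of how task-specific parameter masks ("progressive parameter selection") and EBM-generated pseudo-samples could interact during training might look like the following; the class names, the masking scheme, and the stability penalty are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of PPSEBM-style training: task-specific parameter masks
# ("progressive parameter selection") guided by pseudo-samples drawn from
# energy-based models fit on earlier tasks. Names, the masking scheme, and the
# stability penalty are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Linear layer whose weights are progressively partitioned across tasks."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        # owner[i, j] = id of the task that claimed weight (i, j); -1 means free.
        self.register_buffer("owner", torch.full((d_out, d_in), -1, dtype=torch.long))

    def forward(self, x, task_id):
        # A task sees its own weights plus weights that are still unallocated.
        mask = (self.owner == task_id) | (self.owner == -1)
        return F.linear(x, self.weight * mask)

    def allocate(self, task_id, importance, budget):
        # Progressively assign the most important free weights to the new task.
        free = self.owner == -1
        scores = importance.abs() * free
        idx = torch.topk(scores.flatten(), k=min(budget, int(free.sum())))[1]
        self.owner.view(-1)[idx] = task_id

class EnergyModel(nn.Module):
    """Scalar energy over inputs; low energy = representative of the task."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 128), nn.SiLU(), nn.Linear(128, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def sample_pseudo(ebm, d, n=32, steps=20, step_size=0.1):
    # Langevin-style sampling of low-energy pseudo-inputs from a prior task's EBM.
    x = torch.randn(n, d, requires_grad=True)
    for _ in range(steps):
        grad, = torch.autograd.grad(ebm(x).sum(), x)
        x = (x - step_size * grad + 0.01 * torch.randn_like(x)).detach().requires_grad_(True)
    return x.detach()

d, n_classes = 64, 4
layer, head = MaskedLinear(d, d), nn.Linear(d, n_classes)
ebms = []  # one EBM retained per completed task

def train_task(task_id, loader):
    opt = torch.optim.Adam(list(layer.parameters()) + list(head.parameters()), lr=1e-3)
    replay = [sample_pseudo(e, d) for e in ebms]  # pseudo-replay of prior tasks
    with torch.no_grad():  # record current responses as stability targets
        targets = [head(F.relu(layer(px, t))) for t, px in enumerate(replay)]
    for x, y in loader:
        loss = F.cross_entropy(head(F.relu(layer(x, task_id))), y)
        for t, (px, tgt) in enumerate(zip(replay, targets)):
            # Keep responses on prior-task pseudo-samples close to their old values.
            loss = loss + 0.1 * F.mse_loss(head(F.relu(layer(px, t))), tgt)
        opt.zero_grad(); loss.backward(); opt.step()
    # Claim the free weights this task relied on most (gradient magnitude as importance).
    layer.allocate(task_id, layer.weight.grad, budget=256)
    ebms.append(EnergyModel(d))  # in the full method this EBM would be fit on task data
```

The intended interaction is that the pseudo-samples both regularize the loss and shape which free parameters the new task is allowed to claim, so weights important to earlier tasks stay untouched.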
Related papers
- Mixtures of SubExperts for Large Language Continual Learning [6.425296129700846]
Adapting Large Language Models to a continuous stream of tasks is a critical yet challenging endeavor. Reusing a single set of PEFT parameters for new tasks often leads to catastrophic forgetting of prior knowledge. We propose Mixtures of SubExperts (MoSEs), a novel adaptive PEFT method and continual learning framework designed for minimal forgetting and efficient scalability.
arXiv Detail & Related papers (2025-11-09T05:44:45Z) - Neural Variational Dropout Processes [44.95055503650414]
This paper presents a new Bayesian meta-learning approach called Neural Variational Dropout Processes (NVDPs). NVDPs model the conditional posterior distribution based on a task-specific dropout. Surprisingly, this enables the robust approximation of task-specific dropout rates.
arXiv Detail & Related papers (2025-10-22T09:45:44Z) - Orthogonal Projection Subspace to Aggregate Online Prior-knowledge for Continual Test-time Adaptation [67.80294336559574]
Continual Test Time Adaptation (CTTA) is a task that requires a source pre-trained model to continually adapt to new scenarios. We propose a novel pipeline, Orthogonal Projection Subspace to aggregate online Prior-knowledge, dubbed OoPk.
arXiv Detail & Related papers (2025-06-23T18:17:39Z) - Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning [19.27175827358111]
Continual learning in large language models (LLMs) is prone to catastrophic forgetting, where adapting to new tasks significantly degrades performance on previously learned ones. We propose a novel continual full fine-tuning approach leveraging adaptive singular value decomposition (SVD). We evaluate our approach extensively on standard continual learning benchmarks using both encoder-decoder (T5-Large) and decoder-only (LLaMA-2 7B) models. (A generic sketch of this kind of subspace projection appears after this list.)
arXiv Detail & Related papers (2025-04-09T17:59:42Z) - Model Predictive Task Sampling for Efficient and Robust Adaptation [57.414812940406996]
We introduce Model Predictive Task Sampling (MPTS), a framework that bridges the task space and adaptation risk distributions. MPTS employs a generative model to characterize the episodic optimization process and predicts task-specific adaptation risk via posterior inference. MPTS seamlessly integrates into zero-shot, few-shot, and supervised finetuning settings.
arXiv Detail & Related papers (2025-01-19T13:14:53Z) - LSEBMCL: A Latent Space Energy-Based Model for Continual Learning [20.356926275395004]
The proposed solution, LSEBMCL (Latent Space Energy-Based Model for Continual Learning), uses energy-based models (EBMs) to prevent catastrophic forgetting. The study demonstrates the efficacy of EBMs in NLP tasks, achieving state-of-the-art results in all experiments.
arXiv Detail & Related papers (2025-01-09T15:47:30Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce BAdam, a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance among prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation [87.98063273826702]
We propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation.
A theoretical analysis is provided to prove the effectiveness of our method.
arXiv Detail & Related papers (2022-03-22T12:41:55Z) - SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z) - Continual Learning via Bit-Level Information Preserving [88.32450740325005]
We study the continual learning process through the lens of information theory.
We propose Bit-Level Information Preserving (BLIP) that preserves the information gain on model parameters.
BLIP achieves close to zero forgetting while only requiring constant memory overheads throughout continual learning.
arXiv Detail & Related papers (2021-05-10T15:09:01Z)
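Several entries above (e.g., the SVD-constrained full fine-tuning and the orthogonal-projection pipeline) share the idea of keeping new-task updates out of subspaces that mattered for earlier tasks. Below is a generic sketch of that idea, building a protected subspace from the SVD of stored activations and projecting weight gradients out of it; it is not any single paper's implementation, and the function names and the 95% energy threshold are assumptions.

```python
# Generic sketch (not any specific paper's implementation) of constraining updates
# to lie in a subspace orthogonal to directions important for earlier tasks,
# estimated here via SVD of stored input activations. Names are assumptions.
import torch

def prior_task_basis(activations: torch.Tensor, energy_kept: float = 0.95) -> torch.Tensor:
    """Return an orthonormal basis (columns) spanning the important input directions."""
    # activations: (n_samples, d) collected while training earlier tasks
    U, S, _ = torch.linalg.svd(activations.T, full_matrices=False)  # U: (d, r_max)
    cum = torch.cumsum(S**2, dim=0) / (S**2).sum()
    r = int((cum < energy_kept).sum()) + 1  # smallest rank capturing the energy threshold
    return U[:, :r]  # (d, r)

def project_out(grad: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Remove the gradient components that would disturb responses on old inputs.

    For a linear layer y = W x, weight gradients act along the input directions of
    the training batch; removing their projection onto span(basis) leaves the
    layer's outputs on old-task inputs approximately unchanged.
    """
    # grad: (d_out, d_in), basis: (d_in, r)
    return grad - grad @ basis @ basis.T

# Usage inside a training step (hypothetical):
# basis = prior_task_basis(saved_old_activations)
# loss.backward()
# with torch.no_grad():
#     layer.weight.grad.copy_(project_out(layer.weight.grad, basis))
# optimizer.step()
```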