Related papers: Weighted Ensemble Models Are Strong Continual Learners

Weighted Ensemble Models Are Strong Continual Learners

URL: http://arxiv.org/abs/2312.08977v2
Date: Thu, 21 Mar 2024 04:04:25 GMT
Title: Weighted Ensemble Models Are Strong Continual Learners
Authors: Imad Eddine Marouf, Subhankar Roy, Enzo Tartaglione, Stéphane Lathuilière,
Abstract summary: We study the problem of continual learning (CL) where the goal is to learn a model on a sequence of tasks. CL is essentially a balancing act between being able to learn on the new task and maintaining the performance on the previously learned concepts. Intending to address the stability-plasticity trade-off, we propose to perform weight-ensembling of the model parameters of the previous and current tasks.
Score: 20.62749699589017
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this work, we study the problem of continual learning (CL) where the goal is to learn a model on a sequence of tasks, such that the data from the previous tasks becomes unavailable while learning on the current task data. CL is essentially a balancing act between being able to learn on the new task (i.e., plasticity) and maintaining the performance on the previously learned concepts (i.e., stability). Intending to address the stability-plasticity trade-off, we propose to perform weight-ensembling of the model parameters of the previous and current tasks. This weighted-ensembled model, which we call Continual Model Averaging (or CoMA), attains high accuracy on the current task by leveraging plasticity, while not deviating too far from the previous weight configuration, ensuring stability. We also propose an improved variant of CoMA, named Continual Fisher-weighted Model Averaging (or CoFiMA), that selectively weighs each parameter in the weights ensemble by leveraging the Fisher information of the weights of the model. Both variants are conceptually simple, easy to implement, and effective in attaining state-of-the-art performance on several standard CL benchmarks. Code is available at: https://github.com/IemProg/CoFiMA.

Related papers

Continual Learning in Vision-Language Models via Aligned Model Merging [84.47520899851557]
We present a new perspective based on model merging to maintain stability while still retaining plasticity.<n>To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning aligned weights with previous ones.
arXiv Detail & Related papers (2025-05-30T20:52:21Z)
DATA: Decomposed Attention-based Task Adaptation for Rehearsal-Free Continual Learning [22.386864304549285]
Continual learning (CL) is essential for Large Language Models (LLMs) to adapt to evolving real-world demands. Recent rehearsal-free methods employ model-based and regularization-based strategies to address this issue. We propose a $textbfD$e $textbfA$ttention-based $textbfTask $textbfA$daptation ( DATA) DATA explicitly decouples and learns both task-specific and task-shared knowledge using high-rank and low-rank task adapters.
arXiv Detail & Related papers (2025-02-17T06:35:42Z)
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
LLaCA: Multimodal Large Language Continual Assistant [59.585544987096974]
Multimodal Continual Instruction Tuning (MCIT) is adopted to continually instruct MLLMs to follow human intent in sequential datasets. Existing gradient update would heavily destroy the tuning performance on previous datasets. We propose a method called Multimodal Large Language Continual Assistant (LLaCA) to address the challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z)
EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods. EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
Towards Stable Machine Learning Model Retraining via Slowly Varying Sequences [6.067007470552307]
We propose a methodology for finding sequences of machine learning models that are stable across retraining iterations. We develop a mixed-integer optimization formulation that is guaranteed to recover optimal models. Our method shows stronger stability than greedily trained models with a small, controllable sacrifice in predictive power.
arXiv Detail & Related papers (2024-03-28T22:45:38Z)
Towards Plastic and Stable Exemplar-Free Incremental Learning: A Dual-Learner Framework with Cumulative Parameter Averaging [12.168402195820649]
We propose a Dual-Learner framework with Cumulative. Averaging (DLCPA) We show that DLCPA outperforms several state-of-the-art exemplar-free baselines in both Task-IL and Class-IL settings.
arXiv Detail & Related papers (2023-10-28T08:48:44Z)
Model Merging by Uncertainty-Based Gradient Matching [70.54580972266096]
We propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. Our new method gives consistent improvements for large language models and vision transformers.
arXiv Detail & Related papers (2023-10-19T15:02:45Z)
AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging) It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
New metrics for analyzing continual learners [27.868967961503962]
Continual Learning (CL) poses challenges to standard learning algorithms. This stability-plasticity dilemma remains central to CL and multiple metrics have been proposed to adequately measure stability and plasticity separately. We propose new metrics that account for the task's increasing difficulty.
arXiv Detail & Related papers (2023-09-01T13:53:33Z)
Continual Learners are Incremental Model Generalizers [70.34479702177988]
This paper extensively studies the impact of Continual Learning (CL) models as pre-trainers. We find that the transfer quality of the representation often increases gradually without noticeable degradation in fine-tuning performance. We propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representation during solving downstream tasks.
arXiv Detail & Related papers (2023-06-21T05:26:28Z)
Model Stability with Continuous Data Updates [2.439909645714735]
We study the "stability" of machine learning (ML) models within the context of larger, complex NLP systems. We find that model design choices, including network architecture and input representation, have a critical impact on stability. We recommend ML model designers account for trade-offs in accuracy and jitter when making modeling choices.
arXiv Detail & Related papers (2022-01-14T22:11:16Z)
Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy. We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space. We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.