Related papers: Modular Embedding Recomposition for Incremental Learning

Modular Embedding Recomposition for Incremental Learning

URL: http://arxiv.org/abs/2508.16463v2
Date: Tue, 14 Oct 2025 16:54:27 GMT
Title: Modular Embedding Recomposition for Incremental Learning
Authors: Aniello Panariello, Emanuele Frascaroli, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara,
Abstract summary: We propose an approach that transforms preservation into enhancement of the zero-shot capabilities of Vision-Language Models (VLMs)<n>Our approach, named MoDular Embedding Recomposition (MoDER), introduces a modular framework that trains multiple textual experts, each specialized in a single seen class, and stores them in a foundational hub.<n>At inference time, for each unseen class, we query the hub and compose the retrieved experts to synthesize a refined prototype that improves classification.
Score: 23.789486655098585
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The advent of pre-trained Vision-Language Models (VLMs) has significantly transformed Continual Learning (CL), mainly due to their zero-shot classification abilities. Such proficiency makes VLMs well-suited for real-world applications, enabling robust performance on novel unseen classes without requiring adaptation. However, fine-tuning remains essential when downstream tasks deviate significantly from the pre-training domain. Prior CL approaches primarily focus on preserving the zero-shot capabilities of VLMs during incremental fine-tuning on a downstream task. We take a step further by devising an approach that transforms preservation into enhancement of the zero-shot capabilities of VLMs. Our approach, named MoDular Embedding Recomposition (MoDER), introduces a modular framework that trains multiple textual experts, each specialized in a single seen class, and stores them in a foundational hub. At inference time, for each unseen class, we query the hub and compose the retrieved experts to synthesize a refined prototype that improves classification. We show the effectiveness of our method across two popular zero-shot incremental protocols, Class-IL and MTIL, comprising a total of 14 datasets. The codebase is available at https://github.com/aimagelab/mammoth.

Related papers

Unlocking Prototype Potential: An Efficient Tuning Framework for Few-Shot Class-Incremental Learning [69.28860905525057]
Few-shot class-incremental learning (FSCIL) seeks to continuously learn new classes from very limited samples.<n>We introduce an efficient prototype fine-tuning framework that evolves static centroids into dynamic, learnable components.
arXiv Detail & Related papers (2026-02-05T03:50:53Z)
Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting [70.83781268763215]
Vision-language models (VLMs) have achieved impressive performance across diverse multimodal tasks by leveraging large-scale pre-training.<n>VLMs face unique challenges such as cross-modal feature drift, parameter interference due to shared architectures, and zero-shot capability erosion.<n>This survey aims to serve as a comprehensive and diagnostic reference for researchers developing lifelong vision-language systems.
arXiv Detail & Related papers (2025-08-06T09:03:10Z)
Beyond CLIP Generalization: Against Forward&Backward Forgetting Adapter for Continual Learning of Vision-Language Models [19.71113926850385]
The AFA method significantly outperforms existing state-of-the-art approaches.<n>It surpasses the inherent zero-shot performance of CLIP in terms of transferability.
arXiv Detail & Related papers (2025-05-12T15:56:23Z)
CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning [17.614980614656407]
We propose Continual Generative training for Incremental prompt-Learning. We exploit Variational Autoencoders to learn class-conditioned distributions. We show that such a generative replay approach can adapt to new tasks while improving zero-shot capabilities.
arXiv Detail & Related papers (2024-07-22T16:51:28Z)
Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge. We argue that the scarce semantic information conveyed by the one-hot labels hampers the effective knowledge transfer across tasks. Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals.
arXiv Detail & Related papers (2024-03-24T12:41:58Z)
Read Between the Layers: Leveraging Multi-Layer Representations for Rehearsal-Free Continual Learning with Pre-Trained Models [15.847302755988506]
We address the Continual Learning problem, wherein a model must learn a sequence of tasks from non-stationary distributions. We propose LayUP, a new prototype-based approach to CL that leverages second-order feature statistics from multiple intermediate layers of a pre-trained network. Our results demonstrate that fully exhausting the representational capacities of pre-trained models in CL goes well beyond their final embeddings.
arXiv Detail & Related papers (2023-12-13T13:11:44Z)
Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models [55.5610165938949]
Fine-tuning vision-language models (VLMs) has gained increasing popularity due to its practical value. This paper explores the collaborative potential of leveraging much weaker VLMs to enhance the generalization of a robust single model. We introduce three customized ensemble strategies, each tailored to one specific scenario. The proposed ensemble strategies are evaluated on zero-shot, base-to-new, and cross-dataset generalization, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2023-11-28T05:17:25Z)
RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones. We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models [13.340759455910721]
We propose a novel method to prevent zero-shot transfer degradation in the continual learning of vision-language models. Our method outperforms other methods in the traditional class-incremental learning setting.
arXiv Detail & Related papers (2023-03-12T10:28:07Z)
Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new efficient tuning approach for vision-language models (VLMs) named Task Residual Tuning (TaskRes) TaskRes explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task. The proposed TaskRes is simple yet effective, which significantly outperforms previous methods on 11 benchmark datasets.
arXiv Detail & Related papers (2022-11-18T15:09:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.