CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion
- URL: http://arxiv.org/abs/2601.09512v1
- Date: Wed, 14 Jan 2026 14:23:42 GMT
- Title: CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion
- Authors: Ralf Römer, Yi Zhang, Angela P. Schoellig
- Abstract summary: CLARE is a framework for exemplar-free continual learning with vision-language-action models. We show that CLARE achieves high performance on new tasks without catastrophic forgetting of earlier tasks.
- Score: 9.808005698482914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To teach robots complex manipulation tasks, it is now a common practice to fine-tune a pre-trained vision-language-action model (VLA) on task-specific data. However, since this recipe updates existing representations, it is unsuitable for long-term operation in the real world, where robots must continually adapt to new tasks and environments while retaining the knowledge they have already acquired. Existing continual learning methods for robotics commonly require storing previous data (exemplars), struggle with long task sequences, or rely on task identifiers for deployment. To address these limitations, we propose CLARE, a general, parameter-efficient framework for exemplar-free continual learning with VLAs. CLARE introduces lightweight modular adapters into selected feedforward layers and autonomously expands the model only where necessary when learning a new task, guided by layer-wise feature similarity. During deployment, an autoencoder-based routing mechanism dynamically activates the most relevant adapters without requiring task labels. Through extensive experiments on the LIBERO benchmark, we show that CLARE achieves high performance on new tasks without catastrophic forgetting of earlier tasks, significantly outperforming even exemplar-based methods. Code and data are available at https://tum-lsy.github.io/clare.
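The abstract pins CLARE on two concrete mechanisms: expanding adapters only in layers where layer-wise feature similarity indicates the new task is not yet covered, and routing at deployment via per-adapter autoencoders instead of task labels. The PyTorch sketch below is a minimal reading of that description, not the authors' released code; the bottleneck sizes, the cosine-similarity expansion test, and the reconstruction-error router are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' code) of CLARE's two mechanisms:
# similarity-gated adapter expansion and autoencoder-based routing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Lightweight bottleneck adapter inserted into a feedforward layer."""
    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, h):
        return h + self.up(F.relu(self.down(h)))  # residual bottleneck

class RoutedAdapterLayer(nn.Module):
    """One adapter plus one feature autoencoder per task seen so far."""
    def __init__(self, dim: int):
        super().__init__()
        self.dim = dim
        self.adapters = nn.ModuleList()
        self.autoencoders = nn.ModuleList()

    def maybe_expand(self, feats: torch.Tensor, sim_threshold: float = 0.8) -> bool:
        """Expand this layer only if the new task's features are not well
        covered by an existing task (threshold and similarity measure are
        assumptions, not the paper's exact criterion)."""
        if self.adapters and self._best_fit(feats) >= sim_threshold:
            return False  # an existing adapter already covers these features
        self.adapters.append(Adapter(self.dim))
        self.autoencoders.append(nn.Sequential(
            nn.Linear(self.dim, 16), nn.ReLU(), nn.Linear(16, self.dim)))
        return True

    def _best_fit(self, feats: torch.Tensor) -> float:
        sims = [F.cosine_similarity(ae(feats), feats, dim=-1).mean().item()
                for ae in self.autoencoders]
        return max(sims)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        if not self.adapters:
            return h  # this layer was never expanded
        # Task-label-free routing: pick the adapter whose autoencoder
        # reconstructs the current features with the lowest error.
        errors = torch.stack([F.mse_loss(ae(h), h) for ae in self.autoencoders])
        return self.adapters[int(errors.argmin())](h)
```

Note that `forward` needs no task identifier: reconstruction error alone selects the adapter, which is what makes the scheme deployable without task labels.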
Related papers
- Learning with Preserving for Continual Multitask Learning [4.847042727427382]
We introduce Learning with Preserving (LwP), a novel framework that shifts the focus from preserving task outputs to maintaining the shared representation space. LwP not only mitigates catastrophic forgetting but also consistently outperforms state-of-the-art baselines in CMTL tasks.
arXiv Detail & Related papers (2025-11-11T22:23:20Z)
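The LwP entry above turns on one distinction: preserving the shared representation space rather than task outputs. A minimal sketch of that representation-level reading, assuming a frozen encoder snapshot and a simple feature-distance penalty (the paper's actual objective may differ):

```python
# Hedged sketch: representation-space preservation instead of output distillation.
import torch
import torch.nn.functional as F

def representation_preserving_loss(encoder, frozen_encoder, x):
    """Penalize drift of the shared representation rather than distilling
    task outputs; `frozen_encoder` is a snapshot taken before the new task."""
    with torch.no_grad():
        z_old = frozen_encoder(x)
    return F.mse_loss(encoder(x), z_old)
```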
- RICL: Adding In-Context Adaptability to Pre-Trained Vision-Language-Action Models [20.826907313227323]
Multi-task "vision-language-action" (VLA) models have recently demonstrated increasing promise as generalist foundation models for robotics. For such models to be truly useful, an end user must have an easy means of teaching them to improve. For language and vision models, the emergent ability to perform in-context learning (ICL) has proven to be a versatile interface for easily teaching new tasks.
arXiv Detail & Related papers (2025-08-04T05:01:11Z)
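RICL's premise is that in-context learning can serve as the teaching interface for a VLA: the end user supplies a few demonstrations and the model conditions on them without any weight update. A hedged sketch; the `policy(context=..., observation=...)` signature and the flat demo encoding are hypothetical, not RICL's actual interface:

```python
# Hedged sketch: teach a pre-trained VLA via in-context demonstrations.
import torch

def in_context_action(policy, demos, obs):
    """Condition a VLA policy on user demonstrations instead of fine-tuning.
    demos: list of (observation, action) tensor pairs; the context format
    and the policy's call signature are illustrative assumptions."""
    context = torch.cat([torch.cat([o, a], dim=-1) for o, a in demos], dim=0)
    return policy(context=context, observation=obs)
```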
- TaskVAE: Task-Specific Variational Autoencoders for Exemplar Generation in Continual Learning for Human Activity Recognition [1.0687457324219043]
Continual learning enables models to learn from evolving data streams while minimizing forgetting of prior knowledge. This paper presents TaskVAE, a framework for replay-based CL in class-incremental settings. In contrast to traditional methods that require prior knowledge of the total class count or rely on a single VAE for all tasks, TaskVAE adapts flexibly to a growing number of tasks without such constraints.
arXiv Detail & Related papers (2025-05-10T17:42:01Z)
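The TaskVAE summary hinges on replacing stored exemplars with one generator per task. A minimal sketch, assuming a single-linear-layer VAE (the paper's architecture and training details are not given in the snippet above):

```python
# Hedged sketch: one small VAE per task generates replay exemplars,
# so no raw data from earlier tasks needs to be stored.
import torch
import torch.nn as nn

class TaskVAE(nn.Module):
    """Per-task VAE; sampling it replaces a stored exemplar buffer."""
    def __init__(self, x_dim: int, z_dim: int = 8):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)  # outputs mean and log-variance
        self.dec = nn.Linear(z_dim, x_dim)
        self.z_dim = z_dim

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

    @torch.no_grad()
    def sample_exemplars(self, n: int) -> torch.Tensor:
        """Generate synthetic replay data for this task's classes."""
        return self.dec(torch.randn(n, self.z_dim))
```

When task t arrives, synthetic exemplars sampled from the earlier tasks' VAEs are mixed into the training batch, so no raw data from previous tasks needs to be kept.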
- LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models [21.888139819188105]
LLaVA-CMoE is a continual learning framework for large vision-language models. A Probe-Guided Knowledge Extension mechanism determines when and where new experts should be added, and a Probabilistic Task Locator assigns each task a dedicated, lightweight router.
arXiv Detail & Related papers (2025-03-27T07:36:11Z)
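The two named components, Probe-Guided Knowledge Extension and the Probabilistic Task Locator, suggest a layer that grows its expert pool on demand and keeps one tiny router per task. The sketch below is a loose structural reading; the probe criterion, the threshold, and the router form are assumptions:

```python
# Hedged sketch: grow experts where a probe shows they are needed,
# and give each task its own lightweight router.
import torch
import torch.nn as nn

class CMoELayer(nn.Module):
    """A shared expert pool that grows, plus one tiny router per task."""
    def __init__(self, dim: int):
        super().__init__()
        self.dim = dim
        self.experts = nn.ModuleList([nn.Linear(dim, dim)])
        self.task_routers = nn.ModuleList()

    def probe_and_extend(self, probe_loss: float, threshold: float = 0.5):
        """Assumed criterion: if probing this layer on the new task shows
        high loss, the existing experts do not suffice, so add one here."""
        if probe_loss > threshold:
            self.experts.append(nn.Linear(self.dim, self.dim))

    def add_task_router(self):
        """Dedicated, lightweight router over the experts that exist now."""
        self.task_routers.append(nn.Linear(self.dim, len(self.experts)))

    def forward(self, h: torch.Tensor, task_id: int) -> torch.Tensor:
        router = self.task_routers[task_id]
        n = router.out_features  # experts known when this task was learned
        weights = router(h).softmax(dim=-1)                           # (B, n)
        outs = torch.stack([e(h) for e in self.experts[:n]], dim=-1)  # (B, D, n)
        return (outs * weights.unsqueeze(-2)).sum(dim=-1)
```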
- Generalizable Long-Horizon Manipulations with Large Language Models [91.740084601715]
This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulations.
We create a challenging robotic manipulation task suite based on PyBullet for long-horizon task evaluation.
arXiv Detail & Related papers (2023-10-03T17:59:46Z)
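Since the entry above describes generating primitive task conditions with an LLM, here is a hedged sketch of that pattern; `llm` is any text-in/text-out callable and the prompt format is our assumption, not the paper's:

```python
# Hedged sketch: ask an LLM to decompose a long-horizon instruction
# into primitive steps with pre/post conditions.
def decompose_task(llm, instruction: str) -> list[str]:
    """`llm` is any text-in/text-out callable; prompt format is assumed."""
    prompt = ("Decompose this manipulation task into primitive steps, one per "
              "line, each with a precondition and an effect.\n"
              f"Task: {instruction}\nSteps:")
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]
```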
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
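BAdam is summarized as a prior-based method that constrains parameter growth. The generic form of such a constraint is a quadratic pull toward the post-previous-task parameters, weighted by an importance estimate; BAdam folds this into adaptive-moment updates, so the sketch below shows the generic penalty only, not BAdam's exact rule:

```python
# Hedged sketch: generic prior-based penalty on parameter drift.
import torch

def prior_penalty(model, prior_means, importances, strength: float = 1.0):
    """Pull each parameter toward its value after previous tasks, weighted
    by importance. This is the generic idea, not BAdam's exact update."""
    loss = 0.0
    for name, p in model.named_parameters():
        loss = loss + (importances[name] * (p - prior_means[name]) ** 2).sum()
    return strength * loss
```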
- Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation [18.345183818638475]
Continual learning (CL) can serve as a remedy by enabling knowledge transfer across sequentially arriving tasks.
We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks.
Our approach scales to a large number of tasks because it requires little memory and time overhead.
arXiv Detail & Related papers (2023-03-25T10:16:53Z)
- Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new efficient tuning approach for vision-language models (VLMs) named Task Residual Tuning (TaskRes).
TaskRes explicitly decouples the prior knowledge of the pre-trained model from the new knowledge regarding a target task.
The proposed TaskRes is simple yet effective, significantly outperforming previous methods on 11 benchmark datasets.
arXiv Detail & Related papers (2022-11-18T15:09:03Z)
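TaskRes's decoupling has a particularly direct form: keep the pre-trained classifier weights frozen and learn only an additive task residual. A sketch under the common CLIP-style reading, where the classifier rows are text-derived class embeddings and the scaling factor `alpha` is an assumed hyperparameter:

```python
# Hedged sketch: frozen prior classifier plus a learnable task residual.
import torch
import torch.nn as nn

class TaskResidualHead(nn.Module):
    """Prior knowledge stays frozen; only the residual carries new knowledge."""
    def __init__(self, base_weights: torch.Tensor, alpha: float = 0.5):
        super().__init__()
        self.register_buffer("base", base_weights)  # frozen prior knowledge
        self.residual = nn.Parameter(torch.zeros_like(base_weights))  # new knowledge
        self.alpha = alpha

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        classifier = self.base + self.alpha * self.residual  # decoupled combination
        return image_features @ classifier.t()
```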
- Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism [120.1998866178014]
We present a flexible framework for continual object detection via a pROtotypical taSk corrElaTion guided gaTing mechAnism (ROSETTA).
Concretely, a unified framework is shared by all tasks while task-aware gates are introduced to automatically select sub-models for specific tasks.
Experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance.
arXiv Detail & Related papers (2022-05-06T07:31:28Z)
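ROSETTA's summary, a shared framework plus task-aware gates that select sub-models, maps onto a block whose units are switched on or off per task. In the sketch below the gates are simply learned per task; how the prototypical task correlation initializes or regularizes them is not specified in the snippet, so that part is omitted:

```python
# Hedged sketch: shared units with per-task gates selecting sub-models.
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """One shared block; each task gets its own gate over the units."""
    def __init__(self, dim: int, n_units: int = 4):
        super().__init__()
        self.units = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_units)])
        self.task_gates = nn.ParameterList()  # one gate vector per task

    def add_task(self):
        self.task_gates.append(nn.Parameter(torch.ones(len(self.units))))

    def forward(self, h: torch.Tensor, task_id: int) -> torch.Tensor:
        gate = torch.sigmoid(self.task_gates[task_id])  # soft unit selection
        return sum(g * u(h) for g, u in zip(gate, self.units))
```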
- Few-Shot Class-Incremental Learning by Sampling Multi-Phase Tasks [59.12108527904171]
A model should recognize new classes and maintain discriminability over old classes.
The task of recognizing few-shot new classes without forgetting old classes is called few-shot class-incremental learning (FSCIL).
We propose a new paradigm for FSCIL based on meta-learning by LearnIng Multi-phase Incremental Tasks (LIMIT).
arXiv Detail & Related papers (2022-03-31T13:46:41Z)
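LIMIT's one-line description, meta-learning by sampling multi-phase incremental tasks, suggests synthesizing episodes that mimic the class-incremental protocol from base classes alone. A minimal sketch under that reading, with illustrative split sizes:

```python
# Hedged sketch: sample a "fake" incremental episode from base classes so
# meta-training rehearses adding new classes without forgetting.
import random

def sample_incremental_episode(base_classes, n_old=5, n_new=2, seed=None):
    """Return (pretend-old classes, pretend-new classes); sizes are assumed."""
    rng = random.Random(seed)
    classes = rng.sample(list(base_classes), n_old + n_new)
    return classes[:n_old], classes[n_old:]
```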
- Hierarchical Few-Shot Imitation with Skill Transition Models [66.81252581083199]
Few-shot Imitation with Skill Transition Models (FIST) is an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks.
We show that FIST is capable of generalizing to new tasks and substantially outperforms prior baselines in navigation experiments.
arXiv Detail & Related papers (2021-07-19T15:56:01Z)