HAM: Hierarchical Adapter Merging for Scalable Continual Learning
- URL: http://arxiv.org/abs/2509.13211v3
- Date: Thu, 18 Sep 2025 13:56:54 GMT
- Title: HAM: Hierarchical Adapter Merging for Scalable Continual Learning
- Authors: Eric Nuertey Coleman, Luigi Quarantiello, Samrat Mukherjee, Julio Hurtado, Vincenzo Lomonaco,
- Abstract summary: New knowledge can interfere with previously learned information, causing the model to forget earlier knowledge in favor of the new. This paper introduces Hierarchical Adapters Merging (HAM), a novel framework that dynamically combines adapters from different tasks during training. HAM significantly outperforms state-of-the-art methods, particularly as the number of tasks increases.
- Score: 5.958899330375292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning is an essential capability of human cognition, yet it poses significant challenges for current deep learning models. The primary issue is that new knowledge can interfere with previously learned information, causing the model to forget earlier knowledge in favor of the new, a phenomenon known as catastrophic forgetting. Although large pre-trained models can partially mitigate forgetting by leveraging their existing knowledge and over-parameterization, they often struggle when confronted with novel data distributions. Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA, enable efficient adaptation to new knowledge. However, they still face challenges in scaling to dynamic learning scenarios and long sequences of tasks, as maintaining one adapter per task introduces complexity and increases the potential for interference. In this paper, we introduce Hierarchical Adapters Merging (HAM), a novel framework that dynamically combines adapters from different tasks during training. This approach enables HAM to scale effectively, allowing it to manage more tasks than competing baselines with improved efficiency. To achieve this, HAM maintains a fixed set of groups that hierarchically consolidate new adapters. For each task, HAM trains a low-rank adapter along with an importance scalar, then dynamically groups tasks based on adapter similarity. Within each group, adapters are pruned, scaled, and merged, facilitating transfer learning between related tasks. Extensive experiments on three vision benchmarks show that HAM significantly outperforms state-of-the-art methods, particularly as the number of tasks increases.
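The pipeline described in the abstract (train a low-rank adapter plus an importance scalar per task, group adapters by similarity, then prune, scale, and merge within each group) can be sketched as follows. This is a minimal illustrative sketch based only on the abstract, not the authors' implementation: the function names, the cosine-similarity grouping criterion, the magnitude-pruning rule, and the `keep_ratio`/`threshold` values are all assumptions.

```python
import numpy as np

def lora_delta(A, B):
    # A low-rank LoRA update: delta W = B @ A, with rank r << d.
    return B @ A

def cosine(u, v):
    # Cosine similarity between two flattened adapter updates.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def assign_group(delta, groups, threshold=0.3):
    """Assign a new adapter to the most similar existing group (by cosine
    similarity to the group centroid), or return None to open a new group.
    The threshold value is a hypothetical choice, not from the paper."""
    flat = delta.ravel()
    best_gid, best_sim = None, threshold
    for gid, group in groups.items():
        sim = cosine(flat, group["centroid"].ravel())
        if sim > best_sim:
            best_gid, best_sim = gid, sim
    return best_gid

def prune(delta, keep_ratio=0.5):
    # Simple magnitude pruning: keep only the largest-magnitude entries.
    k = int(delta.size * keep_ratio)
    cutoff = np.sort(np.abs(delta).ravel())[-k]
    return np.where(np.abs(delta) >= cutoff, delta, 0.0)

def merge_group(deltas, alphas):
    """Prune each adapter, then merge with weights given by the learned
    importance scalars (normalized to sum to 1)."""
    pruned = [prune(d) for d in deltas]
    weights = np.array(alphas) / (np.sum(alphas) + 1e-12)
    return sum(w * p for w, p in zip(weights, pruned))

# Toy usage: three task adapters of rank 2 on an 8x8 weight matrix.
rng = np.random.default_rng(0)
d, r = 8, 2
adapters = [(rng.normal(size=(r, d)), rng.normal(size=(d, r))) for _ in range(3)]
alphas = [1.0, 0.5, 2.0]          # learned per-task importance scalars
deltas = [lora_delta(A, B) for A, B in adapters]
merged = merge_group(deltas, alphas)
print(merged.shape)  # (8, 8)
```

In this sketch the importance scalars double as merge weights; how HAM actually combines the scalar with the pruned adapter is not specified in the abstract, so this particular weighting is only one plausible reading.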
Related papers
- Shared LoRA Subspaces for almost Strict Continual Learning [32.4267950435704]
Adapting large pretrained models to new tasks efficiently and continually is crucial for real-world deployment. We propose Share, a novel approach to parameter-efficient continual finetuning that learns and dynamically updates a single, shared low-rank subspace. A single Share model can replace hundreds of task-specific LoRA adapters, supporting scalable, asynchronous continual learning.
arXiv Detail & Related papers (2026-02-05T18:59:58Z) - Mixtures of SubExperts for Large Language Continual Learning [6.425296129700846]
Adapting Large Language Models to a continuous stream of tasks is a critical yet challenging endeavor. Reusing a single set of PEFT parameters for new tasks often leads to catastrophic forgetting of prior knowledge. We propose Mixtures of SubExperts (MoSEs), a novel adaptive PEFT method and continual learning framework designed for minimal forgetting and efficient scalability.
arXiv Detail & Related papers (2025-11-09T05:44:45Z) - CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning [8.81873424028249]
Class-Incremental Learning (CIL) aims to learn new classes sequentially while retaining the knowledge of previously learned classes. We propose a novel dual-adapter architecture combining task-shared adapters to learn cross-task knowledge and task-specific adapters to capture unique features of each new task. We demonstrate that CL-LoRA consistently achieves promising performance across multiple benchmarks with reduced training and inference computation.
arXiv Detail & Related papers (2025-05-30T17:19:52Z) - LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models [21.888139819188105]
LLaVA-CMoE is a continual learning framework for large vision-language models. A Probe-Guided Knowledge Extension mechanism determines when and where new experts should be added. A Probabilistic Task Locator assigns each task a dedicated, lightweight router.
arXiv Detail & Related papers (2025-03-27T07:36:11Z) - Linked Adapters: Linking Past and Future to Present for Effective Continual Learning [3.7166121807265045]
Continual learning allows a system to learn and adapt to new tasks while retaining the knowledge acquired from previous tasks. Deep learning models suffer from catastrophic forgetting of knowledge learned from earlier tasks while learning a new task. We propose a novel approach that enables knowledge transfer to other task-specific adapters through a weighted attention mechanism.
arXiv Detail & Related papers (2024-12-14T05:25:17Z) - Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation [39.886149621730915]
Lifelong sequence generation (LSG) aims to continually train a model on a sequence of generation tasks to learn constantly emerging new generation patterns.
Inspired by the learning paradigm of humans, we propose Dynamic Module Expansion and Adaptation (DMEA)
DMEA enables the model to dynamically determine the architecture for acquiring new knowledge based on task correlation and select the most similar previous tasks to facilitate adaptation to new tasks.
arXiv Detail & Related papers (2023-10-15T16:51:11Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory [64.11870454160614]
We propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM)
ADA-CM has two operating modes. The first mode makes it tunable without learning new parameters in a training-free paradigm.
Our proposed method achieves competitive results with state-of-the-art on the HICO-DET and V-COCO datasets with much less training time.
arXiv Detail & Related papers (2023-09-07T13:10:06Z) - ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches, automatically searches the varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z) - Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z) - Uniform Priors for Data-Efficient Transfer [65.086680950871]
We show that features that are most transferable have high uniformity in the embedding space.
We evaluate the regularization on its ability to facilitate adaptation to unseen tasks and data.
arXiv Detail & Related papers (2020-06-30T04:39:36Z) - Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.