AIMMerging: Adaptive Iterative Model Merging Using Training Trajectories for Language Model Continual Learning
- URL: http://arxiv.org/abs/2509.17348v1
- Date: Mon, 22 Sep 2025 04:19:29 GMT
- Title: AIMMerging: Adaptive Iterative Model Merging Using Training Trajectories for Language Model Continual Learning
- Authors: Yujie Feng, Jian Li, Xiaoyu Dong, Pengfei Xu, Xiaohui Zhou, Yujia Zhang, Zexin Lu, Yasha Wang, Alan Zhao, Xu Chu, Xiao-Ming Wu
- Abstract summary: We introduce Adaptive Iterative Model Merging (AimMerging), a novel CL framework that monitors the model's training status. Experiments demonstrate that AimMerging achieves significant performance improvements over existing state-of-the-art methods.
- Score: 35.182662964528845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning (CL) is essential for deploying large language models (LLMs) in dynamic real-world environments without the need for costly retraining. Recent model merging-based methods have attracted significant attention, but they still struggle to effectively manage the trade-off between learning new knowledge and preventing forgetting, a challenge largely stemming from a suboptimal number and frequency of merges. In this paper, we introduce Adaptive Iterative Model Merging (AimMerging), a novel CL framework that utilizes learning and forgetting signals from the training trajectory to dynamically monitor the model's training status. Guided by this monitoring, the training trajectory-guided merge controller adaptively determines the timing and frequency of iterative fusion, while the rehearsal-based knowledge fusion module computes the merging weights and executes the fusion. Comprehensive experiments on three CL benchmarks with various model sizes (from 770M to 13B) demonstrate that AimMerging achieves significant performance improvements over existing state-of-the-art methods, with average relative improvements of 80% and 59% on FWT and BWT, respectively. The source code is provided for reproducibility.
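The abstract describes three moving parts: trajectory monitoring, a merge controller that decides when to fuse, and a rehearsal-based module that computes merging weights. A minimal sketch of that control flow is below; the signal definitions (new-task loss as the learning signal, rehearsal-buffer loss as the forgetting signal), the threshold, and the weighting rule are all illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of trajectory-guided iterative merging; the signals,
# threshold, and weighting rule here are hypothetical stand-ins.
import torch

@torch.no_grad()
def interpolate(merged_state, task_state, alpha):
    """Weighted parameter fusion: merged <- (1 - alpha) * merged + alpha * task."""
    return {k: (1 - alpha) * merged_state[k] + alpha * task_state[k]
            for k in merged_state}

def train_task(model, merged_state, loader, rehearsal_loader, opt, loss_fn,
               forget_threshold=0.5):
    for (x, y), (rx, ry) in zip(loader, rehearsal_loader):
        loss = loss_fn(model(x), y)                # learning signal: new-task loss
        opt.zero_grad(); loss.backward(); opt.step()

        with torch.no_grad():                      # forgetting signal proxy:
            forget = loss_fn(model(rx), ry).item() # loss on rehearsal samples

        # Merge controller: trigger an iterative fusion only when the
        # forgetting signal drifts above the threshold (hypothetical rule).
        if forget > forget_threshold:
            alpha = max(0.1, 1.0 - forget)         # hypothetical merging weight
            merged_state = interpolate(merged_state, model.state_dict(), alpha)
            model.load_state_dict(merged_state)
    return merged_state
```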
Related papers
- Bagging-Based Model Merging for Robust General Text Embeddings [73.51674133699196]
General-purpose text embedding models underpin a wide range of NLP and information retrieval applications. We present a systematic study of multi-task training for text embeddings from two perspectives: data scheduling and model merging. We propose Bagging-based rObust mOdel Merging (BOOM), which trains multiple embedding models on sampled subsets and merges them into a single model.
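As a rough illustration of the bagging idea (train on sampled subsets, then merge), the sketch below uniformly averages the weights of models fine-tuned on bootstrap subsets. The subset fraction, the uniform average, and the `train_fn` hook are assumptions; the abstract does not specify BOOM's exact sampling or merging rule.

```python
# Hypothetical sketch of bagging-style model merging: fine-tune copies on
# sampled subsets of the data, then uniformly average their parameters.
import copy
import random
import torch

def bagged_merge(base_model, dataset, train_fn, n_bags=4, frac=0.8):
    states = []
    for _ in range(n_bags):
        idx = random.sample(range(len(dataset)), int(frac * len(dataset)))
        model = copy.deepcopy(base_model)
        train_fn(model, [dataset[i] for i in idx])  # user-supplied training loop
        states.append(model.state_dict())
    # Merge: uniform parameter averaging across the bagged models.
    merged = {k: torch.stack([s[k].float() for s in states]).mean(0)
              for k in states[0]}
    base_model.load_state_dict(merged)
    return base_model
```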
arXiv Detail & Related papers (2026-02-05T15:45:08Z)
- Intrinsic Training Signals for Federated Learning Aggregation [13.540945877050525]
Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy. This work demonstrates that effective model merging can be achieved solely through existing training signals.
arXiv Detail & Related papers (2025-07-09T13:03:23Z)
- Efficient Federated Learning with Timely Update Dissemination [54.668309196009204]
Federated Learning (FL) has emerged as a compelling methodology for the management of distributed data. We propose an efficient FL approach that capitalizes on additional downlink bandwidth resources to ensure timely update dissemination.
arXiv Detail & Related papers (2025-07-08T14:34:32Z)
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
- Recurrent Knowledge Identification and Fusion for Language Model Continual Learning [41.901501650712234]
Recurrent-KIF is a CL framework for Recurrent Knowledge Identification and Fusion. Inspired by human continual learning, Recurrent-KIF employs an inner loop that rapidly adapts to new tasks and an outer loop that globally manages the fusion of new and historical knowledge.
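The inner/outer split might look like the sketch below: the inner loop takes gradient steps on the new task while accumulating squared gradients as a crude importance estimate, and the outer loop fuses the adapted weights back into the historical model. The importance rule and fusion formula here are stand-ins, not Recurrent-KIF's actual knowledge-identification mechanism.

```python
# Schematic inner/outer loop for knowledge identification and fusion;
# the importance estimate and fusion rule are illustrative assumptions.
import torch

@torch.no_grad()
def fuse(historical, working, importance):
    """Outer loop: importance-weighted fusion of new and historical knowledge."""
    return {k: importance[k] * working[k] + (1 - importance[k]) * historical[k]
            for k in historical}

def recurrent_kif(model, tasks, loss_fn, inner_steps=50, lr=1e-3):
    historical = {k: v.clone() for k, v in model.state_dict().items()}
    for loader in tasks:                                  # sequence of task loaders
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        grad_sq = {k: torch.zeros_like(p) for k, p in model.named_parameters()}
        for _, (x, y) in zip(range(inner_steps), loader):  # inner loop: fast adaptation
            loss = loss_fn(model(x), y)
            opt.zero_grad(); loss.backward(); opt.step()
            for name, p in model.named_parameters():       # accumulate gradient energy
                grad_sq[name] += p.grad.detach() ** 2
        # Stand-in importance: parameters with more gradient energy lean toward
        # the newly adapted weights; buffers (no gradients) stay historical.
        imp = {k: (grad_sq[k] / (grad_sq[k] + grad_sq[k].mean() + 1e-8))
               if k in grad_sq else torch.zeros_like(v)
               for k, v in model.state_dict().items()}
        historical = fuse(historical, model.state_dict(), imp)
        model.load_state_dict(historical)
    return model
```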
arXiv Detail & Related papers (2025-02-22T05:37:27Z)
- Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning [33.88701368538447]
We propose an innovative model-based local training technique called "Local Superior Soups".
Our method enhances local training across different clients, encouraging the exploration of a connected low-loss basin.
We demonstrate its effectiveness and efficiency across diverse, widely used FL datasets.
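One way to read the "soup" idea is the greedy model-soup recipe: average several locally fine-tuned candidates, keeping each one only if the averaged weights do not hurt held-out loss. The sketch below follows that generic recipe; the paper's specific regularizers for keeping candidates in a connected low-loss basin are not reproduced here.

```python
# Greedy-soup-style local merging sketch (generic recipe, not the paper's
# exact method): admit a fine-tuned candidate only if averaging it in helps.
import torch

@torch.no_grad()
def average_states(states):
    """Uniform weight averaging (the 'soup') over a list of state_dicts."""
    return {k: torch.stack([s[k].float() for s in states]).mean(0)
            for k in states[0]}

def soup_loss(states, eval_loss, model_factory):
    model = model_factory()                      # fresh model to host the soup
    model.load_state_dict(average_states(states))
    return eval_loss(model)                      # user-supplied held-out loss

def local_soup(candidates, eval_loss, model_factory):
    kept = [candidates[0]]
    best = soup_loss(kept, eval_loss, model_factory)
    for cand in candidates[1:]:
        trial = soup_loss(kept + [cand], eval_loss, model_factory)
        if trial <= best:                        # keep only non-harmful candidates
            kept.append(cand)
            best = trial
    return average_states(kept)
```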
arXiv Detail & Related papers (2024-10-31T06:20:17Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
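A bare-bones MoE adapter might look like the module below: small bottleneck experts with a softmax router, applied residually to frozen backbone features. The expert shape, router, and residual combination are generic choices; the paper's CLIP wiring and its Distribution Discriminative Auto-Selector are not modeled here.

```python
# Minimal MoE-adapter sketch: lightweight bottleneck experts with a softmax
# router over frozen backbone features (illustrative, not the paper's design).
import torch
import torch.nn as nn

class MoEAdapter(nn.Module):
    def __init__(self, dim, n_experts=4, bottleneck=16):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU(),
                          nn.Linear(bottleneck, dim))
            for _ in range(n_experts))

    def forward(self, h):                        # h: [B, D] frozen backbone features
        gate = torch.softmax(self.router(h), dim=-1)               # [B, E]
        expert_out = torch.stack([e(h) for e in self.experts], 1)  # [B, E, D]
        return h + (gate.unsqueeze(-1) * expert_out).sum(dim=1)    # residual mix
```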
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
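In task-arithmetic terms, the sketch below learns one coefficient per task and per parameter tensor, tuned by minimizing prediction entropy on unlabeled inputs (a simplified reading of the AdaMerging objective); `torch.func.functional_call` plugs the merged weights into the model without mutating it. The initialization, learning rate, and loader format are assumptions.

```python
# Layer-wise adaptive merging sketch: learnable coefficients over task
# vectors, trained by entropy minimization on unlabeled data (simplified).
import torch
from torch.func import functional_call

def merge(base, task_states, lam):
    """theta = theta_0 + sum_k lam[k, i] * (theta_k - theta_0), per tensor i."""
    return {key: base[key] + sum(lam[k, i] * (task_states[k][key] - base[key])
                                 for k in range(len(task_states)))
            for i, key in enumerate(base)}

def adamerge(model, base, task_states, unlabeled_loader, steps=200):
    lam = torch.full((len(task_states), len(base)), 0.3, requires_grad=True)
    opt = torch.optim.Adam([lam], lr=1e-3)
    for _, (x,) in zip(range(steps), unlabeled_loader):
        logits = functional_call(model, merge(base, task_states, lam), (x,))
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * torch.log(probs + 1e-8)).sum(-1).mean()
        opt.zero_grad(); entropy.backward(); opt.step()   # no labels needed
    return merge(base, task_states, lam.detach())
```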
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Improving Rare Word Recognition with LM-aware MWER Training [50.241159623691885]
We incorporate LMs into the training of hybrid autoregressive transducer (HAT) models within a discriminative training framework.
For the shallow fusion setup, we use LMs during both hypothesis generation and loss computation, and the LM-aware MWER-trained model achieves a 10% relative improvement.
For the rescoring setup, we learn a small neural module to generate per-token fusion weights in a data-dependent manner.
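For the rescoring direction, the per-token fusion idea could be sketched as a tiny module that maps simple features to a weight on the LM score for each token; the feature choice (the two log-probabilities themselves) and the architecture below are illustrative assumptions, not the paper's module.

```python
# Toy per-token fusion for rescoring: a small network predicts a
# data-dependent weight on the LM log-probability for each token.
import torch
import torch.nn as nn

class TokenFusion(nn.Module):
    def __init__(self, feat_dim=2, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))

    def forward(self, am_logp, lm_logp):         # both: [B, T] per-token log-probs
        feats = torch.stack([am_logp, lm_logp], dim=-1)   # simple assumed features
        w = torch.sigmoid(self.net(feats)).squeeze(-1)    # per-token fusion weight
        return am_logp + w * lm_logp             # fused per-token score
```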
arXiv Detail & Related papers (2022-04-15T17:19:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.