Merging without Forgetting: Continual Fusion of Task-Specific Models via Optimal Transport
- URL: http://arxiv.org/abs/2511.19561v1
- Date: Mon, 24 Nov 2025 15:27:47 GMT
- Title: Merging without Forgetting: Continual Fusion of Task-Specific Models via Optimal Transport
- Authors: Zecheng Pan, Zhikang Chen, Ding Li, Min Zhang, Sen Cui, Hongshuo Jin, Luqi Tao, Yi Yang, Deheng Ye, Yu Zhang, Tingting Zhu, Tianling Ren,
- Abstract summary: OTMF is a novel model merging framework rooted in optimal transport theory.<n>It supports a continual fusion paradigm that incrementally integrates each new task vector without revisiting previous ones.<n>We conduct comprehensive experiments on multiple vision and language benchmarks, and results show that OTMF achieves state-of-the-art performance in terms of both accuracy and efficiency.
- Score: 29.006391770977796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Merging models fine-tuned for different tasks into a single unified model has become an increasingly important direction for building versatile, efficient multi-task systems. Existing approaches predominantly rely on parameter interpolation in weight space, which we show introduces significant distribution shift in the feature space and undermines task-specific knowledge. In this paper, we propose OTMF (Optimal Transport-based Masked Fusion), a novel model merging framework rooted in optimal transport theory to address the distribution shift that arises from naive parameter interpolation. Instead of directly aggregating features or weights, OTMF aligns the semantic geometry of task-specific models by discovering common masks applied to task vectors through optimal transport plans. These masks selectively extract transferable and task-agnostic components while preserving the unique structural identities of each task. To ensure scalability in real-world settings, OTMF further supports a continual fusion paradigm that incrementally integrates each new task vector without revisiting previous ones, maintaining a bounded memory footprint and enabling efficient fusion across a growing number of tasks. We conduct comprehensive experiments on multiple vision and language benchmarks, and results show that OTMF achieves state-of-the-art performance in terms of both accuracy and efficiency. These findings highlight the practical and theoretical value of our approach to model merging.
Related papers
- Model Merging in the Essential Subspace [78.5390284258307]
Model merging aims to integrate multiple task-specific fine-tuned models into a single multi-task model without additional training.<n>Despite extensive research, task interference remains a major obstacle that often undermines the performance of merged models.<n>We propose ESM (Essential Subspace Merging), a robust framework for effective model merging.
arXiv Detail & Related papers (2026-02-23T00:33:38Z) - Toward Enhancing Representation Learning in Federated Multi-Task Settings [13.483094713610322]
Federated multi-task learning (FMTL) seeks to collaboratively train customized models for users with different tasks.<n>We propose Muscle loss, a contrastive learning objective that simultaneously aligns representations from all participating models.<n>We develop FedMuscle, a practical and communication-efficient FMTL algorithm that naturally handles both model and task.
arXiv Detail & Related papers (2026-02-02T04:39:36Z) - AR-MOT: Autoregressive Multi-object Tracking [56.09738000988466]
We propose a novel autoregressive paradigm that formulates MOT as a sequence generation task within a large language model (LLM) framework.<n>This design enables the model to output structured results through flexible sequence construction, without requiring any task-specific heads.<n>To enhance region-level visual perception, we introduce an Object Tokenizer based on a pretrained detector.
arXiv Detail & Related papers (2026-01-05T09:17:28Z) - Parameter Aware Mamba Model for Multi-task Dense Prediction [69.94454603308196]
We introduce a novel decoder-based framework, Aware Mamba Model (PAMM), specifically designed for dense prediction in multi-task learning setting.<n>It features dual state space parameter experts that integrate and set task-specific parameter priors, capturing the intrinsic properties of each task.<n>We employ the Multi-Directional Hilbert Scanning method to construct multi-angle feature sequences, thereby enhancing the sequence model's perceptual capabilities for 2D data.
arXiv Detail & Related papers (2025-11-18T13:48:00Z) - Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data [16.462869377794316]
Multi-task learning (MTL) concurrently trains a model on diverse task datasets to exploit common features.<n>Recent studies have dedicated efforts to merging multiple independent model parameters into a unified model for MTL.<n>We propose LwPTV (Layer-wise Pruning Task Vector) by building a saliency score, measuring the redundancy of parameters in task vectors.
arXiv Detail & Related papers (2025-06-10T11:34:23Z) - Towards Unified Modeling in Federated Multi-Task Learning via Subspace Decoupling [23.642760378344335]
Federated Multi-Task Learning (FMTL) enables multiple clients performing heterogeneous tasks without exchanging their local data.<n>Most existing FMTL methods focus on building personalized models for each client and unable to support the aggregation of multiple heterogeneous tasks into a unified model.<n>We propose FedDEA, an update-structure-aware aggregation method specifically designed for multi-task model integration.
arXiv Detail & Related papers (2025-05-30T03:53:21Z) - Large Language Model as Meta-Surrogate for Data-Driven Many-Task Optimization: A Proof-of-Principle Study [11.452011929848844]
This study proposes a novel meta-surrogate framework to assist many-task optimization.<n>We formulate a unified framework for many-task fitness prediction, by defining a universal model with metadata to fit a group of problems.<n>Our framework supports dual-level knowledge transfer -- at both the surrogate and individual levels -- enhancing optimization efficiency and robustness.
arXiv Detail & Related papers (2025-03-11T11:13:11Z) - Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond [52.486290612938895]
We propose a novel method that leverages the semantic knowledge from the Segment Anything Model (SAM) to Grow the quality of fusion results and Enable downstream task adaptability.<n> Specifically, we design a Semantic Persistent Attention (SPA) Module that efficiently maintains source information via the persistent repository while extracting high-level semantic priors from SAM.<n>Our method achieves a balance between high-quality visual results and downstream task adaptability while maintaining practical deployment efficiency.
arXiv Detail & Related papers (2025-03-03T06:16:31Z) - RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness [28.437105789298244]
RobustMerge is a training-free parameter-efficient merging method with complementary parameter adaptation to maintain direction robustness.<n>We establish a benchmark consisting of diverse multimodal tasks, on which we conduct experiments to certify the outstanding performance and generalizability of our method.
arXiv Detail & Related papers (2025-02-24T13:52:05Z) - Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.<n>Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches.<n>We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift ( TDS) and Task-Distribution Corruption (TDC)
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.