Related papers: ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation

URL: http://arxiv.org/abs/2603.02945v1
Date: Tue, 03 Mar 2026 12:53:04 GMT
Title: ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation
Authors: Bo Xu, Haotian Wu, Hehai Lin, Weiquan Huang, Beier Zhu, Yao Shu, Chengwei Qin,
Abstract summary: Model merging aims to combine multiple task-specific expert models into a single model. Interference among experts, especially when they are trained on different objectives, often leads to significant performance degradation. ACE-Merging is an Adaptive Covariance Estimation framework that effectively mitigates inter-task interference.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Model merging aims to combine multiple task-specific expert models into a single model while preserving generalization across diverse tasks. However, interference among experts, especially when they are trained on different objectives, often leads to significant performance degradation. Despite recent progress, resolving this interference without data access, retraining, or architectural modification remains a fundamental challenge. This paper provides a theoretical analysis demonstrating that the input covariance of each task, which is a key factor for optimal merging, can be implicitly estimated from the parameter differences of its fine-tuned model, even in a fully data-free setting. Building on this insight, we introduce ACE-Merging, an Adaptive Covariance Estimation framework that effectively mitigates inter-task interference. Our approach features a principled, closed-form solution that contrasts with prior iterative or heuristic methods. Extensive experiments on both vision and language benchmarks demonstrate that ACE-Merging sets a new state of the art among data-free methods. It consistently outperforms existing baselines; for example, ACE-Merging achieves an average absolute improvement of 4% over the previous methods across seven tasks on GPT-2. Owing to its efficient closed-form formulation, ACE-Merging delivers superior performance with a modest computational cost, providing a practical and theoretically grounded solution for model merging.
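
The general idea in the abstract, merging a layer's weights in closed form once each task's input covariance is available, can be illustrated with a small sketch. This is not ACE-Merging's actual estimator, which the summary does not specify. It assumes a linear layer y = x W with W of shape (d_in, d_out) and solves the covariance-weighted least-squares objective sum_i ||C_i^{1/2} (W - W_i)||_F^2, which is the RegMean-style objective. In place of the paper's adaptive estimate, it uses a hypothetical data-free proxy C_i ≈ ΔW_i ΔW_i^T + eps·I built from the task vector ΔW_i = W_i - W_0. The proxy is motivated by the fact that gradient updates to such a layer take the form X^T G, so the task vector's column space lies in the span of the task's inputs. The names `covariance_proxy`, `merge_closed_form`, and `eps` are illustrative assumptions.

```python
import numpy as np


def covariance_proxy(delta: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Hypothetical data-free stand-in for a task's input covariance.

    delta is the task vector W_i - W_0 with shape (d_in, d_out). The result,
    delta @ delta.T + eps * I, is a symmetric positive-definite (d_in, d_in) matrix.
    """
    d_in = delta.shape[0]
    return delta @ delta.T + eps * np.eye(d_in)


def merge_closed_form(base: np.ndarray, experts: list, eps: float = 1e-3) -> np.ndarray:
    """Merge expert weights for one linear layer in closed form.

    Minimizes sum_i trace((W - W_i)^T C_i (W - W_i)). Setting the gradient
    to zero gives (sum_i C_i) W = sum_i C_i W_i.
    """
    d_in = base.shape[0]
    lhs = np.zeros((d_in, d_in))   # accumulates sum_i C_i
    rhs = np.zeros_like(base)      # accumulates sum_i C_i @ W_i
    for w in experts:
        c = covariance_proxy(w - base, eps)
        lhs += c
        rhs += c @ w
    # Solve the linear system rather than forming an explicit inverse.
    return np.linalg.solve(lhs, rhs)


rng = np.random.default_rng(0)
base = rng.normal(size=(8, 4))  # pretrained weights W_0
experts = [base + 0.1 * rng.normal(size=(8, 4)) for _ in range(3)]
merged = merge_closed_form(base, experts)
print(merged.shape)  # (8, 4)
```

With a single expert, the solve returns that expert unchanged. As `eps` grows, every proxy approaches a multiple of the identity, so the result tends toward plain weight averaging.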

Related papers

Understanding Model Merging: A Unified Generalization Framework for Heterogeneous Experts [36.26786113564521]
Model merging efficiently aggregates capabilities from multiple fine-tuned models into a single one. Despite empirical successes, a unified theory for its effectiveness under heterogeneous fine-tuning hyperparameters remains missing. We use $L$-stability theory to analyze the generalization of the merged model $\boldsymbol{x}_{\mathrm{avg}}$.
arXiv Detail & Related papers (2026-01-29T13:22:06Z)
Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration [14.503741632243646]
Multi-task model merging aims to consolidate knowledge from multiple task-specific experts into a unified model. Existing methods approach this by minimizing differences between task-specific experts and the unified model. We propose Layer-wise Optimal Task Vector Merging, a technique that explicitly minimizes feature drift between task-specific experts and the unified model.
arXiv Detail & Related papers (2025-05-29T08:11:31Z)
RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness [28.437105789298244]
RobustMerge is a training-free parameter-efficient merging method with complementary parameter adaptation to maintain direction robustness. We establish a benchmark consisting of diverse multimodal tasks, on which we conduct experiments to certify the outstanding performance and generalizability of our method.
arXiv Detail & Related papers (2025-02-24T13:52:05Z)
Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [72.10987117380584]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. We find existing methods discard task-specific information that, while causing conflicts, is crucial for performance. Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
FedDUAL: A Dual-Strategy with Adaptive Loss and Dynamic Aggregation for Mitigating Data Heterogeneity in Federated Learning [12.307490659840845]
Federated Learning (FL) combines locally optimized models from various clients into a unified global model. FL encounters significant challenges such as performance degradation, slower convergence, and reduced robustness of the global model. We introduce an innovative dual-strategy approach designed to effectively resolve these issues.
arXiv Detail & Related papers (2024-12-05T18:42:29Z)
MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models. We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z)
Parameter Competition Balancing for Model Merging [13.66727853299506]
PCB-Merging is a training-free technique that adjusts the coefficients of each parameter for effective model merging. PCB-Merging achieves substantial performance enhancements across multiple modalities, domains, model sizes, number of tasks, fine-tuning forms, and large language models.
arXiv Detail & Related papers (2024-10-03T11:17:58Z)
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved. We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z)
It's All in the Mix: Wasserstein Classification and Regression with Mixed Features [2.2685251390114565]
We develop and analyze distributionally robust prediction models that faithfully account for the presence of discrete features. We demonstrate that our models can significantly outperform existing methods that are agnostic to the presence of discrete features.
arXiv Detail & Related papers (2023-12-19T15:15:52Z)
DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data. We propose a general framework to solve the above two challenges simultaneously. We provide a comprehensive theoretical analysis, including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets. Part of the challenge of learning robust models lies in the influence of unobserved confounders. We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples. We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries. We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
