From Coefficients to Directions: Rethinking Model Merging with Directional Alignment
- URL: http://arxiv.org/abs/2512.00391v1
- Date: Sat, 29 Nov 2025 08:40:58 GMT
- Title: From Coefficients to Directions: Rethinking Model Merging with Directional Alignment
- Authors: Zhikang Chen, Sen Cui, Deheng Ye, Min Zhang, Gang Niu, Yu Zhang, Masashi Sugiyama, Tingting Zhu
- Abstract summary: We introduce a unified geometric framework, \emph{Merging with Directional Alignment} (\method{}), which aligns directional structures consistently in both the parameter and feature spaces. Our analysis shows that directional alignment improves structural coherence, and extensive experiments across benchmarks, model scales, and task configurations further validate the effectiveness of our approach.
- Score: 66.99062575537555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model merging has emerged as a practical paradigm for integrating multiple independently trained models into a single model without joint retraining. Previous studies have demonstrated the effectiveness of combining parameters through strategies such as parameter decomposition, coefficient optimization, and subspace learning, significantly reducing the need for expensive joint training and achieving strong empirical performance across diverse tasks. However, these approaches predominantly treat merging as a problem of parameter space decomposition or fusion coefficient optimization, while overlooking the critical role of directional information in both parameter and feature spaces. In practice, naïve merging introduces inconsistencies in dominant parameter directions and disrupts structural coherence across models, which can degrade performance. Moreover, coefficient-based optimization methods implicitly assume compatible feature-space directions across models. However, Neural Collapse indicates that class features follow structured directional patterns, which may differ across independently trained models, making coefficient optimization alone insufficient. In this work, we emphasize the importance of \emph{directional alignment} and introduce a unified geometric framework, \emph{Merging with Directional Alignment} (\method{}), which aligns directional structures consistently in both the parameter and feature spaces. Our analysis shows that directional alignment improves structural coherence, and extensive experiments across benchmarks, model scales, and task configurations further validate the effectiveness of our approach.
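The abstract does not spell out the paper's alignment procedure, but the core idea it argues for (checking whether task vectors point in compatible directions before combining them) can be sketched with plain cosine similarity. This is an illustrative assumption, not the paper's method; the function names and the simple damping rule are hypothetical:

```python
import numpy as np

def directional_alignment(tv_a, tv_b):
    # Cosine similarity between two flattened task vectors
    # (fine-tuned parameters minus the shared base parameters).
    return tv_a @ tv_b / (np.linalg.norm(tv_a) * np.linalg.norm(tv_b))

def merge_with_alignment_gate(base, tv_a, tv_b, threshold=0.0):
    # Naive illustration: average the task vectors as usual when their
    # directions agree, and damp the update when they conflict.
    cos = directional_alignment(tv_a, tv_b)
    scale = 0.5 if cos >= threshold else 0.25  # damp conflicting directions
    return base + scale * (tv_a + tv_b)
```

The sketch only measures one global direction per model; the paper's framework reportedly aligns directional structure in both parameter and feature spaces, which this toy gate does not capture.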
Related papers
- Beyond Parameter Arithmetic: Sparse Complementary Fusion for Distribution-Aware Model Merging [20.429700094073684]
We propose Sparse Complementary Fusion with reverse KL (SCF-RKL), a novel model merging framework that explicitly controls functional interference through sparse, distribution-aware updates. We evaluate SCF-RKL across a wide range of model scales and architectures, covering both reasoning-focused and instruction-tuned models.
arXiv Detail & Related papers (2026-02-12T08:45:42Z)
- Implicit bias as a Gauge correction: Theory and Inverse Design [2.9379512315137117]
A central problem in machine learning theory is to characterize how learning dynamics select particular solutions compatible with the training objective. We identify a general mechanism, in terms of an explicit correction of the learning dynamics, for the emergence of implicit biases. We compute the resulting induced bias for a range of dynamics, showing how several well-known results fit into a single unified framework.
arXiv Detail & Related papers (2026-01-10T15:33:09Z)
- An Integrated Fusion Framework for Ensemble Learning Leveraging Gradient Boosting and Fuzzy Rule-Based Models [59.13182819190547]
Fuzzy rule-based models excel in interpretability and have seen widespread application across diverse fields. However, they face challenges such as complex design specifications and scalability issues with large datasets. This paper proposes an Integrated Fusion Framework that merges the strengths of both paradigms to enhance model performance and interpretability.
arXiv Detail & Related papers (2025-11-11T10:28:23Z)
- NAN: A Training-Free Solution to Coefficient Estimation in Model Merging [61.36020737229637]
We show that the optimal merging weights should scale with the amount of task-specific information encoded in each model. We propose NAN, a simple yet effective method that estimates model merging coefficients via the inverse of parameter norm. NAN is training-free, plug-and-play, and applicable to a wide range of merging strategies.
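The abstract names only the key quantity (the inverse of the parameter norm); the exact normalization is not given. A minimal sketch under that assumption, with hypothetical function names and task vectors represented as flat arrays:

```python
import numpy as np

def nan_coefficients(task_vectors):
    # Coefficient proportional to the inverse L2 norm of each task vector,
    # normalized so the coefficients sum to one (normalization assumed).
    inv_norms = np.array([1.0 / np.linalg.norm(tv) for tv in task_vectors])
    return inv_norms / inv_norms.sum()

def merge(base, task_vectors):
    # Weighted task-arithmetic merge using the estimated coefficients.
    coeffs = nan_coefficients(task_vectors)
    return base + sum(c * tv for c, tv in zip(coeffs, task_vectors))
```

Because the coefficients need no training signal, the whole procedure is a single pass over the parameters, which matches the abstract's "training-free, plug-and-play" claim.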
arXiv Detail & Related papers (2025-05-22T02:46:08Z)
- Dynamic Fisher-weighted Model Merging via Bayesian Optimization [37.02810891820468]
Existing merging approaches typically involve scaling the parameters model-wise or integrating parameter importance parameter-wise. We unify these strategies into a more general merging framework and introduce Dynamic Fisher-weighted Merging (DF-Merge). We show that DF-Merge outperforms strong baselines across models of different sizes and a variety of tasks.
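DF-Merge's dynamic coefficients found via Bayesian optimization are not reproduced here, but the diagonal-Fisher weighting it builds on has a compact form: each parameter is averaged with weights given by its estimated Fisher information. A minimal sketch with hypothetical names:

```python
import numpy as np

def fisher_weighted_merge(params, fishers, eps=1e-8):
    # params: list of parameter arrays, one per model
    # fishers: matching list of (diagonal) Fisher information estimates
    fishers = [f + eps for f in fishers]  # avoid division by zero
    total = sum(fishers)
    # Per-parameter weighted average: parameters a model is more
    # "certain" about (higher Fisher value) dominate the merge.
    return sum(f * p for f, p in zip(fishers, params)) / total
```

This captures the "parameter importance parameter-wise" side of the unification; the model-wise scaling side would multiply each model's contribution by a single learned coefficient instead.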
arXiv Detail & Related papers (2025-04-26T18:31:14Z)
- Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations [50.010924231754856]
Adapting pre-trained foundation models for diverse downstream tasks is a core practice in artificial intelligence, but full fine-tuning is costly. To overcome this, parameter-efficient fine-tuning (PEFT) methods like LoRA have emerged and are becoming a growing research focus. We propose a generalization that extends matrix-based PEFT methods to higher-dimensional parameter spaces without compromising their structural properties.
arXiv Detail & Related papers (2025-04-01T14:36:45Z)
- Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging is an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
- Parameter Competition Balancing for Model Merging [13.66727853299506]
PCB-Merging is a training-free technique that adjusts the coefficients of each parameter for effective model merging.
PCB-Merging achieves substantial performance enhancements across multiple modalities, domains, model sizes, number of tasks, fine-tuning forms, and large language models.
arXiv Detail & Related papers (2024-10-03T11:17:58Z)
- Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [53.03951222945921]
We analyze smoothed (perturbed) policies, adding controlled random perturbations to the direction used by the linear oracle. Our main contribution is a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error. We illustrate the scope of the results on applications such as vehicle scheduling, highlighting how smoothing enables both tractable training and controlled generalization.
arXiv Detail & Related papers (2024-07-24T12:00:30Z)
- Majority Kernels: An Approach to Leverage Big Model Dynamics for Efficient Small Model Training [32.154166415680066]
Methods like distillation, compression or quantization help leverage the highly performant large models to induce smaller performant ones.
This paper explores the hypothesis that a single training run can simultaneously train a larger model for performance and derive a smaller model for deployment.
arXiv Detail & Related papers (2024-02-07T17:07:41Z)
- Manifold Alignment-Based Multi-Fidelity Reduced-Order Modeling Applied to Structural Analysis [0.8808021343665321]
This work presents the application of a recently developed parametric, non-intrusive, and multi-fidelity reduced-order modeling method on high-dimensional displacement and stress fields.
Results show that outputs from structural simulations using incompatible grids, or related yet different topologies, are easily combined into a single predictive model.
The new multi-fidelity reduced-order model achieves a relatively higher predictive accuracy at a lower computational cost when compared to a single-fidelity model.
arXiv Detail & Related papers (2022-06-14T15:28:21Z)
- On the Parameter Combinations That Matter and on Those That do Not [0.0]
We present a data-driven approach to characterizing nonidentifiability of a model's parameters.
By employing Diffusion Maps and their extensions, we discover the minimal combinations of parameters required to characterize the dynamic output behavior.
arXiv Detail & Related papers (2021-10-13T13:46:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.