Functionality-Oriented LLM Merging on the Fisher--Rao Manifold
- URL: http://arxiv.org/abs/2603.04972v1
- Date: Thu, 05 Mar 2026 09:08:38 GMT
- Title: Functionality-Oriented LLM Merging on the Fisher--Rao Manifold
- Authors: Jiayu Wang, Zuojun Ye, Wenpeng Yin,
- Abstract summary: Weight-space merging aims to combine multiple fine-tuned LLMs into a single model without retraining.<n>We derive a practical fixed-point algorithm using a lightweight spherical proxy that preserves norms and generalizes directly to multi-expert merging.
- Score: 14.349284217707575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weight-space merging aims to combine multiple fine-tuned LLMs into a single model without retraining, yet most existing approaches remain fundamentally parameter-space heuristics. This creates three practical limitations. First, linear averaging, task vectors, and related rules operate on Euclidean coordinates, even though the desired goal is to merge functionality, i.e., predictive behaviors across tasks. Second, when the source checkpoints are farther apart or more heterogeneous, Euclidean blends often trigger representation collapse, manifested as activation variance shrinkage and effective-rank degradation, which sharply degrades accuracy. Third, many geometry-inspired methods are most natural for two-model interpolation and do not extend cleanly to merging N>2 experts with a principled objective. We address these issues by formulating model merging as computing a weighted Karcher mean on the Fisher--Rao manifold, which is locally equivalent to minimizing a KL-based function distance between predictive distributions. We derive a practical fixed-point algorithm using a lightweight spherical proxy that preserves norms and generalizes directly to multi-expert merging. Across various benchmarks and collapse diagnostics, our method remains stable as the number and heterogeneity of merged models increase, consistently outperforming prior baselines.
Related papers
- Model Merging via Multi-Teacher Knowledge Distillation [11.543771846135021]
We introduce a novel flatness-aware PAC-Bayes generalization bound specifically for the model merging setting.<n>We frame model merging as multi-teacher knowledge distillation on scarce, unlabeled data.<n>We formally demonstrate that minimizing the student-teacher Kullback-Leibler divergence directly tightens the upper bound on the merged model's excess risk.
arXiv Detail & Related papers (2025-12-24T17:10:44Z) - Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning [57.514786046966265]
We propose textbfPerturb-and-Merge (P&M), a novel continual learning framework that integrates model merging into the CL paradigm to mitigate forgetting.<n>Our proposed approach achieves state-of-the-art performance on several continual learning benchmark datasets.
arXiv Detail & Related papers (2025-05-28T14:14:19Z) - NAN: A Training-Free Solution to Coefficient Estimation in Model Merging [61.36020737229637]
We show that the optimal merging weights should scale with the amount of task-specific information encoded in each model.<n>We propose NAN, a simple yet effective method that estimates model merging coefficients via the inverse of parameter norm.<n>NAN is training-free, plug-and-play, and applicable to a wide range of merging strategies.
arXiv Detail & Related papers (2025-05-22T02:46:08Z) - DIMM: Decoupled Multi-hierarchy Kalman Filter for 3D Object Tracking [50.038098341549095]
State estimation is challenging for 3D object tracking with high maneuverability.<n>We propose a novel framework, DIMM, to effectively combine estimates from different motion models in each direction.<n>DIMM significantly improves the tracking accuracy of existing state estimation methods by 31.61%99.23%.
arXiv Detail & Related papers (2025-05-18T10:12:41Z) - Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.<n>Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches.<n>We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - OCMG-Net: Neural Oriented Normal Refinement for Unstructured Point Clouds [18.234146052486054]
We present a robust refinement method for estimating oriented normals from unstructured point clouds.
Our framework incorporates sign orientation and data augmentation in the feature space to refine the initial oriented normals.
To address the issue of noise-caused direction inconsistency existing in previous approaches, we introduce a new metric called the Chamfer Normal Distance.
arXiv Detail & Related papers (2024-09-02T09:30:02Z) - MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP)<n>MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved.<n>We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z) - Efficient semidefinite bounds for multi-label discrete graphical models [6.226454551201676]
One of the main queries on such models is to identify the SDPWCSP Function on Cost of a Posteri (MAP) Networks.
We consider a traditional dualized constraint approach and a dedicated dedicated SDP/Monteiro style method based on row-by-row updates.
arXiv Detail & Related papers (2021-11-24T13:38:34Z) - Robust Multi-view Registration of Point Sets with Laplacian Mixture
Model [25.865100974015412]
We propose a novel probabilistic generative method to align multiple point sets based on the heavy-tailed Laplacian distribution.
We demonstrate the advantages of our method by comparing it with representative state-of-the-art approaches on benchmark challenging data sets.
arXiv Detail & Related papers (2021-10-26T14:49:09Z) - Hybrid Trilinear and Bilinear Programming for Aligning Partially
Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms which can align partially overlapping point sets are invariant to the corresponding corresponding RPM algorithm.
We first show that the objective is a cubic bound function. We then utilize the convex envelopes of trilinear and bilinear monomial transformations to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z) - Multi-Objective Matrix Normalization for Fine-grained Visual Recognition [153.49014114484424]
Bilinear pooling achieves great success in fine-grained visual recognition (FGVC)
Recent methods have shown that the matrix power normalization can stabilize the second-order information in bilinear features.
We propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation.
arXiv Detail & Related papers (2020-03-30T08:40:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.