Orthogonal Model Merging
- URL: http://arxiv.org/abs/2602.05943v1
- Date: Thu, 05 Feb 2026 17:57:14 GMT
- Title: Orthogonal Model Merging
- Authors: Sihan Yang, Kexuan Shi, Weiyang Liu,
- Abstract summary: Merging finetuned Large Language Models (LLMs) has become increasingly important for integrating diverse capabilities into a single unified model.<n>We propose Orthogonal Model Merging (OrthoMerge) to preserve the geometric structure of the model's weights.
- Score: 21.902153344111223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Merging finetuned Large Language Models (LLMs) has become increasingly important for integrating diverse capabilities into a single unified model. However, prevailing model merging methods rely on linear arithmetic in Euclidean space, which often destroys the intrinsic geometric properties of pretrained weights, such as hyperspherical energy. To address this, we propose Orthogonal Model Merging (OrthoMerge), a method that performs merging operations on the Riemannian manifold formed by the orthogonal group to preserve the geometric structure of the model's weights. By mapping task-specific orthogonal matrices learned by Orthogonal Finetuning (OFT) to the Lie algebra, OrthoMerge enables a principled yet efficient integration that takes into account both the direction and intensity of adaptations. In addition to directly leveraging orthogonal matrices obtained by OFT, we further extend this approach to general models finetuned with non-OFT methods (i.e., low-rank finetuning, full finetuning) via an Orthogonal-Residual Decoupling strategy. This technique extracts the orthogonal components of expert models by solving the orthogonal Procrustes problem, which are then merged on the manifold of the orthogonal group, while the remaining linear residuals are processed through standard additive merging. Extensive empirical results demonstrate the effectiveness of OrthoMerge in mitigating catastrophic forgetting and maintaining model performance across diverse tasks.
Related papers
- Rethinking Diffusion Models with Symmetries through Canonicalization with Applications to Molecular Graph Generation [56.361076943802594]
CanonFlow achieves state-of-the-art performance on the challenging GEOM-DRUG dataset, and the advantage remains large in few-step generation.
arXiv Detail & Related papers (2026-02-16T18:58:55Z) - From Coefficients to Directions: Rethinking Model Merging with Directional Alignment [66.99062575537555]
We introduce a unified geometric framework, emphMerging with Directional Alignment (method), which aligns directional structures consistently in both the parameter and feature spaces.<n>Our analysis shows that directional alignment improves structural coherence, and extensive experiments across benchmarks, model scales, and task configurations further validate the effectiveness of our approach.
arXiv Detail & Related papers (2025-11-29T08:40:58Z) - Learning Geometry: A Framework for Building Adaptive Manifold Models through Metric Optimization [8.201374511929538]
This paper proposes a novel paradigm for machine learning that moves beyond traditional parameter optimization.<n>We optimize the metric tensor field on a manifold with a predefined topology, thereby dynamically shaping the geometric structure of the model space.<n>This work lays a solid foundation for constructing fully dynamic "meta-learners" capable of autonomously evolving their geometry and topology.
arXiv Detail & Related papers (2025-10-30T01:53:32Z) - Relaxed Total Generalized Variation Regularized Piecewise Smooth Mumford-Shah Model for Triangulated Surface Segmentation [0.7837881800517112]
We propose a novel piecewise smooth MS mesh segmentation model by utilizing the relaxed total generalized variation regularization (rTGV)<n>The new model assumes that the feature function of a mesh can be approximated by the sum of piecewise constant function and asmooth function.<n>The newly introduced method is effective in segmenting meshes with irregular structures and getting the better boundaries rather than the shortest boundaries.
arXiv Detail & Related papers (2025-07-25T14:00:32Z) - Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths.<n>Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope.<n>We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, transformations, and general invertible maps.<n>This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models.
arXiv Detail & Related papers (2025-06-28T01:46:36Z) - Efficient Diffusion Models for Symmetric Manifolds [25.99200001269046]
We introduce a framework for designing efficient diffusion models for $d$-dimensional symmetric-space.<n>Mandela symmetries ensure the diffusion satisfies an "average-case" Lipschitz condition.<n>Our model outperforms prior methods in training speed and improves sample quality on synthetic datasets.
arXiv Detail & Related papers (2025-05-27T18:12:29Z) - ADMM-MM Algorithm for General Tensor Decomposition [7.0326155922512275]
The proposed algorithm supports three basic loss functions ($ell$-loss, $ell$-loss and KL divergence) and various low-rank tensor decomposition models (CP, Tucker, TT, and TR decompositions)
We show that wide-range applications can be solved by the proposed algorithm, and can be easily extended to any established tensor decomposition models in a plug-and-play manner.
arXiv Detail & Related papers (2023-12-19T00:17:34Z) - Neural Lattice Reduction: A Self-Supervised Geometric Deep Learning Approach [12.679411410749521]
We show that it is possible to parametrize the algorithm space for lattice reduction problem with neural networks and find an algorithm without supervised data.<n>We design a deep neural model outputting factorized unimodular matrices and train it in a self-supervised manner by penalizing non-orthogonal lattice bases.<n>We show that this approach yields an algorithm with comparable complexity and performance to the Lenstra-Lenstra-Lov'asz algorithm on a set of benchmarks.
arXiv Detail & Related papers (2023-11-14T13:54:35Z) - Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling.
We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms are not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z) - Improving Metric Dimensionality Reduction with Distributed Topology [68.8204255655161]
DIPOLE is a dimensionality-reduction post-processing step that corrects an initial embedding by minimizing a loss functional with both a local, metric term and a global, topological term.
We observe that DIPOLE outperforms popular methods like UMAP, t-SNE, and Isomap on a number of popular datasets.
arXiv Detail & Related papers (2021-06-14T17:19:44Z) - Jointly Modeling and Clustering Tensors in High Dimensions [6.072664839782975]
We consider the problem of jointly benchmarking and clustering of tensors.
We propose an efficient high-maximization algorithm that converges geometrically to a neighborhood that is within statistical precision.
arXiv Detail & Related papers (2021-04-15T21:06:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.