Parameter-Efficient Fine-Tuning of LLMs with Mixture of Space Experts
- URL: http://arxiv.org/abs/2602.14490v1
- Date: Mon, 16 Feb 2026 06:07:32 GMT
- Title: Parameter-Efficient Fine-Tuning of LLMs with Mixture of Space Experts
- Authors: Buze Zhang, Jinkai Tao, Zilang Zeng, Neil He, Ali Maatouk, Menglin Yang, Rex Ying
- Abstract summary: We propose a unified framework that leverages multiple geometric spaces simultaneously to learn curvature-aware representations. We develop MoSLoRA, which extends Low-Rank Adaptation (LoRA) with heterogeneous geometric experts. Our experiments across diverse benchmarks demonstrate that MoSLoRA consistently outperforms strong baselines.
- Score: 20.82313207866023
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) have achieved remarkable progress, with Parameter-Efficient Fine-Tuning (PEFT) emerging as a key technique for downstream task adaptation. However, existing PEFT methods mainly operate in Euclidean space, fundamentally limiting their capacity to capture complex geometric structures inherent in language data. While alternative geometric spaces, like hyperbolic geometries for hierarchical data and spherical manifolds for circular patterns, offer theoretical advantages, forcing representations into a single manifold type ultimately limits expressiveness, even when curvature parameters are learnable. To address this, we propose Mixture of Space (MoS), a unified framework that leverages multiple geometric spaces simultaneously to learn richer, curvature-aware representations. Building on this scheme, we develop MoSLoRA, which extends Low-Rank Adaptation (LoRA) with heterogeneous geometric experts, enabling models to dynamically select or combine appropriate geometric spaces based on input context. Furthermore, to address the computational overhead of frequent manifold switching, we develop a lightweight routing mechanism. Moreover, we provide empirical insights into how curvature optimization impacts training stability and model performance. Our experiments across diverse benchmarks demonstrate that MoSLoRA consistently outperforms strong baselines, achieving up to 5.6% improvement on MATH500 and 15.9% on MAWPS.
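The mechanism in the abstract lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch rendering of a mixture-of-space LoRA update with three experts (a Euclidean identity, a Poincaré-ball exponential map, and a spherical projection) combined by a lightweight linear router; the class and the specific expert maps are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoSLoRALayer(nn.Module):
    """Illustrative mixture-of-space LoRA update (not the paper's code).

    The low-rank residual B(mix(A x)) routes the rank-r activation
    through three geometric experts and combines them with a
    lightweight linear router, as the abstract describes.
    """
    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.A = nn.Linear(d_in, rank, bias=False)    # LoRA down-projection
        self.B = nn.Linear(rank, d_out, bias=False)   # LoRA up-projection
        nn.init.zeros_(self.B.weight)                 # update starts as a no-op
        self.router = nn.Linear(d_in, 3, bias=False)  # lightweight router over 3 spaces
        self.log_c = nn.Parameter(torch.zeros(()))    # learnable curvature (log-scale)

    def _hyperbolic(self, z: torch.Tensor) -> torch.Tensor:
        # Exponential map at the origin of a Poincare ball with curvature c > 0.
        c = self.log_c.exp()
        norm = z.norm(dim=-1, keepdim=True).clamp_min(1e-6)
        return torch.tanh(c.sqrt() * norm) * z / (c.sqrt() * norm)

    def _spherical(self, z: torch.Tensor) -> torch.Tensor:
        # Project onto the unit sphere, a stand-in for a spherical expert.
        return F.normalize(z, dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.A(x)                                          # (..., rank)
        experts = torch.stack(
            [z, self._hyperbolic(z), self._spherical(z)], dim=-2
        )                                                      # (..., 3, rank)
        gates = F.softmax(self.router(x), dim=-1)              # (..., 3)
        mixed = (gates.unsqueeze(-1) * experts).sum(dim=-2)    # (..., rank)
        return self.B(mixed)  # residual added to the frozen layer's output
```

A frozen linear layer would then return `W @ x + layer(x)`, with only `A`, `B`, the router, and the curvature trained.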
Related papers
- Merging Beyond: Streaming LLM Updates via Activation-Guided Rotations [55.047454145941366]
Streaming Merging is an innovative model updating paradigm that conceptualizes merging as an iterative optimization process. ARM is a strategy designed to approximate gradient descent dynamics. ARM requires only early SFT checkpoints and, through iterative merging, surpasses the fully converged SFT model.
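As a rough, hypothetical picture of merging-as-optimization (the summary does not specify ARM's rotation machinery, so this is only an assumption about the general recipe), one can treat each checkpoint delta as a pseudo-gradient and take repeated interpolation steps:

```python
import torch

def iterative_merge(base, checkpoints, step=0.5, rounds=3):
    """Illustrative merging-as-optimization loop (not ARM itself).

    base, checkpoints: state dicts of tensors with identical keys.
    Each round moves the running merge toward each checkpoint,
    mimicking a gradient-descent step along checkpoint deltas.
    """
    merged = {k: v.clone() for k, v in base.items()}
    for _ in range(rounds):
        for ckpt in checkpoints:
            for k in merged:
                # pseudo-gradient: direction from the current merge to the checkpoint
                merged[k] += step * (ckpt[k] - merged[k])
    return merged
```

Under this reading, `rounds` plays the role of optimization steps, which is one way the merge could keep improving even though each checkpoint was saved early in SFT.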
arXiv Detail & Related papers (2026-02-03T08:15:57Z)
- Learning Geometry: A Framework for Building Adaptive Manifold Models through Metric Optimization [8.201374511929538]
This paper proposes a novel paradigm for machine learning that moves beyond traditional parameter optimization. We optimize the metric tensor field on a manifold with a predefined topology, thereby dynamically shaping the geometric structure of the model space. This work lays a solid foundation for constructing fully dynamic "meta-learners" capable of autonomously evolving their geometry and topology.
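The idea of optimizing the metric rather than ordinary parameters can be made concrete: parameterize a symmetric positive-definite metric tensor through a learnable Cholesky factor and train it through losses on metric distances. The sketch below is a generic illustration of that recipe, not the paper's framework; all names are hypothetical.

```python
import torch
import torch.nn as nn

class LearnableMetric(nn.Module):
    """A learnable SPD metric g = L L^T + eps*I via its Cholesky factor."""
    def __init__(self, dim: int, eps: float = 1e-4):
        super().__init__()
        self.L = nn.Parameter(torch.eye(dim))  # lower-triangular factor (init: identity)
        self.eps = eps

    def metric(self) -> torch.Tensor:
        L = torch.tril(self.L)                 # keep the factor lower-triangular
        return L @ L.T + self.eps * torch.eye(L.shape[0], device=L.device)  # SPD by construction

    def squared_distance(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        d = x - y
        return torch.einsum("...i,ij,...j->...", d, self.metric(), d)
```

Gradient descent on `self.L` then reshapes the geometry of the representation space while the topology (here, flat R^d) stays fixed, echoing the paper's premise.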
arXiv Detail & Related papers (2025-10-30T01:53:32Z)
- CAT: Curvature-Adaptive Transformers for Geometry-Aware Learning [0.0]
The Curvature-Adaptive Transformer (CAT) learns per-token routing across three geometric attention branches through a lightweight, differentiable gating mechanism. On knowledge graph completion benchmarks, CAT achieves approximately 10% improvements in MRR and Hits@10 over fixed-geometry baselines with minimal overhead.
arXiv Detail & Related papers (2025-10-02T03:26:33Z)
- Geometric Operator Learning with Optimal Transport [77.16909146519227]
We propose integrating optimal transport (OT) into operator learning for partial differential equations (PDEs) on complex geometries. For 3D simulations focused on surfaces, our OT-based neural operator embeds the surface geometry into a 2D parameterized latent space. Experiments with the Reynolds-averaged Navier-Stokes (RANS) equations on the ShapeNet-Car and DrivAerNet-Car datasets show that our method achieves better accuracy while also reducing computational expense.
arXiv Detail & Related papers (2025-07-26T21:28:25Z)
- GeometryZero: Improving Geometry Solving for LLM with Group Contrastive Policy Optimization [63.107398132743825]
Group Contrastive Policy Optimization (GCPO) is a novel reinforcement learning framework featuring two key innovations. We develop GeometryZero, a family of affordable-size geometric reasoning models that judiciously determine when to employ auxiliary construction.
arXiv Detail & Related papers (2025-06-08T14:18:15Z)
- HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts [29.365614317331932]
We introduce HELM, a family of HypErbolic Large Language Models. For HELM-MICE, we develop hyperbolic Multi-Head Latent Attention. For both models, we develop essential hyperbolic equivalents of rotary positional encodings and RMS normalization.
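One way to picture a "hyperbolic equivalent" of RMS normalization (the summary names it but gives no formula) is to normalize in the tangent space at the origin: log-map points off the Poincaré ball, apply standard RMSNorm, and exp-map back. The sketch below follows that common tangent-space recipe and is an assumption, not HELM's actual definition.

```python
import torch

def poincare_log0(x, c=1.0, eps=1e-6):
    # Log map at the origin of a Poincare ball with curvature c.
    sqrt_c = c ** 0.5
    norm = x.norm(dim=-1, keepdim=True).clamp(eps, (1 - eps) / sqrt_c)
    return torch.atanh(sqrt_c * norm) * x / (sqrt_c * norm)

def poincare_exp0(v, c=1.0, eps=1e-6):
    # Exp map at the origin: inverse of the log map above.
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def hyperbolic_rmsnorm(x, weight, c=1.0, eps=1e-6):
    # RMS-normalize in the tangent space, then map back to the ball.
    # weight: learnable (dim,) gain, as in standard RMSNorm.
    v = poincare_log0(x, c)
    rms = v.pow(2).mean(dim=-1, keepdim=True).clamp_min(eps).sqrt()
    return poincare_exp0(weight * v / rms, c)
```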
arXiv Detail & Related papers (2025-05-30T15:42:42Z)
- Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations [50.010924231754856]
Adapting pre-trained foundation models for diverse downstream tasks is a core practice in artificial intelligence. To keep this adaptation affordable, parameter-efficient fine-tuning (PEFT) methods like LoRA have emerged and become a growing research focus. We propose a generalization that extends matrix-based PEFT methods to higher-dimensional parameter spaces without compromising their structural properties.
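The jump from matrix- to tensor-shaped parameters can be illustrated with a CP-style low-rank update for a higher-dimensional weight (e.g., a 4-D conv kernel): instead of LoRA's B·A, the update is a sum of rank-one outer products over every mode. This is a generic tensor-PEFT sketch under that assumption, not the paper's Lie-group construction.

```python
import torch
import torch.nn as nn

class CPDeltaConv(nn.Module):
    """CP-factorized additive update for a 4-D conv weight (illustrative)."""
    def __init__(self, c_out, c_in, kh, kw, rank=4):
        super().__init__()
        # One factor matrix per tensor mode; zero-init one factor so the
        # update starts as a no-op, mirroring LoRA's initialization.
        self.f_out = nn.Parameter(torch.zeros(rank, c_out))
        self.f_in = nn.Parameter(torch.randn(rank, c_in) * 0.02)
        self.f_kh = nn.Parameter(torch.randn(rank, kh) * 0.02)
        self.f_kw = nn.Parameter(torch.randn(rank, kw) * 0.02)

    def delta(self) -> torch.Tensor:
        # Sum of rank-1 terms:
        # delta[o,i,h,w] = sum_r f_out[r,o] * f_in[r,i] * f_kh[r,h] * f_kw[r,w]
        return torch.einsum("ro,ri,rh,rw->oihw",
                            self.f_out, self.f_in, self.f_kh, self.f_kw)
```

At forward time, one would convolve with `frozen_weight + module.delta()`, training only the four small factor matrices.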
arXiv Detail & Related papers (2025-04-01T14:36:45Z)
- Riemannian Geometric-based Meta Learning [8.365106891566725]
"Learning to learn" aims to enable models to quickly adapt to new tasks with minimal data.<n>Traditional methods like Model-Agnostic Meta-Learning (MAML) often struggle to capture complex learning dynamics.<n>We propose Stiefel-MAML, which integrates Riemannian geometry by optimizing within the Stiefel manifold.
arXiv Detail & Related papers (2025-03-14T01:34:55Z)
- CAMEx: Curvature-aware Merging of Experts [1.5479848902142663]
Existing methods for merging experts during model training and fine-tuning rely on Euclidean geometry. Curvature-aware merging methods require additional information and computational resources to approximate the Fisher Information Matrix. We introduce CAMEx, a novel expert merging protocol that incorporates natural gradients to account for the non-Euclidean curvature of the parameter manifold.
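The natural-gradient idea in the summary can be pictured with a diagonal Fisher approximation: precondition each averaged expert delta by inverse (approximate) curvature before adding it to the base model. The snippet below is that generic recipe, hedged; CAMEx's actual protocol may differ.

```python
import torch

def curvature_aware_merge(base, experts, fisher_diag, damping=1e-3):
    """Merge expert state dicts using a diagonal Fisher preconditioner.

    base / experts[i]: dicts of tensors with identical keys;
    fisher_diag[k]: per-parameter diagonal Fisher estimate
    (e.g., mean squared gradients).
    """
    merged = {}
    for k, w0 in base.items():
        precond = 1.0 / (fisher_diag[k] + damping)      # approximate inverse curvature
        delta = sum(e[k] - w0 for e in experts) / len(experts)
        merged[k] = w0 + precond * delta                # natural-gradient-style step
    return merged
```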
arXiv Detail & Related papers (2025-02-26T04:52:31Z)
- RMLR: Extending Multinomial Logistic Regression into General Geometries [64.16104856124029]
Our framework requires only minimal geometric properties, and thus exhibits broad applicability.
We develop five families of SPD MLRs under five types of power-deformed metrics; a minimal log-Euclidean instance is sketched below.
For rotation matrices, we propose Lie MLR based on the popular bi-invariant metric.
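For the SPD families, one standard construction (shown here purely as an illustration; the paper's power-deformed metrics generalize it) scores an SPD input by log-Euclidean inner products against per-class parameters:

```python
import torch

def spd_logm(S):
    # Matrix logarithm of a symmetric positive-definite matrix via eigh.
    evals, evecs = torch.linalg.eigh(S)
    return evecs @ torch.diag_embed(evals.clamp_min(1e-8).log()) @ evecs.transpose(-1, -2)

def log_euclidean_mlr_logits(S, P, A):
    """Logits for SPD input S under a log-Euclidean MLR (illustrative).

    S: (..., n, n) SPD inputs; P: (K, n, n) SPD class anchors;
    A: (K, n, n) symmetric class directions.
    Logit k is the inner product <log(S) - log(P_k), A_k>.
    """
    diff = spd_logm(S).unsqueeze(-3) - torch.stack([spd_logm(p) for p in P])
    return torch.einsum("...kij,kij->...k", diff, A)
```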
arXiv Detail & Related papers (2024-09-28T18:38:21Z)
- Automatic Parameterization for Aerodynamic Shape Optimization via Deep Geometric Learning
We propose two deep learning models that fully automate shape parameterization for aerodynamic shape optimization.
Both models parameterize shapes via deep geometric learning, embedding human prior knowledge into the learned geometric patterns.
We perform shape optimization experiments on 2D airfoils and discuss the applicable scenarios for the two models.
arXiv Detail & Related papers (2023-05-03T13:45:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.