Related papers: Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models

URL: http://arxiv.org/abs/2602.14039v1
Date: Sun, 15 Feb 2026 08:00:56 GMT
Title: Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models
Authors: Sajjad Kachuee, Mohammad Sharifkhani
Abstract summary: Mixture-of-Experts (MoE) embedding models combine expert outputs using weighted linear summation, implicitly assuming a linear subspace structure in the embedding space. Geometric analysis of a modern MoE embedding model reveals that expert outputs lie on a shared hyperspherical manifold characterized by tightly concentrated norms and substantial angular separation. Spherical Barycentric Aggregation (SBA) is introduced as a geometry-preserving aggregation operator that separates radial and angular components to maintain hyperspherical structure while remaining fully compatible with existing routing mechanisms.
Score: 4.125187280299246
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mixture-of-Experts (MoE) embedding models combine expert outputs using weighted linear summation, implicitly assuming a linear subspace structure in the embedding space. This assumption is shown to be inconsistent with the geometry of expert representations. Geometric analysis of a modern MoE embedding model reveals that expert outputs lie on a shared hyperspherical manifold characterized by tightly concentrated norms and substantial angular separation. Under this geometry, linear aggregation induces inward collapse toward the manifold interior, distorting vector magnitude and direction and reducing embedding comparability. To address this inconsistency, Spherical Barycentric Aggregation (SBA) is introduced as a geometry-preserving aggregation operator that separates radial and angular components to maintain hyperspherical structure while remaining fully compatible with existing routing mechanisms. Experiments on selected tasks from the Massive Text Embedding Benchmark (MTEB), including semantic similarity, clustering, and duplicate question detection, demonstrate consistent performance improvements with identical training cost and full stability. Additional geometric analyses confirm that SBA prevents aggregation-induced collapse and preserves hyperspherical consistency, highlighting the importance of geometry-aware aggregation in MoE embedding architectures.
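The inward collapse described in the abstract can be illustrated with a small numerical sketch. The abstract does not give the exact SBA formula, so `spherical_aggregate` below is a hypothetical radial/angular split chosen for illustration (weighted mean of norms for the radius, renormalized weighted sum of unit vectors for the direction); it should not be read as the authors' operator. All function names are assumptions.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def linear_aggregate(experts, weights):
    """Standard MoE aggregation: weighted linear sum of expert outputs."""
    dim = len(experts[0])
    return [sum(w * e[i] for w, e in zip(weights, experts)) for i in range(dim)]

def spherical_aggregate(experts, weights, eps=1e-12):
    """Hypothetical radial/angular split (an assumption, not the paper's exact SBA).

    Radial part: weighted mean of the expert norms.
    Angular part: weighted sum of unit directions, renormalized to unit length.
    """
    norms = [norm(e) for e in experts]
    radius = sum(w * n for w, n in zip(weights, norms))
    units = [[x / max(n, eps) for x in e] for e, n in zip(experts, norms)]
    direction = linear_aggregate(units, weights)
    d_norm = max(norm(direction), eps)
    return [radius * x / d_norm for x in direction]

# Two unit-norm expert outputs separated by 90 degrees, equal routing weights.
experts = [[1.0, 0.0], [0.0, 1.0]]
weights = [0.5, 0.5]

linear_out = linear_aggregate(experts, weights)
spherical_out = spherical_aggregate(experts, weights)

print(norm(linear_out))     # about 0.707: pulled inside the unit sphere
print(norm(spherical_out))  # about 1.0: stays on the sphere
```

With larger angular separation between experts, the norm of the linear sum shrinks further, while the split version keeps the radius set by the expert norms. The routing weights are used unchanged in both functions, which is the sense in which such an operator can stay compatible with an existing router.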

Related papers

Rectifying Geometry-Induced Similarity Distortions for Real-World Aerial-Ground Person Re-Identification [4.039576422478934]
Aerial-ground person re-identification (AG-ReID) is fundamentally challenged by extreme viewpoint and distance discrepancies. Existing methods rely on geometry-aware feature learning or appearance-conditioned prompting. We introduce Geometry-Induced Query-Key Transformation (GIQT), a lightweight low-rank module that rectifies the similarity space by conditioning query-key interactions on camera geometry.
arXiv Detail & Related papers (2026-01-29T08:41:42Z)
The Geometry of Machine Learning Models [0.0]
This paper presents a framework for analyzing machine learning models through the geometry of their induced partitions. For neural networks, we introduce a differential forms approach that tracks geometric structure through layers via pullback operations. While focused on mathematical foundations, this geometric perspective offers new approaches to model interpretation, regularization, and diagnostic tools for understanding learning dynamics.
arXiv Detail & Related papers (2025-08-04T05:45:52Z)
Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation [79.27003481818413]
We introduce FlatVI, a training framework that regularises the latent manifold of discrete-likelihood variational autoencoders towards Euclidean geometry. By encouraging straight lines in the latent space to approximate geodesics on the decoded single-cell manifold, FlatVI enhances compatibility with downstream approaches.
arXiv Detail & Related papers (2025-07-15T23:08:14Z)
Harmonizing Geometry and Uncertainty: Diffusion with Hyperspheres [43.20744744438439]
We introduce HyperSphereDiff to align hyperspherical structures with directional noise, preserving class geometry and effectively capturing angular uncertainty. We evaluate our framework on four object datasets and two face datasets, showing that incorporating angular uncertainty better preserves the underlying hyperspherical manifold.
arXiv Detail & Related papers (2025-06-12T11:10:52Z)
Geometry-Editable and Appearance-Preserving Object Compositon [67.98806888489385]
General object composition (GOC) aims to seamlessly integrate a target object into a background scene with desired geometric properties. Recent approaches derive semantic embeddings and integrate them into advanced diffusion models to enable geometry-editable generation. We introduce a Disentangled Geometry-editable and Appearance-preserving Diffusion model that first leverages semantic embeddings to implicitly capture desired geometric transformations.
arXiv Detail & Related papers (2025-05-27T09:05:28Z)
Cramer-Rao Bounds for Laplacian Matrix Estimation [56.1214184671173]
We derive closed-form matrix expressions for the Cramer-Rao Bound (CRB) specifically tailored to Laplacian matrix estimation. We demonstrate the use of CRBs in three representative applications: (i) topology identification in power systems, (ii) graph filter identification in diffused models, and (iii) precision matrix estimation in Gaussian Markov random fields under Laplacian constraints.
arXiv Detail & Related papers (2025-04-06T18:28:31Z)
Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling. We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z)
Shape And Structure Preserving Differential Privacy [70.08490462870144]
We show how the gradient of the squared distance function offers better control over sensitivity than the Laplace mechanism.
arXiv Detail & Related papers (2022-09-21T18:14:38Z)
Manifold Alignment-Based Multi-Fidelity Reduced-Order Modeling Applied to Structural Analysis [0.8808021343665321]
This work presents the application of a recently developed parametric, non-intrusive, and multi-fidelity reduced-order modeling method on high-dimensional displacement and stress fields. Results show that outputs from structural simulations using incompatible grids, or related yet different topologies, are easily combined into a single predictive model. The new multi-fidelity reduced-order model achieves a relatively higher predictive accuracy at a lower computational cost when compared to a single-fidelity model.
arXiv Detail & Related papers (2022-06-14T15:28:21Z)
A Unifying and Canonical Description of Measure-Preserving Diffusions [60.59592461429012]
A complete recipe of measure-preserving diffusions in Euclidean space was recently derived unifying several MCMC algorithms into a single framework. We develop a geometric theory that improves and generalises this construction to any manifold.
arXiv Detail & Related papers (2021-05-06T17:36:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.