Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry
- URL: http://arxiv.org/abs/2602.03204v1
- Date: Tue, 03 Feb 2026 07:17:38 GMT
- Title: Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry
- Authors: Ye Su, Huayi Tang, Zixuan Gong, Yong Liu,
- Abstract summary: We present the first analysis of MoE through the lens of tropical geometry.<n>Our framework unifies the discrete geometry of the Hyperx with the continuous geometry of neural functions.
- Score: 21.251058776601553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Mixture-of-Experts (MoE) architectures define the state-of-the-art, their theoretical success is often attributed to heuristic efficiency rather than geometric expressivity. In this work, we present the first analysis of MoE through the lens of tropical geometry, establishing that the Top-$k$ routing mechanism is algebraically isomorphic to the $k$-th elementary symmetric tropical polynomial. This isomorphism partitions the input space into the Normal Fan of a Hypersimplex, revealing that \textbf{sparsity is combinatorial depth} which scales geometric capacity by the binomial coefficient $\binom{N}{k}$. Moving beyond ambient bounds, we introduce the concept of \textit{Effective Capacity} under the Manifold Hypothesis. We prove that while dense networks suffer from capacity collapse on low-dimensional data, MoE architectures exhibit \textit{Combinatorial Resilience}, maintaining high expressivity via the transversality of routing cones. In this study, our framework unifies the discrete geometry of the Hypersimplex with the continuous geometry of neural functions, offering a rigorous theoretical justification for the topological supremacy of conditional computation.
Related papers
- Riemannian Langevin Dynamics: Strong Convergence of Geometric Euler-Maruyama Scheme [51.56484100374058]
Low-dimensional structure in real-world data plays an important role in the success of generative models.<n>We prove convergence theory of numerical schemes for manifold-valued differential equations.
arXiv Detail & Related papers (2026-03-04T01:29:35Z) - SKGE: Spherical Knowledge Graph Embedding with Geometric Regularization [0.0]
We propose Spherical Knowledge Graph Embedding (SKGE), a model that constrains entity representations to a compact manifold: a hypersphere.<n>We demonstrate that SKGE consistently and significantly outperforms its strong Euclidean counterpart, TransE.
arXiv Detail & Related papers (2025-11-04T10:40:46Z) - The Neural Differential Manifold: An Architecture with Explicit Geometric Structure [8.201374511929538]
This paper introduces the Neural Differential Manifold (NDM), a novel neural network architecture that explicitly incorporates geometric structure into its fundamental design.<n>We analyze the theoretical advantages of this approach, including its potential for more efficient optimization, enhanced continual learning, and applications in scientific discovery and controllable generative modeling.
arXiv Detail & Related papers (2025-10-29T02:24:27Z) - From Universal Approximation Theorem to Tropical Geometry of Multi-Layer Perceptrons [0.0]
We revisit the Universal Approximation Theorem through the lens of the tropical geometry of neural networks.<n>We introduce constructive, geometry-aware architecture for sigmoidal multi-layer perceptrons.
arXiv Detail & Related papers (2025-10-16T13:15:39Z) - Contextuality, Holonomy and Discrete Fiber Bundles in Group-Valued Boltzmann Machines [0.0]
We propose a geometric extension of restricted Boltzmann machines (RBMs) by allowing weights to take values in abstract groups.<n>This generalization enables the modeling of complex relational structures, including projective transformations, spinor dynamics, and functional symmetries.<n>A central contribution of this work is the introduction of a emphcontextuality index based on group-valued holonomies computed along cycles in the RBM graph.<n>This index quantifies the global inconsistency or "curvature" induced by local weights, generalizing classical notions of coherence, consistency, and geometric flatness
arXiv Detail & Related papers (2025-09-05T15:07:54Z) - Learning Latent Graph Geometry via Fixed-Point Schrödinger-Type Activation: A Theoretical Study [1.1745324895296467]
We develop a unified theoretical framework for neural architectures with internal representations evolving as stationary states of dissipative Schr"odinger-type dynamics on learned latent graphs.<n>We prove existence, uniqueness, and smooth dependence of equilibria, and show that the dynamics are equivalent under the Bloch map to norm-preserving Landau--Lifshitz flows.<n>The resulting model class provides a compact, geometrically interpretable, and analytically tractable foundation for learning latent graph geometry via fixed-point Schr"odinger-type activations.
arXiv Detail & Related papers (2025-07-27T00:35:15Z) - Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning [53.13514542825493]
We introduce a two-stage Theorem-d Reverse Chain-of-Thought Reasoning Synthesis (TRCoT) framework.<n>The first stage, TR-Engine, synthesizes theorem-grounded geometric diagrams with structured descriptions and properties.<n>The second stage, TR-Reasoner, employs reverse reasoning to iteratively refine question-answer pairs by cross-validating geometric properties and description fragments.
arXiv Detail & Related papers (2024-10-23T13:58:39Z) - Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling.
We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z) - Geometry Interaction Knowledge Graph Embeddings [153.69745042757066]
We propose Geometry Interaction knowledge graph Embeddings (GIE), which learns spatial structures interactively between the Euclidean, hyperbolic and hyperspherical spaces.
Our proposed GIE can capture a richer set of relational information, model key inference patterns, and enable expressive semantic matching across entities.
arXiv Detail & Related papers (2022-06-24T08:33:43Z) - A singular Riemannian geometry approach to Deep Neural Networks I.
Theoretical foundations [77.86290991564829]
Deep Neural Networks are widely used for solving complex problems in several scientific areas, such as speech recognition, machine translation, image analysis.
We study a particular sequence of maps between manifold, with the last manifold of the sequence equipped with a Riemannian metric.
We investigate the theoretical properties of the maps of such sequence, eventually we focus on the case of maps between implementing neural networks of practical interest.
arXiv Detail & Related papers (2021-12-17T11:43:30Z) - A Unifying and Canonical Description of Measure-Preserving Diffusions [60.59592461429012]
A complete recipe of measure-preserving diffusions in Euclidean space was recently derived unifying several MCMC algorithms into a single framework.
We develop a geometric theory that improves and generalises this construction to any manifold.
arXiv Detail & Related papers (2021-05-06T17:36:55Z) - Hyperbolic Neural Networks++ [66.16106727715061]
We generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely, the Poincar'e ball model.
Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, and stability and outperformance over their Euclidean counterparts.
arXiv Detail & Related papers (2020-06-15T08:23:20Z) - On Infinite-Width Hypernetworks [101.03630454105621]
We show that hypernetworks do not guarantee to a global minima under descent.
We identify the functional priors of these architectures by deriving their corresponding GP and NTK kernels.
As part of this study, we make a mathematical contribution by deriving tight bounds on high order Taylor terms of standard fully connected ReLU networks.
arXiv Detail & Related papers (2020-03-27T00:50:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.