Riemannian Optimization in Modular Systems
- URL: http://arxiv.org/abs/2603.03610v1
- Date: Wed, 04 Mar 2026 00:47:29 GMT
- Title: Riemannian Optimization in Modular Systems
- Authors: Christian Pehle, Jean-Jacques Slotine
- Abstract summary: The backpropagation algorithm is instrumental in the success of neural networks. Despite its empirical success, a strong theoretical understanding of it is lacking. We develop a framework of composable ``Riemannian modules'' whose convergence properties can be quantified.
- Score: 0.006486143522483092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding how systems built out of modular components can be jointly optimized is an important problem in biology, engineering, and machine learning. The backpropagation algorithm is one such solution and has been instrumental in the success of neural networks. Despite its empirical success, a strong theoretical understanding of it is lacking. Here, we combine tools from Riemannian geometry, optimal control theory, and theoretical physics to advance this understanding. We make three key contributions: First, we revisit the derivation of backpropagation as a constrained optimization problem and combine it with the insight that Riemannian gradient descent trajectories can be understood as the minimum of an action. Second, we introduce a recursively defined layerwise Riemannian metric that exploits the modular structure of neural networks and can be efficiently computed using the Woodbury matrix identity, avoiding the $O(n^3)$ cost of full metric inversion. Third, we develop a framework of composable ``Riemannian modules'' whose convergence properties can be quantified using nonlinear contraction theory, providing algorithmic stability guarantees of order $O(\kappa^2 L/(\xi\mu\sqrt{n}))$ where $\kappa$ and $L$ are Lipschitz constants, $\mu$ is the mass matrix scale, and $\xi$ bounds the condition number. Our layerwise metric approach provides a practical alternative to natural gradient descent. While we focus here on studying neural networks, our approach more generally applies to the study of systems made of modules that are optimized over time, as it occurs in biology during both evolution and development.
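The abstract does not spell out the exact form of the layerwise metric, but the Woodbury identity pays off whenever the metric is a scaled identity plus a low-rank term. Below is a minimal sketch assuming a hypothetical metric $M = \mu I + J^\top J$ with $J \in \mathbb{R}^{k \times n}$, $k \ll n$; the names `J` and `mu` and this structure are assumptions, not the paper's construction.

```python
import numpy as np

def woodbury_metric_solve(J, mu, g):
    """Solve (mu*I + J.T @ J) x = g without forming the n x n metric.

    Woodbury: (mu*I + J.T J)^{-1} = (1/mu) * (I - J.T (mu*I_k + J J.T)^{-1} J),
    so only a k x k system is factored: O(n k^2 + k^3) instead of O(n^3).
    """
    k = J.shape[0]
    small = mu * np.eye(k) + J @ J.T               # k x k capacitance matrix
    return (g - J.T @ np.linalg.solve(small, J @ g)) / mu

# Usage: a Riemannian/natural-gradient-style step x <- x - lr * M^{-1} grad.
rng = np.random.default_rng(0)
n, k = 2000, 16                                    # k << n: low-rank curvature
J = rng.normal(size=(k, n)) / np.sqrt(n)
grad = rng.normal(size=n)
step = woodbury_metric_solve(J, mu=0.1, g=grad)

# Sanity check against the explicit O(n^3) inversion this avoids.
M = 0.1 * np.eye(n) + J.T @ J
assert np.allclose(step, np.linalg.solve(M, grad))
```

Only the $k \times k$ system is ever factored, which is where the claimed savings over full $O(n^3)$ metric inversion would come from.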
Related papers
- Constructive Lyapunov Functions via Topology-Preserving Neural Networks [0.0]
ONN achieves order-optimal performance on convergence rate, edge efficiency, and computational complexity. Empirical validation on 3M-node semantic networks demonstrates a 99.75% improvement over baseline methods. ORTSF integration into transformers achieves a 14.7% perplexity reduction and 2.3× faster convergence.
arXiv Detail & Related papers (2025-10-10T17:46:52Z)
- StelLA: Subspace Learning in Low-rank Adaptation using Stiefel Manifold [51.93627542334909]
Low-rank adaptation (LoRA) has been widely adopted as a parameter-efficient technique for fine-tuning large-scale pre-trained models. We propose a geometry-aware extension of LoRA that uses a three-factor decomposition $U\!SV^\top$.
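As a rough illustration of such a three-factor parameterization (not StelLA's actual algorithm; the QR retraction and all names below are assumptions), one can keep the two outer factors on the Stiefel manifold and learn a small core:

```python
import numpy as np

def qr_retract(X):
    """Map a tall matrix back onto the Stiefel manifold (orthonormal columns)
    via QR, with a sign fix to make the factorization unique."""
    Q, R = np.linalg.qr(X)
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(0)
m, n, r = 64, 32, 4                            # hypothetical shapes
U = qr_retract(rng.normal(size=(m, r)))        # left factor, on Stiefel(m, r)
V = qr_retract(rng.normal(size=(n, r)))        # right factor, on Stiefel(n, r)
S = np.zeros((r, r))                           # small core, starts at zero
W0 = rng.normal(size=(m, n))                   # frozen pretrained weight

def adapted_weight():
    """Frozen weight plus the three-factor low-rank update U S V^T."""
    return W0 + U @ S @ V.T

# A Euclidean update of U is followed by a retraction to restore orthonormality.
U = qr_retract(U - 0.01 * rng.normal(size=(m, r)))   # stand-in for a gradient
assert np.allclose(U.T @ U, np.eye(r), atol=1e-10)
```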
arXiv Detail & Related papers (2025-10-02T11:59:13Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
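The paper's embedded-network construction is more refined than this, but as a baseline illustration of an $\ell_\infty$ box over-approximation, classic interval bound propagation through one affine-plus-ReLU layer looks as follows (a sketch under that assumption, not the paper's method):

```python
import numpy as np

def interval_affine_relu(W, b, lo, hi):
    """Propagate an l_inf box [lo, hi] through x -> relu(W @ x + b).

    Splitting W into positive/negative parts gives a sound (if loose)
    over-approximation of the layer's reachable set.
    """
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    out_lo = W_pos @ lo + W_neg @ hi + b
    out_hi = W_pos @ hi + W_neg @ lo + b
    return np.maximum(out_lo, 0.0), np.maximum(out_hi, 0.0)  # relu is monotone

# Usage: any point in the input box must land inside the output box.
rng = np.random.default_rng(1)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
lo, hi = -np.ones(4), np.ones(4)
out_lo, out_hi = interval_affine_relu(W, b, lo, hi)
x = rng.uniform(lo, hi)
y = np.maximum(W @ x + b, 0.0)
assert np.all(out_lo <= y + 1e-12) and np.all(y <= out_hi + 1e-12)
```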
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- On a class of geodesically convex optimization problems solved via Euclidean MM methods [50.428784381385164]
We show how a class of geodesically convex problems can be written in terms of a difference of Euclidean convex functions, a structure that arises in a range of problems in statistics and machine learning.
Ultimately, we hope this perspective broadens the set of tools available for such problems.
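A minimal sketch of the difference-of-convex MM idea in the Euclidean setting (the toy objective and names are assumptions, not taken from the paper): since $h$ is convex, $h(x) \ge h(x_t) + \langle \nabla h(x_t), x - x_t \rangle$, so minimizing $g(x) - \langle \nabla h(x_t), x \rangle$ majorize-minimizes $f = g - h$.

```python
import numpy as np
from scipy.optimize import minimize

def dc_mm(g, grad_h, x0, n_iters=30):
    """Difference-of-convex MM for f = g - h (both convex).

    Convexity of h gives h(x) >= h(x_t) + <grad_h(x_t), x - x_t>, so the
    surrogate g(x) - <grad_h(x_t), x> majorizes f up to a constant, and each
    inner problem is plain Euclidean convex minimization.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        gh = grad_h(x)
        x = minimize(lambda z: g(z) - gh @ z, x).x     # convex subproblem
    return x

# Toy instance: g(z) = 0.5*||z||^2, h(z) = 4*sqrt(1 + ||z||^2), both convex;
# f = g - h is minimized on the sphere of radius sqrt(15) ~ 3.873.
g = lambda z: 0.5 * z @ z
grad_h = lambda z: 4.0 * z / np.sqrt(1.0 + z @ z)
x_star = dc_mm(g, grad_h, x0=np.array([2.0, -1.5]))
print(np.linalg.norm(x_star))   # ~3.873
```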
arXiv Detail & Related papers (2022-06-22T23:57:40Z)
- PDE-Based Optimal Strategy for Unconstrained Online Learning [40.61498562988079]
We present a framework that generates time-varying potential functions by solving a Partial Differential Equation (PDE).
Our framework recovers some classical potentials, and more importantly provides a systematic approach to design new ones.
This yields the first parameter-free algorithm with an optimal leading constant.
arXiv Detail & Related papers (2022-01-19T22:21:21Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of a stochastic optimization algorithm can be bounded based on the ``complexity'' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- GT-STORM: Taming Sample, Communication, and Memory Complexities in Decentralized Non-Convex Learning [11.129095449532283]
Decentralized nonconvex optimization has received increasing attention in recent years in machine learning.
Three fundamental challenges in designing decentralized optimization algorithms are how to reduce their sample, communication, and memory complexities.
arXiv Detail & Related papers (2021-05-04T00:44:48Z)
- Semi-discrete optimization through semi-discrete optimal transport: a framework for neural architecture search [0.0]
We introduce a theoretical framework for semi-discrete optimization using ideas from optimal transport.
Our primary motivation is in the field of deep learning, and specifically in the task of neural architecture search.
In a companion paper we show that algorithms inspired by our framework are competitive with contemporaneous methods.
arXiv Detail & Related papers (2020-06-26T21:44:35Z)
- Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
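As a hedged sketch of the general idea (a linear chain with squared-error energy; the relaxation schedule and all names below are assumptions, not the paper's exact scheme), predictive coding clamps the input and output, relaxes the intermediate activities, and then applies purely local weight updates:

```python
import numpy as np

def predictive_coding_grads(Ws, x_in, y_target, gamma=0.05, n_steps=300):
    """Local learning signals for a linear chain via predictive coding.

    Activities relax on the energy E = 0.5 * sum_l ||x_{l+1} - W_l x_l||^2
    with input and output clamped; at (near) equilibrium, the local errors
    yield weight updates that approximate backprop's gradients.
    """
    xs = [x_in]
    for W in Ws:                               # forward pass initializes x_l
        xs.append(W @ xs[-1])
    xs[-1] = y_target                          # clamp the output to the target

    for _ in range(n_steps):                   # inference (relaxation) phase
        errs = [xs[l + 1] - Ws[l] @ xs[l] for l in range(len(Ws))]
        for l in range(1, len(xs) - 1):        # endpoints stay clamped
            xs[l] += gamma * (Ws[l].T @ errs[l] - errs[l - 1])

    errs = [xs[l + 1] - Ws[l] @ xs[l] for l in range(len(Ws))]
    # Purely local, Hebbian-style update direction for each layer's weights.
    return [np.outer(errs[l], xs[l]) for l in range(len(Ws))]

# Usage on a tiny two-layer chain; when prediction errors are small, -grads
# track the backprop gradients of the squared output loss.
rng = np.random.default_rng(0)
W1, W2 = 0.3 * rng.normal(size=(3, 4)), 0.3 * rng.normal(size=(2, 3))
x, y = rng.normal(size=4), rng.normal(size=2)
g1, g2 = predictive_coding_grads([W1, W2], x, y)
```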
arXiv Detail & Related papers (2020-06-07T15:35:47Z)
- Stochastic Flows and Geometric Optimization on the Orthogonal Group [52.50121190744979]
We present a new class of geometrically-driven optimization algorithms on the orthogonal group $O(d)$.
We show that our methods can be applied in various fields of machine learning, including deep, convolutional and recurrent neural networks, reinforcement learning, normalizing flows and metric learning.
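One standard instance of such geometrically-driven updates (a sketch, not necessarily the paper's construction) is the Cayley retraction, which maps a skew-symmetric direction back onto $O(d)$ exactly:

```python
import numpy as np

def cayley_step(X, G, lr=0.1):
    """One descent step on the orthogonal group O(d) via the Cayley transform.

    A = G X^T - X G^T is skew-symmetric, so the Cayley map below returns an
    orthogonal matrix: orthogonality is preserved exactly (up to round-off),
    with no projection or re-orthogonalization needed.
    """
    d = X.shape[0]
    A = G @ X.T - X @ G.T                      # skew-symmetric ambient direction
    I = np.eye(d)
    return np.linalg.solve(I + (lr / 2) * A, (I - (lr / 2) * A) @ X)

# Usage: minimize f(X) = ||X - M||_F^2 over O(d) (a Procrustes-style toy).
rng = np.random.default_rng(0)
d = 5
M = rng.normal(size=(d, d))
X = np.eye(d)
for _ in range(200):
    G = 2 * (X - M)                            # Euclidean gradient of f
    X = cayley_step(X, G, lr=0.05)
assert np.allclose(X.T @ X, np.eye(d), atol=1e-8)  # still orthogonal
```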
arXiv Detail & Related papers (2020-03-30T15:37:50Z)
- Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.