Tame Riemannian Stochastic Approximation
- URL: http://arxiv.org/abs/2302.00709v5
- Date: Tue, 12 Aug 2025 10:51:20 GMT
- Title: Tame Riemannian Stochastic Approximation
- Authors: Johannes Aspman, Vyacheslav Kungurtsev, Reza Roohi Seraji,
- Abstract summary: We study the properties of stochastic approximation applied to a tame nondifferentiable function subject to constraints defined by a Riemannian manifold. Recent work has shown that this class of functions faithfully models the loss landscape of deep neural network training objectives.
- Score: 2.8166046619628315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the properties of stochastic approximation applied to a tame nondifferentiable function subject to constraints defined by a Riemannian manifold. The objective landscape of tame functions, arising in o-minimal topology extended to a geometric category when generalized to manifolds, exhibits some structure that enables theoretical guarantees of expected function decrease and asymptotic convergence for generic stochastic subgradient descent. Recent work has shown that this class of functions faithfully models the loss landscape of deep neural network training objectives, and that the autograd operation used in deep learning packages implements a variant of subgradient descent with the correct properties for convergence. Riemannian optimization uses geometric properties of a constraint set to perform a minimization procedure while enforcing that the optimization variable lies on a Riemannian manifold. This paper presents the first study of tame optimization on Riemannian manifolds, highlighting the rich geometric structure of the problem and confirming the appropriateness of the canonical "SGD" for such a problem with the analysis and numerical reports of a simple Retracted SGD algorithm.
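For concreteness, below is a minimal sketch of the kind of retracted stochastic subgradient step the abstract describes, here on the unit sphere with a metric-projection retraction and a nonsmooth $\ell_1$-type objective. The manifold, retraction, objective, and noise model are illustrative assumptions for this sketch, not the paper's experimental setup.

```python
import numpy as np

def sphere_retraction(x, v):
    """Metric-projection retraction on the unit sphere: step in the
    ambient space, then renormalize back onto the manifold."""
    y = x + v
    return y / np.linalg.norm(y)

def project_to_tangent(x, g):
    """Orthogonal projection of an ambient (sub)gradient g onto the
    tangent space T_x S^{n-1} = {v : <v, x> = 0}."""
    return g - np.dot(g, x) * x

def retracted_sgd(stochastic_subgrad, x0, steps=1000, gamma0=0.1):
    """Minimal sketch of retraction-based stochastic (sub)gradient descent
    on the unit sphere. `stochastic_subgrad(x)` is assumed to return a
    noisy element of the subdifferential in ambient coordinates."""
    x = x0 / np.linalg.norm(x0)
    for k in range(steps):
        gamma = gamma0 / np.sqrt(k + 1)          # diminishing step sizes
        g = stochastic_subgrad(x)                # noisy subgradient sample
        v = project_to_tangent(x, g)             # Riemannian (sub)gradient
        x = sphere_retraction(x, -gamma * v)     # retracted descent step
    return x

# Example: minimize the tame nonsmooth objective f(x) = ||A x||_1 over the
# sphere, with additive Gaussian noise standing in for a stochastic oracle.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
subgrad = lambda x: A.T @ np.sign(A @ x) + 0.01 * rng.standard_normal(5)
x_star = retracted_sgd(subgrad, rng.standard_normal(5))
print(np.linalg.norm(A @ x_star, 1))
```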
Related papers
- Efficient Optimization with Orthogonality Constraint: a Randomized Riemannian Submanifold Method [10.239769272138995]
We propose a novel approach to solve problems in machine learning. We introduce two strategies for updating the random submanifold. We show how our approach can be generalized to a wide variety of problems.
arXiv Detail & Related papers (2025-05-18T11:46:44Z) - Gradient-free stochastic optimization for additive models [56.42455605591779]
We address the problem of zero-order optimization from noisy observations for an objective function satisfying the Polyak-Lojasiewicz or the strong convexity condition.
We assume that the objective function has an additive structure and satisfies a higher-order smoothness property, characterized by the Hölder family of functions.
arXiv Detail & Related papers (2025-03-03T23:39:08Z) - Inexact subgradient methods for semialgebraic functions [18.293072574300798]
Motivated by the extensive application of approximate gradients in machine learning, we investigate inexact subgradient methods subject to persistent additive errors. Our analysis addresses both vanishing and constant step-size regimes.
arXiv Detail & Related papers (2024-04-30T12:47:42Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - On Learning Gaussian Multi-index Models with Gradient Flow [57.170617397894404]
We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data.
We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection.
arXiv Detail & Related papers (2023-10-30T17:55:28Z) - Last-Iterate Convergence of Adaptive Riemannian Gradient Descent for Equilibrium Computation [52.73824786627612]
This paper establishes new convergence results for geodesic strongly monotone games. Our key result shows that RGD attains last-iterate linear convergence in a geometry-agnostic fashion. Overall, this paper presents the first geometry-agnostic last-iterate convergence analysis for games beyond the Euclidean setting.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - A Unified Analysis of Multi-task Functional Linear Regression Models
with Manifold Constraint and Composite Quadratic Penalty [0.0]
The power of multi-task learning is brought in by imposing additional structures over the slope functions.
We show the composite penalty induces a specific norm, which helps to quantify the manifold curvature.
A unified convergence upper bound is obtained and specifically applied to the reduced-rank model and the graph Laplacian regularized model.
arXiv Detail & Related papers (2022-11-09T13:32:23Z) - Perturbation Analysis of Neural Collapse [24.94449183555951]
Training deep neural networks for classification often includes minimizing the training loss beyond the zero training error point.
Recent works analyze this behavior via idealized unconstrained features models where all the minimizers exhibit exact collapse.
We propose a richer model that can capture this phenomenon by forcing the features to stay in the vicinity of a predefined features matrix.
arXiv Detail & Related papers (2022-10-29T17:46:03Z) - Learning Globally Smooth Functions on Manifolds [94.22412028413102]
Learning smooth functions is generally challenging, except in simple cases such as learning linear or kernel models.
This work proposes to overcome these obstacles by combining techniques from semi-infinite constrained learning and manifold regularization.
We prove that, under mild conditions, this method estimates the Lipschitz constant of the solution, learning a globally smooth solution as a byproduct.
arXiv Detail & Related papers (2022-10-01T15:45:35Z) - Minimal Neural Atlas: Parameterizing Complex Surfaces with Minimal
Charts and Distortion [71.52576837870166]
We present Minimal Neural Atlas, a novel atlas-based explicit neural surface representation.
At its core is a fully learnable parametric domain, given by an implicit probabilistic occupancy field defined on an open square of the parametric space.
Our reconstructions are more accurate in terms of the overall geometry, due to the separation of concerns on topology and geometry.
arXiv Detail & Related papers (2022-07-29T16:55:06Z) - Riemannian Stochastic Gradient Method for Nested Composition Optimization [0.0]
This work considers optimization of a composition of functions in a nested form over Riemannian manifolds, where each function contains an expectation.
This type of problem is gaining popularity in applications such as policy evaluation in reinforcement learning or model customization in meta-learning.
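As a point of reference, a generic two-level formulation of such nested compositional problems over a Riemannian manifold $\mathcal{M}$ can be sketched as (notation is illustrative; the paper's exact setting and assumptions may differ)
$$\min_{x \in \mathcal{M}} \; F(x) \;=\; f\bigl(\mathbb{E}_{\xi}\,[\,g(x;\xi)\,]\bigr),$$
where the inner expectation is accessed only through samples, so unbiased gradient estimates of $F$ are not directly available.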
arXiv Detail & Related papers (2022-07-19T15:58:27Z) - Stochastic Langevin Differential Inclusions with Applications to Machine Learning [5.274477003588407]
We show some foundational results regarding the flow and properties of Langevin-type Differential Inclusions.
In particular, we show strong existence of the solution, as well as asymptotic minimization of the canonical free-energy functional.
arXiv Detail & Related papers (2022-06-23T08:29:17Z) - The Dynamics of Riemannian Robbins-Monro Algorithms [101.29301565229265]
We propose a family of Riemannian algorithms generalizing and extending the seminal stochastic approximation framework of Robbins and Monro.
Compared to their Euclidean counterparts, Riemannian algorithms are much less understood due to the lack of a global linear structure on the manifold.
We provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes.
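For orientation, the Riemannian Robbins-Monro template can be sketched as the recursion (a hedged sketch; the paper's exact conditions may differ)
$$x_{k+1} \;=\; \mathrm{Exp}_{x_k}\!\bigl(\gamma_k\,(v(x_k) + U_k)\bigr), \qquad \sum_{k} \gamma_k = \infty, \quad \sum_{k} \gamma_k^2 < \infty,$$
where $\mathrm{Exp}$ is the exponential map (a retraction may be substituted), $v$ is the mean drift, $U_k$ is a martingale-difference noise term, and $\gamma_k$ are Robbins-Monro step sizes.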
arXiv Detail & Related papers (2022-06-14T12:30:11Z) - First-Order Algorithms for Min-Max Optimization in Geodesic Metric
Spaces [93.35384756718868]
Min-max algorithms have been analyzed in the Euclidean setting.
We prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate.
arXiv Detail & Related papers (2022-06-04T18:53:44Z) - Towards Modeling and Resolving Singular Parameter Spaces using
Stratifolds [18.60761407945024]
In learning dynamics, singularities can act as attractors on the learning trajectory and, therefore, negatively influence the convergence speed of models.
We propose a general approach to circumvent the problem arising from singularities by using stratifolds.
We empirically show that using (natural) gradient descent on the smooth manifold approximation instead of the singular space allows us to avoid the attractor behavior and therefore improve the convergence speed in learning.
arXiv Detail & Related papers (2021-12-07T14:42:45Z) - Automatic differentiation for Riemannian optimization on low-rank matrix
and tensor-train manifolds [71.94111815357064]
In scientific computing and machine learning applications, matrices and more general multidimensional arrays (tensors) can often be approximated with the help of low-rank decompositions.
One of the popular tools for finding low-rank approximations is Riemannian optimization.
arXiv Detail & Related papers (2021-03-27T19:56:00Z) - Learning Sub-Patterns in Piecewise Continuous Functions [4.18804572788063]
Most gradient descent algorithms can optimize neural networks that are sub-differentiable in their parameters.
This paper focuses on the case where the discontinuities arise from distinct sub-patterns.
We propose a new discontinuous deep neural network model trainable via a decoupled two-step procedure.
arXiv Detail & Related papers (2020-10-29T13:44:13Z) - On the minmax regret for statistical manifolds: the role of curvature [68.8204255655161]
Two-part codes and the minimum description length have been successful in delivering procedures to single out the best models.
We derive a sharper expression than the standard one given by the complexity, where the scalar curvature of the Fisher information metric plays a dominant role.
arXiv Detail & Related papers (2020-07-06T17:28:19Z) - Stochastic Zeroth-order Riemannian Derivative Estimation and
Optimization [15.78743548731191]
We propose an oracle version of the Gaussian smoothing function to overcome the difficulty posed by the non-linearity of the manifold.
We demonstrate the applicability of our algorithms through numerical results and real-world applications: black-box stiffness control for robotics and black-box attacks on neural networks.
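As an illustration of the general idea (not the authors' estimator), a zeroth-order Riemannian gradient estimate can be formed by sampling tangent directions, retracting, and averaging finite differences. The unit sphere, the projection retraction, and the sample sizes below are assumptions made for this sketch.

```python
import numpy as np

def zo_riemannian_grad(f, x, mu=1e-3, num_samples=32, rng=None):
    """Zeroth-order gradient estimate on the unit sphere via Gaussian
    smoothing: sample tangent directions, query f after a projection
    retraction, and average forward differences."""
    rng = np.random.default_rng() if rng is None else rng
    est = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape[0])
        u -= np.dot(u, x) * x                  # project onto T_x S^{n-1}
        y = x + mu * u
        x_plus = y / np.linalg.norm(y)         # metric-projection retraction
        est += (f(x_plus) - f(x)) / mu * u     # forward-difference term
    return est / num_samples

# Example: estimate the gradient of a black-box quadratic on the sphere.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)); A = A + A.T
f = lambda x: float(x @ A @ x)
x = rng.standard_normal(5); x /= np.linalg.norm(x)
print(zo_riemannian_grad(f, x, rng=rng))
```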
arXiv Detail & Related papers (2020-03-25T06:58:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.