Riemannian Natural Gradient Methods
- URL: http://arxiv.org/abs/2207.07287v1
- Date: Fri, 15 Jul 2022 04:33:10 GMT
- Title: Riemannian Natural Gradient Methods
- Authors: Jiang Hu, Ruicheng Ao, Anthony Man-Cho So, Minghan Yang, and Zaiwen
Wen
- Abstract summary: We introduce the notion of the Fisher information matrix in the manifold setting and propose a Riemannian natural gradient method, which can be viewed as a natural extension of the natural gradient method to manifolds.
We establish the almost-sure global convergence of our proposed method under standard assumptions.
Numerical experiments on applications arising from machine learning demonstrate the advantages of the proposed method over state-of-the-art ones.
- Score: 21.14740680011498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies large-scale optimization problems on Riemannian manifolds
whose objective function is a finite sum of negative log-probability losses.
Such problems arise in various machine learning and signal processing
applications. By introducing the notion of Fisher information matrix in the
manifold setting, we propose a novel Riemannian natural gradient method, which
can be viewed as a natural extension of the natural gradient method from the
Euclidean setting to the manifold setting. We establish the almost-sure global
convergence of our proposed method under standard assumptions. Moreover, we
show that if the loss function satisfies certain convexity and smoothness
conditions and the input-output map satisfies a Riemannian Jacobian stability
condition, then our proposed method enjoys a local linear -- or, under the
Lipschitz continuity of the Riemannian Jacobian of the input-output map, even
quadratic -- rate of convergence. We then prove that the Riemannian Jacobian
stability condition will be satisfied by a two-layer fully connected neural
network with batch normalization with high probability, provided that the width
of the network is sufficiently large. This demonstrates the practical relevance
of our convergence rate result. Numerical experiments on applications arising
from machine learning demonstrate the advantages of the proposed method over
state-of-the-art ones.
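For intuition, here is a minimal, hypothetical Python sketch of a single Riemannian natural-gradient iteration, assuming the unit sphere as the manifold and a damped empirical Fisher matrix as the preconditioner; these choices are illustrative assumptions and do not reproduce the paper's exact algorithm or its Fisher construction on general manifolds.

```python
import numpy as np

def proj_tangent(x, v):
    """Project an ambient vector v onto the tangent space of the unit sphere at x."""
    return v - np.dot(x, v) * x

def retract(x, v):
    """Retraction on the sphere: move along the tangent vector, then renormalize."""
    y = x + v
    return y / np.linalg.norm(y)

def riemannian_natural_gradient_step(x, per_sample_grads, step=0.1, damping=1e-3):
    """One illustrative step: precondition the Riemannian gradient by a damped
    empirical Fisher matrix, project back to the tangent space, and retract."""
    G = np.asarray(per_sample_grads)              # shape (n, d): per-sample Euclidean gradients
    euclid_grad = G.mean(axis=0)                  # full-batch Euclidean gradient
    rgrad = proj_tangent(x, euclid_grad)          # Riemannian gradient on the sphere
    fisher = G.T @ G / G.shape[0] + damping * np.eye(x.size)  # damped empirical Fisher
    direction = proj_tangent(x, np.linalg.solve(fisher, rgrad))
    return retract(x, -step * direction)
```

The damping term is a common practical device to keep the Fisher solve well conditioned when the per-sample gradients are low rank; the paper's method and analysis cover general Riemannian manifolds and more refined Fisher approximations.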
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration by minimizing an approximation of the objective within a trust region.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Riemannian Federated Learning via Averaging Gradient Stream [8.75592575216789]
This paper develops and analyzes an efficient Riemannian Federated Averaging Gradient Stream (RFedAGS) algorithm.
Numerical simulations conducted on synthetic and real-world data demonstrate the performance of the proposed RFedAGS.
arXiv Detail & Related papers (2024-09-11T12:28:42Z) - A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variants of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that the policies under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - The Dynamics of Riemannian Robbins-Monro Algorithms [101.29301565229265]
We propose a family of Riemannian algorithms generalizing and extending the seminal approximation framework of Robbins and Monro.
Compared to their Euclidean counterparts, Riemannian algorithms are much less understood due to the lack of a global linear structure on the manifold.
We provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes.
arXiv Detail & Related papers (2022-06-14T12:30:11Z) - A Riemannian smoothing steepest descent method for non-Lipschitz
optimization on submanifolds [15.994495285950293]
A smoothing steepest descent method is proposed to minimize a non-Lipschitz function on submanifolds.
We prove that any accumulation point of the sequence generated with the smoothing function is a limiting stationary point of the original non-Lipschitz problem.
Numerical experiments are conducted to demonstrate the efficiency of the proposed method.
arXiv Detail & Related papers (2021-04-09T05:38:28Z) - Sinkhorn Natural Gradient for Generative Models [125.89871274202439]
We propose a novel Sinkhorn Natural Gradient (SiNG) algorithm which acts as a steepest descent method on the probability space endowed with the Sinkhorn divergence.
We show that the Sinkhorn information matrix (SIM), a key component of SiNG, has an explicit expression and can be evaluated accurately with a complexity that scales logarithmically with respect to the desired accuracy.
In our experiments, we quantitatively compare SiNG with state-of-the-art SGD-type solvers on generative tasks to demonstrate its efficiency and efficacy.
arXiv Detail & Related papers (2020-11-09T02:51:17Z) - Stochastic Zeroth-order Riemannian Derivative Estimation and
Optimization [15.78743548731191]
We propose an oracle version of the Gaussian smoothing function to overcome the difficulty posed by the non-linearity of the manifold.
We demonstrate the applicability of our algorithms through numerical results and real-world applications, including black-box stiffness control for robotics and black-box attacks on neural networks (a toy zeroth-order sketch follows this list).
arXiv Detail & Related papers (2020-03-25T06:58:19Z)
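As an illustration of the zeroth-order idea in the last entry above, the following hypothetical Python sketch estimates a Riemannian gradient on the unit sphere by Gaussian smoothing in the tangent space; the sphere, the retraction, and the forward-difference estimator are assumptions made for illustration and are not the oracle construction analyzed in that paper.

```python
import numpy as np

def proj_tangent(x, v):
    """Project an ambient vector v onto the tangent space of the unit sphere at x."""
    return v - np.dot(x, v) * x

def retract(x, v):
    """Retraction on the sphere: move along the tangent vector, then renormalize."""
    y = x + v
    return y / np.linalg.norm(y)

def zeroth_order_riemannian_grad(f, x, num_samples=100, mu=1e-4, seed=None):
    """Estimate the Riemannian gradient of f at x using only function evaluations:
    sample Gaussian directions in the tangent space and average forward differences."""
    rng = np.random.default_rng(seed)
    fx = f(x)
    estimate = np.zeros_like(x)
    for _ in range(num_samples):
        u = proj_tangent(x, rng.standard_normal(x.size))    # random tangent direction
        estimate += (f(retract(x, mu * u)) - fx) / mu * u    # forward difference along u
    return estimate / num_samples
```

Such an estimator can then drive a retraction-based descent step when gradients of f are unavailable, as in black-box control or black-box attack settings.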