Riemannian Stochastic Proximal Gradient Methods for Nonsmooth
Optimization over the Stiefel Manifold
- URL: http://arxiv.org/abs/2005.01209v2
- Date: Mon, 21 Mar 2022 01:07:27 GMT
- Title: Riemannian Stochastic Proximal Gradient Methods for Nonsmooth
Optimization over the Stiefel Manifold
- Authors: Bokun Wang, Shiqian Ma, Lingzhou Xue
- Abstract summary: R-ProxSGD and R-ProxSPB are generalizations of proximal SGD and proximal SpiderBoost.
R-ProxSPB algorithm finds an $epsilon$-stationary point with $O(epsilon-3)$ IFOs in the online case, and $O(n+sqrtnepsilon-3)$ IFOs in the finite-sum case.
- Score: 7.257751371276488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Riemannian optimization has drawn a lot of attention due to its wide
applications in practice. Riemannian stochastic first-order algorithms have
been studied in the literature to solve large-scale machine learning problems
over Riemannian manifolds. However, most of the existing Riemannian stochastic
algorithms require the objective function to be differentiable, and they do not
apply to the case where the objective function is nonsmooth. In this paper, we
present two Riemannian stochastic proximal gradient methods for minimizing
nonsmooth function over the Stiefel manifold. The two methods, named R-ProxSGD
and R-ProxSPB, are generalizations of proximal SGD and proximal SpiderBoost in
Euclidean setting to the Riemannian setting. Analysis on the incremental
first-order oracle (IFO) complexity of the proposed algorithms is provided.
Specifically, the R-ProxSPB algorithm finds an $\epsilon$-stationary point with
$\O(\epsilon^{-3})$ IFOs in the online case, and $\O(n+\sqrt{n}\epsilon^{-2})$
IFOs in the finite-sum case with $n$ being the number of summands in the
objective. Experimental results on online sparse PCA and robust low-rank matrix
completion show that our proposed methods significantly outperform the existing
methods that use Riemannian subgradient information.
Related papers
- Riemannian Bilevel Optimization [35.42472057648458]
We focus in particular on batch and gradient-based methods, with the explicit goal of avoiding second-order information.
We propose and analyze $mathrmRF2SA$, a method that leverages first-order gradient information.
We provide explicit convergence rates for reaching $epsilon$-stationary points under various setups.
arXiv Detail & Related papers (2024-05-22T20:49:01Z) - Streamlining in the Riemannian Realm: Efficient Riemannian Optimization
with Loopless Variance Reduction [4.578425862931332]
This study focuses on the crucial reduction mechanism used in both Euclidean and Riemannian settings.
Motivated by Euclidean methods, we introduce R-based methods to replace the outer loop with computation triggered by a coin flip.
Using R- as a framework, we demonstrate its applicability to various important settings.
arXiv Detail & Related papers (2024-03-11T12:49:37Z) - Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that policies under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z) - Decentralized Riemannian Conjugate Gradient Method on the Stiefel
Manifold [59.73080197971106]
This paper presents a first-order conjugate optimization method that converges faster than the steepest descent method.
It aims to achieve global convergence over the Stiefel manifold.
arXiv Detail & Related papers (2023-08-21T08:02:16Z) - Low-complexity subspace-descent over symmetric positive definite
manifold [9.346050098365648]
We develop low-complexity algorithms for the minimization of functions over the symmetric positive definite (SPD) manifold.
The proposed approach utilizes carefully chosen subspaces that allow the update to be written as a product of the Cholesky factor of the iterate and a sparse matrix.
arXiv Detail & Related papers (2023-05-03T11:11:46Z) - Decentralized Riemannian Algorithm for Nonconvex Minimax Problems [82.50374560598493]
The minimax algorithms for neural networks have been developed to solve many problems.
In this paper, we propose two types of minimax algorithms.
For the setting, we propose DRSGDA and prove that our method achieves a gradient.
arXiv Detail & Related papers (2023-02-08T01:42:45Z) - Gradient-Free Methods for Deterministic and Stochastic Nonsmooth
Nonconvex Optimization [94.19177623349947]
Non-smooth non optimization problems emerge in machine learning and business making.
Two core challenges impede the development of efficient methods with finitetime convergence guarantee.
Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results.
arXiv Detail & Related papers (2022-09-12T06:53:24Z) - Riemannian Stochastic Gradient Method for Nested Composition Optimization [0.0]
This work considers optimization of composition of functions in a nested form over Riemannian where each function contains an expectation.
This type of problems is gaining popularity in applications such as policy evaluation in reinforcement learning or model customization in metalearning.
arXiv Detail & Related papers (2022-07-19T15:58:27Z) - First-Order Algorithms for Min-Max Optimization in Geodesic Metric
Spaces [93.35384756718868]
min-max algorithms have been analyzed in the Euclidean setting.
We prove that the extraiteient (RCEG) method corrected lastrate convergence at a linear rate.
arXiv Detail & Related papers (2022-06-04T18:53:44Z) - Automatic differentiation for Riemannian optimization on low-rank matrix
and tensor-train manifolds [71.94111815357064]
In scientific computing and machine learning applications, matrices and more general multidimensional arrays (tensors) can often be approximated with the help of low-rank decompositions.
One of the popular tools for finding the low-rank approximations is to use the Riemannian optimization.
arXiv Detail & Related papers (2021-03-27T19:56:00Z) - Stochastic Zeroth-order Riemannian Derivative Estimation and
Optimization [15.78743548731191]
We propose an oracle version of the Gaussian smoothing function to overcome the difficulty of non-linearity of manifold non-linearity.
We demonstrate the applicability of our algorithms by results and real-world applications on black-box stiffness control for robotics and black-box attacks to neural networks.
arXiv Detail & Related papers (2020-03-25T06:58:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.