Adaptive Stochastic Gradient Descents on Manifolds with an Application on Weighted Low-Rank Approximation
- URL: http://arxiv.org/abs/2503.11833v2
- Date: Sat, 29 Mar 2025 01:05:48 GMT
- Title: Adaptive Stochastic Gradient Descents on Manifolds with an Application on Weighted Low-Rank Approximation
- Authors: Peiqi Yang, Conglong Xu, Hao Wu
- Abstract summary: We prove a convergence theorem for stochastic gradient descents on manifolds with adaptive learning rate. We apply it to the weighted low-rank approximation problem.
- Score: 1.9648630752093679
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We prove a convergence theorem for stochastic gradient descents on manifolds with adaptive learning rate and apply it to the weighted low-rank approximation problem.
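To make the setting concrete, below is a minimal, hypothetical NumPy sketch of one way adaptive stochastic gradient descent on a manifold can be applied to weighted low-rank approximation. It is not the authors' algorithm: the entry-sampling scheme, the truncated-SVD retraction, and the AdaGrad-norm style step size are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's algorithm): adaptive stochastic gradient
# descent on the manifold of rank-r matrices for the weighted low-rank problem
#   min_X 0.5 * sum_ij W_ij * (A_ij - X_ij)^2   subject to rank(X) = r.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 60, 40, 5
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # target matrix
W = rng.uniform(0.1, 1.0, size=(m, n))                         # positive weights

def truncate(X, r):
    """Retraction onto the rank-r manifold via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r], U[:, :r], Vt[:r].T

X, U, V = truncate(rng.standard_normal((m, n)), r)
eta0, eps, acc = 1.0, 1e-12, 0.0

for t in range(2000):
    # Stochastic Euclidean gradient: weighted residual on a random entry mask,
    # rescaled by the sampling probability to keep the estimator unbiased.
    mask = rng.random((m, n)) < 0.2
    G = -(mask * W * (A - X)) / 0.2

    # Riemannian gradient: project G onto the tangent space at X = U S V^T.
    xi = U @ (U.T @ G) + (G @ V) @ V.T - U @ (U.T @ G @ V) @ V.T

    # Adaptive (AdaGrad-norm style) learning rate from accumulated gradient norms.
    acc += np.sum(xi * xi)
    eta = eta0 / np.sqrt(eps + acc)

    # Take the step and retract back onto the rank-r manifold.
    X, U, V = truncate(X - eta * xi, r)

print("weighted error:", 0.5 * np.sum(W * (A - X) ** 2))
```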
Related papers
- Weighted Low-rank Approximation via Stochastic Gradient Descent on Manifolds [1.9648630752093679]
We establish a convergence theorem for retraction-based stochastic gradient descents admitting confinements on a manifold.
Our algorithm outperforms existing gradient descents on Euclidean spaces.
We also compare the accelerated line search on this manifold to the existing accelerated line search on Euclidean spaces.
arXiv Detail & Related papers (2025-02-20T00:59:50Z)
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
- Variance Reduction and Low Sample Complexity in Stochastic Optimization via Proximal Point Method [5.025654873456757]
The paper establishes a low sample complexity bound that yields a high-probability guarantee on the convergence of the proposed method.
A subroutine is developed to solve the proximal subproblem, which also serves as a novel technique for variance reduction.
arXiv Detail & Related papers (2024-02-14T07:34:22Z)
- Adaptive Step Sizes for Preconditioned Stochastic Gradient Descent [0.3831327965422187]
This paper proposes a novel approach to adaptive step sizes in stochastic gradient descent (SGD).
We use quantities that we have identified as numerically traceable: the Lipschitz constant for gradients and a concept of the local variance in search directions.
arXiv Detail & Related papers (2023-11-28T17:03:56Z)
- One-step corrected projected stochastic gradient descent for statistical estimation [49.1574468325115]
It is based on the projected stochastic gradient descent on the log-likelihood function corrected by a single step of the Fisher scoring algorithm.
We show theoretically and by simulations that it is an interesting alternative to the usual stochastic gradient descent with averaging or the adaptive stochastic gradient descent.
arXiv Detail & Related papers (2023-06-09T13:43:07Z)
- A note on diffusion limits for stochastic gradient descent [0.0]
Much of the theory that attempts to clarify the role of noise in stochastic gradient algorithms widely approximates stochastic gradient descent by a differential equation with Gaussian noise.
We provide a novel theoretical justification for this practice that showcases how the noise arises naturally.
arXiv Detail & Related papers (2022-10-20T13:27:00Z)
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z)
- Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without the light-tails assumption on the noise (a minimal sketch of the clipping idea appears after this list).
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
- On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove in-expectation convergence guarantees for a class of adaptive gradient methods in nonconvex optimization.
Our analyses shed light on a better understanding of adaptive gradient methods for optimizing nonconvex objectives.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
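As a simple illustration of the gradient-clipping idea from the heavy-tailed-noise entry above, here is a hypothetical sketch of norm-clipped SGD on a toy quadratic with heavy-tailed (Student-t) gradient noise. It is not the clipped-SSTM method; the objective, noise model, and constants are illustrative assumptions.

```python
# Hypothetical sketch of SGD with gradient clipping under heavy-tailed noise
# (not the clipped-SSTM method from the paper above).  The quadratic objective,
# Student-t noise model, and constants are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(1)
d, steps, eta, clip_level = 10, 5000, 0.05, 1.0
x_star = rng.standard_normal(d)   # minimizer of f(x) = 0.5 * ||x - x_star||^2
x = np.zeros(d)

def clip(g, lam):
    """Rescale g so that its Euclidean norm is at most lam."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

for t in range(steps):
    # Stochastic gradient = exact gradient + heavy-tailed Student-t noise.
    g = (x - x_star) + rng.standard_t(df=2.5, size=d)
    x = x - eta * clip(g, clip_level)

print("distance to minimizer:", np.linalg.norm(x - x_star))
```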
This list is automatically generated from the titles and abstracts of the papers on this site.