Related papers: Improving the Convergence Rates of Forward Gradient Descent with Repeated Sampling

Improving the Convergence Rates of Forward Gradient Descent with Repeated Sampling

URL: http://arxiv.org/abs/2411.17567v1
Date: Tue, 26 Nov 2024 16:28:16 GMT
Title: Improving the Convergence Rates of Forward Gradient Descent with Repeated Sampling
Authors: Niklas Dexheimer, Johannes Schmidt-Hieber,
Abstract summary: Forward gradient descent (FGD) has been proposed as a biologically more plausible alternative of gradient descent. In this paper we show that by computing $ell$ FGD steps based on each training sample, this suboptimality factor becomes $d/(ell wedge d)$. We also show that FGD with repeated sampling can adapt to low-dimensional structure in the input distribution.
Score: 5.448070998907116
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Forward gradient descent (FGD) has been proposed as a biologically more plausible alternative of gradient descent as it can be computed without backward pass. Considering the linear model with $d$ parameters, previous work has found that the prediction error of FGD is, however, by a factor $d$ slower than the prediction error of stochastic gradient descent (SGD). In this paper we show that by computing $\ell$ FGD steps based on each training sample, this suboptimality factor becomes $d/(\ell \wedge d)$ and thus the suboptimality of the rate disappears if $\ell \gtrsim d.$ We also show that FGD with repeated sampling can adapt to low-dimensional structure in the input distribution. The main mathematical challenge lies in controlling the dependencies arising from the repeated sampling process.

Related papers

Asymptotics of Stochastic Gradient Descent with Dropout Regularization in Linear Models [8.555650549124818]
This paper proposes a theory for online inference of the gradient descent (SGD) iterates with dropout regularization in linear regression. For sufficiently large samples, the proposed confidence intervals for ASGD with dropout nearly achieve the nominal coverage probability.
arXiv Detail & Related papers (2024-09-11T17:28:38Z)
Uncertainty quantification for iterative algorithms in linear models with application to early stopping [4.150180443030652]
This paper investigates the iterates $hbb1,dots,hbbT$ obtained from iterative algorithms in high-dimensional linear regression problems. The analysis and proposed estimators are applicable to Gradient Descent (GD), GD and their accelerated variants such as Fast Iterative Soft-Thresholding (FISTA)
arXiv Detail & Related papers (2024-04-27T10:20:41Z)
Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency [47.8739414267201]
We consider gradient descent (GD) with a constant stepsize applied to logistic regression with linearly separable data. We show that GD exits this initial oscillatory phase rapidly -- in $mathcalO(eta)$ steps -- and subsequently achieves an $tildemathcalO (1 / (eta t) )$ convergence rate. Our results imply that, given a budget of $T$ steps, GD can achieve an accelerated loss of $tildemathcalO (1/T2)$ with an aggressive stepsize
arXiv Detail & Related papers (2024-02-24T23:10:28Z)
SIMPLE: A Gradient Estimator for $k$-Subset Sampling [42.38652558807518]
In this work, we fall back to discrete $k$-subset sampling on the forward pass. We show that our gradient estimator, SIMPLE, exhibits lower bias and variance compared to state-of-the-art estimators. Empirical results show improved performance on learning to explain and sparse linear regression.
arXiv Detail & Related papers (2022-10-04T22:33:16Z)
Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through gradient descent (SGD) Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
arXiv Detail & Related papers (2022-07-16T03:09:30Z)
MSTGD:A Memory Stochastic sTratified Gradient Descent Method with an Exponential Convergence Rate [0.0]
This paper designs a novel underlineMemory underlineStochastic sunderlineTratified Gradient Descend(underlineMSTGD) algorithm with an exponential convergence rate. Specifically, MSTGD uses two strategies for variance reduction: the first strategy is to perform variance reduction according to the proportion p of used historical gradient. Unlike most other algorithms that claim to achieve an exponential convergence rate, the convergence rate is independent of parameters such as dataset size N, batch size n, etc.
arXiv Detail & Related papers (2022-02-21T01:36:26Z)
Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$ samples. We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization [0.0]
We propose a new technique, em variance controlled gradient (VCSG), to improve the performance of the reduced gradient (SVRG) $lambda$ is introduced in VCSG to avoid over-reducing a variance by SVRG. $mathcalO(min1/epsilon3/2,n1/4/epsilon)$ number of gradient evaluations, which improves the leading gradient complexity.
arXiv Detail & Related papers (2021-02-19T12:22:56Z)
Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z)
Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for bound non-dimensional optimization. We demonstrate two popular deep learning methods on the empirical advantages over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
Carath\'eodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carath'eodory about measure reduction. We adaptively select the descent steps where the measure reduction is carried out. We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.