Convergence guarantees for forward gradient descent in the linear regression model
- URL: http://arxiv.org/abs/2309.15001v2
- Date: Thu, 20 Jun 2024 11:51:11 GMT
- Title: Convergence guarantees for forward gradient descent in the linear regression model
- Authors: Thijs Bos, Johannes Schmidt-Hieber
- Abstract summary: We study the biologically motivated (weight-perturbed) forward gradient scheme that is based on random linear combinations of the gradient.
We prove that the mean squared error of this method converges for $k \gtrsim d^2\log(d)$ with rate $d^2\log(d)/k$. Compared to the dimension dependence $d$ for stochastic gradient descent, an additional factor $d\log(d)$ occurs.
- Score: 5.448070998907116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Renewed interest in the relationship between artificial and biological neural networks motivates the study of gradient-free methods. Considering the linear regression model with random design, we theoretically analyze in this work the biologically motivated (weight-perturbed) forward gradient scheme that is based on random linear combinations of the gradient. If d denotes the number of parameters and k the number of samples, we prove that the mean squared error of this method converges for $k\gtrsim d^2\log(d)$ with rate $d^2\log(d)/k.$ Compared to the dimension dependence d for stochastic gradient descent, an additional factor $d\log(d)$ occurs.
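To make the scheme concrete, below is a minimal NumPy sketch of weight-perturbed forward gradient descent for the linear regression model: each update projects the single-sample gradient onto a random Gaussian direction $v$, which corresponds to a directional derivative that could be obtained without backpropagation. The step-size schedule and the toy data are illustrative assumptions, not the choices analyzed in the paper.

```python
import numpy as np

def forward_gradient_linreg(X, y, lr=None, seed=0):
    """Weight-perturbed forward gradient descent for linear regression.

    A minimal sketch of the scheme discussed above; the step-size rule is
    a hypothetical choice for illustration, not the paper's schedule."""
    rng = np.random.default_rng(seed)
    k, d = X.shape
    theta = np.zeros(d)
    for i in range(k):
        x_i, y_i = X[i], y[i]
        # Gradient of the single-sample squared loss (y_i - x_i^T theta)^2.
        grad = -2.0 * (y_i - x_i @ theta) * x_i
        # Forward gradient: project onto a random direction v ~ N(0, I_d).
        # The inner product grad @ v is a directional derivative, so in
        # practice it can be obtained by forward-mode AD or by perturbing
        # the weights, without backpropagation.
        v = rng.standard_normal(d)
        fwd_grad = (grad @ v) * v          # unbiased: E[v v^T] = I_d
        alpha = 1.0 / (d * (i + 1)) if lr is None else lr
        theta -= alpha * fwd_grad
    return theta

# Toy usage with d = 5 parameters and k = 20000 samples.
rng = np.random.default_rng(1)
d, k = 5, 20000
theta_star = rng.standard_normal(d)
X = rng.standard_normal((k, d))
y = X @ theta_star + 0.1 * rng.standard_normal(k)
theta_hat = forward_gradient_linreg(X, y)
print("MSE:", np.mean((theta_hat - theta_star) ** 2))
```

The projected estimator $(\nabla L \cdot v)\,v$ has variance roughly $d$ times that of the true gradient, which is why the sketch shrinks the step size by a factor of $d$; this extra noise is intuitively consistent with the additional $d\log(d)$ factor in the convergence rate.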
Related papers
- Limit Theorems for Stochastic Gradient Descent with Infinite Variance [47.87144151929621]
We show that the stochastic gradient descent algorithm can be characterized as the stationary distribution of a suitably defined Ornstein-Uhlenbeck process driven by an appropriate Lévy process.
We also explore the applications of these results in linear regression and logistic regression models.
arXiv Detail & Related papers (2024-10-21T09:39:10Z) - Gradient Descent Converges Linearly for Logistic Regression on Separable
Data [17.60502131429094]
We show that running gradient descent with a variable learning rate guarantees loss $f(x) \leq 1.1 \cdot f(x^*) + \epsilon$ for the logistic regression objective.
We also apply our ideas to sparse logistic regression, where they lead to an exponential improvement of the sparsity-error tradeoff.
arXiv Detail & Related papers (2023-06-26T02:15:26Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Unbiased Estimation of the Gradient of the Log-Likelihood for a Class of
Continuous-Time State-Space Models [0.0]
We consider static parameter estimation for a class of continuous-time state-space models.
Our goal is to obtain an unbiased estimate of the gradient of the log-likelihood (score function).
arXiv Detail & Related papers (2021-05-24T20:31:48Z) - Infinitely Deep Bayesian Neural Networks with Stochastic Differential
Equations [37.02511585732081]
We perform scalable approximate inference in a recently-proposed family of continuous-depth neural networks.
We demonstrate gradient-based variational inference, producing arbitrarily-flexible approximate posteriors.
This approach further inherits the memory-efficient training and tunable precision of neural ODEs.
arXiv Detail & Related papers (2021-02-12T14:48:58Z) - Probabilistic Circuits for Variational Inference in Discrete Graphical
Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPNs).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial the corresponding ELBO can be computed analytically.
arXiv Detail & Related papers (2020-10-22T05:04:38Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - The Impact of the Mini-batch Size on the Variance of Gradients in
Stochastic Gradient Descent [28.148743710421932]
The mini-batch stochastic gradient descent (SGD) algorithm is widely used in training machine learning models.
We study SGD dynamics under linear regression and two-layer linear networks, with an easy extension to deeper linear networks.
arXiv Detail & Related papers (2020-04-27T20:06:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.