Convergence guarantees for forward gradient descent in the linear regression model
- URL: http://arxiv.org/abs/2309.15001v2
- Date: Thu, 20 Jun 2024 11:51:11 GMT
- Title: Convergence guarantees for forward gradient descent in the linear regression model
- Authors: Thijs Bos, Johannes Schmidt-Hieber
- Abstract summary: We study the biologically motivated (weight-perturbed) forward gradient scheme that is based on random linear combinations of the gradient.
We prove that the mean squared error of this method converges for $k \gtrsim d^2\log(d)$ with rate $d^2\log(d)/k$. Compared to the dimension dependence $d$ for stochastic gradient descent, an additional factor $d\log(d)$ occurs.
- Score: 5.448070998907116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Renewed interest in the relationship between artificial and biological neural networks motivates the study of gradient-free methods. Considering the linear regression model with random design, we theoretically analyze in this work the biologically motivated (weight-perturbed) forward gradient scheme that is based on random linear combinations of the gradient. If d denotes the number of parameters and k the number of samples, we prove that the mean squared error of this method converges for $k\gtrsim d^2\log(d)$ with rate $d^2\log(d)/k.$ Compared to the dimension dependence d for stochastic gradient descent, an additional factor $d\log(d)$ occurs.
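To make the scheme concrete, below is a minimal NumPy sketch of weight-perturbed forward gradient descent for the linear regression model: each update projects the single-sample gradient onto a random Gaussian direction $v$, which corresponds to a directional derivative that could be obtained without backpropagation. The step-size schedule and the toy data are illustrative assumptions, not the choices analyzed in the paper.

```python
import numpy as np

def forward_gradient_linreg(X, y, lr=None, seed=0):
    """Weight-perturbed forward gradient descent for linear regression.

    A minimal sketch of the scheme discussed above; the step-size rule is
    a hypothetical choice for illustration, not the paper's schedule."""
    rng = np.random.default_rng(seed)
    k, d = X.shape
    theta = np.zeros(d)
    for i in range(k):
        x_i, y_i = X[i], y[i]
        # Gradient of the single-sample squared loss (y_i - x_i^T theta)^2.
        grad = -2.0 * (y_i - x_i @ theta) * x_i
        # Forward gradient: project onto a random direction v ~ N(0, I_d).
        # The inner product grad @ v is a directional derivative, so in
        # practice it can be obtained by forward-mode AD or by perturbing
        # the weights, without backpropagation.
        v = rng.standard_normal(d)
        fwd_grad = (grad @ v) * v          # unbiased: E[v v^T] = I_d
        alpha = 1.0 / (d * (i + 1)) if lr is None else lr
        theta -= alpha * fwd_grad
    return theta

# Toy usage with d = 5 parameters and k = 20000 samples.
rng = np.random.default_rng(1)
d, k = 5, 20000
theta_star = rng.standard_normal(d)
X = rng.standard_normal((k, d))
y = X @ theta_star + 0.1 * rng.standard_normal(k)
theta_hat = forward_gradient_linreg(X, y)
print("MSE:", np.mean((theta_hat - theta_star) ** 2))
```

The projected estimator $(\nabla L \cdot v)\,v$ has variance roughly $d$ times that of the true gradient, which is why the sketch shrinks the step size by a factor of $d$; this extra noise is intuitively consistent with the additional $d\log(d)$ factor in the convergence rate.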
Related papers
- Limit Theorems for Stochastic Gradient Descent with Infinite Variance [47.87144151929621]
We show that the stochastic gradient descent algorithm can be characterized as the stationary distribution of a suitably defined Ornstein-Uhlenbeck process driven by an appropriate Lévy process.
We also explore the applications of these results in linear regression and logistic regression models.
arXiv Detail & Related papers (2024-10-21T09:39:10Z) - Gradient Descent Converges Linearly for Logistic Regression on Separable
Data [17.60502131429094]
We show that running gradient descent with a variable learning rate guarantees loss $f(x) \leq 1.1 \cdot f(x^*) + \epsilon$ for the logistic regression objective.
We also apply our ideas to sparse logistic regression, where they lead to an exponential improvement of the sparsity-error tradeoff.
arXiv Detail & Related papers (2023-06-26T02:15:26Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Unbiased Estimation of the Gradient of the Log-Likelihood for a Class of
Continuous-Time State-Space Models [0.0]
We consider static parameter estimation for a class of continuous-time state-space models.
Our goal is to obtain an unbiased estimate of the gradient of the log-likelihood (score function).
arXiv Detail & Related papers (2021-05-24T20:31:48Z) - Infinitely Deep Bayesian Neural Networks with Stochastic Differential
Equations [37.02511585732081]
We perform scalable approximate inference in a recently-proposed family of continuous-depth neural networks.
We demonstrate gradient-based variational inference, producing arbitrarily-flexible approximate posteriors.
This approach further inherits the memory-efficient training and tunable precision of neural ODEs.
arXiv Detail & Related papers (2021-02-12T14:48:58Z) - Probabilistic Circuits for Variational Inference in Discrete Graphical
Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPNs).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial the corresponding ELBO can be computed analytically.
arXiv Detail & Related papers (2020-10-22T05:04:38Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - The Impact of the Mini-batch Size on the Variance of Gradients in
Stochastic Gradient Descent [28.148743710421932]
The mini-batch stochastic gradient descent (SGD) algorithm is widely used in training machine learning models.
We study SGD dynamics under linear regression and two-layer linear networks, with an easy extension to deeper linear networks.
arXiv Detail & Related papers (2020-04-27T20:06:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.