Hebbian learning inspired estimation of the linear regression parameters from queries
- URL: http://arxiv.org/abs/2311.03483v1
- Date: Tue, 26 Sep 2023 19:00:32 GMT
- Title: Hebbian learning inspired estimation of the linear regression parameters from queries
- Authors: Johannes Schmidt-Hieber and Wouter M Koolen
- Abstract summary: We study a variation of this Hebbian learning rule to recover the regression vector in the linear regression model.
We prove that this Hebbian learning rule can achieve considerably faster rates than any non-adaptive method that selects the queries independently of the data.
- Score: 18.374824005225186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Local learning rules in biological neural networks (BNNs) are commonly
referred to as Hebbian learning. [26] links a biologically motivated Hebbian
learning rule to a specific zeroth-order optimization method. In this work, we
study a variation of this Hebbian learning rule to recover the regression
vector in the linear regression model. Zeroth-order optimization methods are
known to converge at a suboptimal rate for large parameter dimensions compared
to first-order methods such as gradient descent, and are therefore generally
thought to be inferior. By establishing upper and lower bounds, we show, however,
that such methods achieve near-optimal rates if only queries of the linear
regression loss are available. Moreover, we prove that this Hebbian learning
rule can achieve considerably faster rates than any non-adaptive method that
selects the queries independently of the data.
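To make the query model concrete, the following is a minimal sketch (indented Python below) of a two-point zeroth-order update that accesses the data only through queries of the empirical regression loss. It is an illustration in the spirit of the Hebbian-type rule rather than the paper's exact procedure, and all dimensions, step sizes and constants are illustrative assumptions.

    import numpy as np

    # Hedged sketch (not the paper's exact procedure): estimate the regression
    # vector when only queries of the empirical linear regression loss are
    # available, using a two-point zeroth-order update in the spirit of the
    # Hebbian-type rule studied in the paper. All constants are illustrative.

    rng = np.random.default_rng(0)
    n, d = 500, 20
    X = rng.standard_normal((n, d))
    beta_true = rng.standard_normal(d)
    y = X @ beta_true + 0.1 * rng.standard_normal(n)

    def loss_query(theta):
        """The only access to the data: a query of the squared-error loss."""
        return np.mean((y - X @ theta) ** 2)

    theta = np.zeros(d)
    eta, mu = 0.01, 1e-3                      # step size and perturbation scale
    for t in range(20000):
        u = rng.standard_normal(d)            # random perturbation direction
        # finite-difference estimate of the directional derivative, times u
        g = (loss_query(theta + mu * u) - loss_query(theta)) / mu * u
        theta -= eta / (1.0 + 0.001 * t) * g  # decaying step size

    print("estimation error:", np.linalg.norm(theta - beta_true))

The finite-difference factor turns two loss queries per step into a nearly unbiased gradient estimate (up to terms of order mu); the random direction u is what makes the method zeroth-order.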
Related papers
- Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of the online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications to fast-rate convergence in games and extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z)
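As a rough illustration of one standard route to gradient-variation guarantees (not necessarily the algorithm of the entry above), the sketch below implements optimistic online gradient descent that reuses the previous gradient as a hint; the drifting quadratic losses, radius and step size are illustrative assumptions.

    import numpy as np

    # Hedged sketch: optimistic online gradient descent with the previous
    # gradient as the hint, a standard route to regret bounds that scale with
    # gradient variation. Losses, radius and step size are illustrative only.

    rng = np.random.default_rng(1)
    d, T, eta, R = 5, 500, 0.1, 1.0

    def project(x, radius=R):                 # Euclidean projection onto a ball
        nrm = np.linalg.norm(x)
        return x if nrm <= radius else radius * x / nrm

    x_hat = np.zeros(d)                       # secondary ("lazy") iterate
    hint = np.zeros(d)                        # optimistic hint: last seen gradient
    c = np.zeros(d)                           # slowly drifting loss minimizer
    total_loss = 0.0
    for t in range(T):
        c = project(c + 0.01 * rng.standard_normal(d))   # f_t(x) = 0.5*||x - c||^2
        x = project(x_hat - eta * hint)       # play the optimistic iterate
        g = x - c                             # gradient of f_t at the played point
        x_hat = project(x_hat - eta * g)      # update the secondary iterate
        hint = g                              # small gradient variation -> good hint
        total_loss += 0.5 * np.linalg.norm(x - c) ** 2

    print("average loss:", total_loss / T)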
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
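A minimal sketch of the stochastic dual descent idea from the entry above, under simplifying assumptions (an RBF kernel, fixed step size and mini-batch size chosen for illustration; not the paper's exact algorithm or schedule): stochastic gradient steps on the dual coefficients of kernel ridge regression.

    import numpy as np

    # Hedged sketch of stochastic dual descent for GP / kernel ridge regression:
    # stochastic gradient steps on alpha for min_a 0.5*a^T (K + lam*I) a - a^T y.
    # An illustration of the general approach, not the paper's exact algorithm.

    rng = np.random.default_rng(2)
    n, lam, eta, batch = 300, 0.1, 0.02, 32

    X = rng.uniform(-3.0, 3.0, size=(n, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
    K = np.exp(-0.5 * (X - X.T) ** 2)            # RBF kernel matrix, lengthscale 1

    alpha = np.zeros(n)
    for t in range(5000):
        idx = rng.choice(n, size=batch, replace=False)
        # gradient of the dual objective restricted to the sampled coordinates
        g = K[idx] @ alpha + lam * alpha[idx] - y[idx]
        alpha[idx] -= eta * g

    f_hat = K @ alpha                            # fitted mean at the training inputs
    print("train RMSE:", np.sqrt(np.mean((f_hat - y) ** 2)))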
- Low-rank extended Kalman filtering for online learning of neural networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior precision matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z)
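For orientation, the sketch below shows the plain full-covariance extended Kalman filter recursion for online parameter estimation; the paper's contribution is to replace the dense covariance with a low-rank plus diagonal factorization, which is not reproduced here. The model, noise levels and dimensions are illustrative assumptions.

    import numpy as np

    # Hedged sketch: the classic EKF recursion for online estimation of the
    # parameters theta of a nonlinear model y ~ f(theta, x). The paper replaces
    # the dense covariance P below with a low-rank plus diagonal factorization;
    # this full-covariance version only illustrates the recursion itself.

    rng = np.random.default_rng(3)
    d = 3                                     # parameter dimension (illustrative)
    theta_true = np.array([1.0, -0.5, 0.3])

    def f(theta, x):                          # simple nonlinear model (illustrative)
        return np.tanh(theta @ x)

    def jac(theta, x):                        # d f / d theta, a length-d vector
        return (1.0 - np.tanh(theta @ x) ** 2) * x

    theta = np.zeros(d)
    P = np.eye(d)                             # posterior covariance over theta
    Q, R = 1e-5 * np.eye(d), 0.01             # process / observation noise (illustrative)

    for t in range(2000):
        x = rng.standard_normal(d)
        y = f(theta_true, x) + np.sqrt(R) * rng.standard_normal()
        P = P + Q                             # predict step (random-walk parameters)
        H = jac(theta, x)                     # linearize the observation model
        S = H @ P @ H + R                     # innovation variance (scalar here)
        K = P @ H / S                         # Kalman gain
        theta = theta + K * (y - f(theta, x)) # update mean
        P = P - np.outer(K, H @ P)            # update covariance

    print("parameter error:", np.linalg.norm(theta - theta_true))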
- Memory-Efficient Backpropagation through Large Linear Layers [107.20037639738433]
In modern neural networks like Transformers, linear layers require significant memory to store activations during the backward pass.
This study proposes a memory reduction approach to perform backpropagation through linear layers.
arXiv Detail & Related papers (2022-01-31T13:02:41Z)
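The following hedged sketch illustrates the general memory-versus-noise trade-off behind such approaches using a generic random-sketching estimator of a linear layer's weight gradient; it is not necessarily the estimator used in the paper, and all sizes are illustrative assumptions.

    import numpy as np

    # Hedged illustration of the memory-saving idea (a generic random-sketching
    # scheme, not necessarily the paper's exact estimator): for a linear layer
    # Y = X @ W, the weight gradient is dW = X^T @ dY, which normally requires
    # keeping the full activation matrix X (batch x d_in) until the backward
    # pass. Keeping only S @ X (k x d_in, k << batch) plus the seed of S gives
    # an unbiased estimate of dW whose variance shrinks as k grows.

    rng = np.random.default_rng(4)
    batch, d_in, d_out, k = 4096, 64, 64, 512

    X = rng.standard_normal((batch, d_in))            # forward-pass activations
    M = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
    dY = X @ M + 0.1 * rng.standard_normal((batch, d_out))   # backward-pass signal

    S = rng.standard_normal((k, batch)) / np.sqrt(k)  # E[S^T S] = I -> unbiased
    SX = S @ X                                        # stored instead of X
    SdY = S @ dY                                      # formed in the backward pass

    dW_exact = X.T @ dY                               # what exact backprop computes
    dW_sketch = SX.T @ SdY                            # sketched gradient estimate

    rel_err = np.linalg.norm(dW_sketch - dW_exact) / np.linalg.norm(dW_exact)
    print("relative error of sketched gradient:", rel_err)
    print("activation floats stored:", X.size, "(full) vs", SX.size, "(sketch)")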
- From inexact optimization to learning via gradient concentration [22.152317081922437]
In this paper, we investigate the phenomenon in the context of linear models with smooth loss functions.
We propose a proof technique combining ideas from inexact optimization and probability theory, specifically gradient concentration.
arXiv Detail & Related papers (2021-06-09T21:23:29Z)
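To make the term in the entry above concrete, gradient concentration refers to uniform high-probability bounds of the following generic form (an illustrative statement with unspecified rate, not the paper's exact result):

    \[
      \sup_{\|w\| \le R} \bigl\| \nabla \widehat{L}_n(w) - \nabla L(w) \bigr\|
      \;\le\; \varepsilon_n(\delta)
      \qquad \text{with probability at least } 1 - \delta,
    \]

where $\widehat{L}_n$ is the empirical risk on $n$ samples, $L$ the population risk, and $\varepsilon_n(\delta) \to 0$ as $n \to \infty$.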
- Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-perfect solutions to non-convex training problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z)
- Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases [3.198144010381572]
In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits.
It is still unclear why randomly initialized gradient descent algorithms perform so well in this setting.
arXiv Detail & Related papers (2021-02-23T18:17:47Z)
- Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods [41.60125423028092]
We show that any linear estimator can be outperformed by deep learning in the sense of the minimax optimal rate.
The excess risk bounds are so-called fast learning rates, which are faster than the $O(1/\sqrt{n})$ bounds.
arXiv Detail & Related papers (2020-12-06T09:22:16Z)
- On Generalization of Adaptive Methods for Over-parameterized Linear Regression [27.156348760303864]
We aim to characterize the performance of adaptive methods in the over-parameterized linear regression setting.
Our experiments on over-parameterized linear regression and deep neural networks support this theory.
arXiv Detail & Related papers (2020-11-28T04:19:32Z)
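As a minimal illustration of the setting in the entry above (not a reproduction of its experiments): in over-parameterized linear regression, gradient descent initialized at zero converges to the minimum-norm interpolant, the baseline against which adaptive methods are usually compared. Dimensions, step size and iteration count below are illustrative.

    import numpy as np

    # Hedged sketch of the over-parameterized linear regression setting: with
    # more parameters than samples, gradient descent from zero converges to the
    # minimum-norm interpolating solution; adaptive methods generally converge
    # to a different interpolant, which is what generalization comparisons study.

    rng = np.random.default_rng(5)
    n, d = 50, 200                               # fewer samples than parameters
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    w = np.zeros(d)
    eta = 1.0 / np.linalg.norm(X, 2) ** 2        # safe step size for least squares
    for _ in range(5000):
        w -= eta * X.T @ (X @ w - y)             # full-batch gradient descent

    w_min_norm = np.linalg.pinv(X) @ y           # minimum-norm interpolant
    print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))
    print("training residual:", np.linalg.norm(X @ w - y))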
- A Random Matrix Theory Approach to Damping in Deep Learning [0.7614628596146599]
We conjecture that the inherent difference in generalisation between adaptive and non-adaptive gradient methods in deep learning stems from the increased estimation noise.
We develop a novel random matrix theory based damping learner for second-order optimisers, inspired by linear shrinkage estimation.
arXiv Detail & Related papers (2020-11-15T18:19:42Z)
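A hedged sketch of the linear-shrinkage idea behind such damping (an illustration, not the paper's learner): shrink a noisy curvature estimate toward a scaled identity before inverting it in a second-order-style step. The shrinkage intensity rho and all sizes are illustrative assumptions.

    import numpy as np

    # Hedged illustration of linear shrinkage as a damping scheme (not the
    # paper's exact learner): a curvature matrix estimated from few samples is
    # noisy, so it is shrunk toward a scaled identity before being inverted to
    # precondition a gradient. rho is the shrinkage/damping intensity.

    rng = np.random.default_rng(6)
    d, n_samples, rho = 100, 40, 0.3

    # noisy curvature estimate, e.g. an empirical Fisher / Gauss-Newton matrix
    G = rng.standard_normal((n_samples, d))
    C_hat = G.T @ G / n_samples                  # rank-deficient, noisy estimate

    mu = np.trace(C_hat) / d                     # scale of the identity target
    C_damped = (1.0 - rho) * C_hat + rho * mu * np.eye(d)   # linear shrinkage

    grad = rng.standard_normal(d)                # a gradient to precondition
    step = np.linalg.solve(C_damped, grad)       # damped second-order-style step
    print("condition number after damping:", np.linalg.cond(C_damped))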
- Fast OSCAR and OWL Regression via Safe Screening Rules [97.28167655721766]
Ordered weighted $L_1$ (OWL) regularized regression is a new regression analysis for high-dimensional sparse learning.
Proximal gradient methods are used as standard approaches to solve OWL regression.
We propose the first safe screening rule for OWL regression by exploring the order of the primal solution with the unknown order structure.
arXiv Detail & Related papers (2020-06-29T23:35:53Z)
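Since proximal gradient methods are the standard solver mentioned in the last entry, the sketch below shows one way to run them for OWL-regularized least squares, with the OWL proximal operator computed by sorting and a pool-adjacent-violators pass; it illustrates the baseline solver, not the proposed screening rule, and the data and weights are illustrative assumptions.

    import numpy as np

    # Hedged sketch of proximal gradient for OWL-regularized least squares.
    # prox_owl computes the proximal operator of the OWL / sorted-L1 penalty by
    # sorting |v| and running a pool-adjacent-violators (isotonic) pass.

    def prox_owl(v, w):
        """argmin_x 0.5*||x - v||^2 + sum_i w_i |x|_(i), with w non-increasing, w >= 0."""
        absv = np.abs(v)
        order = np.argsort(-absv)                # sort |v| in decreasing order
        u = absv[order] - w                      # shifted sorted magnitudes
        sums, counts = [], []                    # pool-adjacent-violators blocks
        for val in u:
            sums.append(val); counts.append(1)
            while len(sums) > 1 and sums[-1] / counts[-1] > sums[-2] / counts[-2]:
                s, c = sums.pop(), counts.pop()  # merge blocks violating monotonicity
                sums[-1] += s; counts[-1] += c
        z = np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])
        z = np.maximum(z, 0.0)                   # project onto the nonnegative orthant
        out = np.empty_like(absv)
        out[order] = z                           # undo the sort
        return np.sign(v) * out

    rng = np.random.default_rng(7)
    n, d, eta = 100, 50, 0.002
    X = rng.standard_normal((n, d))
    beta_true = np.zeros(d); beta_true[:5] = 3.0          # sparse ground truth
    y = X @ beta_true + 0.5 * rng.standard_normal(n)
    w = np.linspace(20.0, 10.0, d)                        # non-increasing OWL weights

    beta = np.zeros(d)
    for _ in range(500):                                  # proximal gradient iterations
        beta = prox_owl(beta - eta * X.T @ (X @ beta - y), eta * w)
    print("nonzero coefficients:", int(np.count_nonzero(beta)))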
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.