Hebbian learning inspired estimation of the linear regression parameters from queries
- URL: http://arxiv.org/abs/2311.03483v1
- Date: Tue, 26 Sep 2023 19:00:32 GMT
- Title: Hebbian learning inspired estimation of the linear regression parameters from queries
- Authors: Johannes Schmidt-Hieber and Wouter M Koolen
- Abstract summary: We study a variation of this Hebbian learning rule to recover the regression vector in the linear regression model.
We prove that this Hebbian learning rule can achieve considerably faster rates than any non-adaptive method that selects the queries independently of the data.
- Score: 18.374824005225186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Local learning rules in biological neural networks (BNNs) are commonly
referred to as Hebbian learning. [26] links a biologically motivated Hebbian
learning rule to a specific zeroth-order optimization method. In this work, we
study a variation of this Hebbian learning rule to recover the regression
vector in the linear regression model. Zeroth-order optimization methods are
known to converge at a suboptimal rate for large parameter dimensions compared
to first-order methods such as gradient descent, and are therefore generally
thought to be inferior. By establishing upper and lower bounds, we show, however,
that such methods achieve near-optimal rates if only queries of the linear
regression loss are available. Moreover, we prove that this Hebbian learning
rule can achieve considerably faster rates than any non-adaptive method that
selects the queries independently of the data.
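To make the query model concrete, the following is a minimal sketch (indented Python below) of a two-point zeroth-order update that accesses the data only through queries of the empirical regression loss. It is an illustration in the spirit of the Hebbian-type rule rather than the paper's exact procedure, and all dimensions, step sizes and constants are illustrative assumptions.

    import numpy as np

    # Hedged sketch (not the paper's exact procedure): estimate the regression
    # vector when only queries of the empirical linear regression loss are
    # available, using a two-point zeroth-order update in the spirit of the
    # Hebbian-type rule studied in the paper. All constants are illustrative.

    rng = np.random.default_rng(0)
    n, d = 500, 20
    X = rng.standard_normal((n, d))
    beta_true = rng.standard_normal(d)
    y = X @ beta_true + 0.1 * rng.standard_normal(n)

    def loss_query(theta):
        """The only access to the data: a query of the squared-error loss."""
        return np.mean((y - X @ theta) ** 2)

    theta = np.zeros(d)
    eta, mu = 0.01, 1e-3                      # step size and perturbation scale
    for t in range(20000):
        u = rng.standard_normal(d)            # random perturbation direction
        # finite-difference estimate of the directional derivative, times u
        g = (loss_query(theta + mu * u) - loss_query(theta)) / mu * u
        theta -= eta / (1.0 + 0.001 * t) * g  # decaying step size

    print("estimation error:", np.linalg.norm(theta - beta_true))

The finite-difference factor turns two loss queries per step into a nearly unbiased gradient estimate (up to terms of order mu); the random direction u is what makes the method zeroth-order.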
Related papers
- Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of the online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications to fast-rate convergence in games and extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z)
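As a rough illustration of one standard route to gradient-variation guarantees (not necessarily the algorithm of the entry above), the sketch below implements optimistic online gradient descent that reuses the previous gradient as a hint; the drifting quadratic losses, radius and step size are illustrative assumptions.

    import numpy as np

    # Hedged sketch: optimistic online gradient descent with the previous
    # gradient as the hint, a standard route to regret bounds that scale with
    # gradient variation. Losses, radius and step size are illustrative only.

    rng = np.random.default_rng(1)
    d, T, eta, R = 5, 500, 0.1, 1.0

    def project(x, radius=R):                 # Euclidean projection onto a ball
        nrm = np.linalg.norm(x)
        return x if nrm <= radius else radius * x / nrm

    x_hat = np.zeros(d)                       # secondary ("lazy") iterate
    hint = np.zeros(d)                        # optimistic hint: last seen gradient
    c = np.zeros(d)                           # slowly drifting loss minimizer
    total_loss = 0.0
    for t in range(T):
        c = project(c + 0.01 * rng.standard_normal(d))   # f_t(x) = 0.5*||x - c||^2
        x = project(x_hat - eta * hint)       # play the optimistic iterate
        g = x - c                             # gradient of f_t at the played point
        x_hat = project(x_hat - eta * g)      # update the secondary iterate
        hint = g                              # small gradient variation -> good hint
        total_loss += 0.5 * np.linalg.norm(x - c) ** 2

    print("average loss:", total_loss / T)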
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
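A minimal sketch of the stochastic dual descent idea from the entry above, under simplifying assumptions (an RBF kernel, fixed step size and mini-batch size chosen for illustration; not the paper's exact algorithm or schedule): stochastic gradient steps on the dual coefficients of kernel ridge regression.

    import numpy as np

    # Hedged sketch of stochastic dual descent for GP / kernel ridge regression:
    # stochastic gradient steps on alpha for min_a 0.5*a^T (K + lam*I) a - a^T y.
    # An illustration of the general approach, not the paper's exact algorithm.

    rng = np.random.default_rng(2)
    n, lam, eta, batch = 300, 0.1, 0.02, 32

    X = rng.uniform(-3.0, 3.0, size=(n, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
    K = np.exp(-0.5 * (X - X.T) ** 2)            # RBF kernel matrix, lengthscale 1

    alpha = np.zeros(n)
    for t in range(5000):
        idx = rng.choice(n, size=batch, replace=False)
        # gradient of the dual objective restricted to the sampled coordinates
        g = K[idx] @ alpha + lam * alpha[idx] - y[idx]
        alpha[idx] -= eta * g

    f_hat = K @ alpha                            # fitted mean at the training inputs
    print("train RMSE:", np.sqrt(np.mean((f_hat - y) ** 2)))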
- Low-rank extended Kalman filtering for online learning of neural networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior precision matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z)
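For orientation, the sketch below shows the plain full-covariance extended Kalman filter recursion for online parameter estimation; the paper's contribution is to replace the dense covariance with a low-rank plus diagonal factorization, which is not reproduced here. The model, noise levels and dimensions are illustrative assumptions.

    import numpy as np

    # Hedged sketch: the classic EKF recursion for online estimation of the
    # parameters theta of a nonlinear model y ~ f(theta, x). The paper replaces
    # the dense covariance P below with a low-rank plus diagonal factorization;
    # this full-covariance version only illustrates the recursion itself.

    rng = np.random.default_rng(3)
    d = 3                                     # parameter dimension (illustrative)
    theta_true = np.array([1.0, -0.5, 0.3])

    def f(theta, x):                          # simple nonlinear model (illustrative)
        return np.tanh(theta @ x)

    def jac(theta, x):                        # d f / d theta, a length-d vector
        return (1.0 - np.tanh(theta @ x) ** 2) * x

    theta = np.zeros(d)
    P = np.eye(d)                             # posterior covariance over theta
    Q, R = 1e-5 * np.eye(d), 0.01             # process / observation noise (illustrative)

    for t in range(2000):
        x = rng.standard_normal(d)
        y = f(theta_true, x) + np.sqrt(R) * rng.standard_normal()
        P = P + Q                             # predict step (random-walk parameters)
        H = jac(theta, x)                     # linearize the observation model
        S = H @ P @ H + R                     # innovation variance (scalar here)
        K = P @ H / S                         # Kalman gain
        theta = theta + K * (y - f(theta, x)) # update mean
        P = P - np.outer(K, H @ P)            # update covariance

    print("parameter error:", np.linalg.norm(theta - theta_true))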
- Memory-Efficient Backpropagation through Large Linear Layers [107.20037639738433]
In modern neural networks like Transformers, linear layers require significant memory to store activations during the backward pass.
This study proposes a memory reduction approach to perform backpropagation through linear layers.
arXiv Detail & Related papers (2022-01-31T13:02:41Z)
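The following hedged sketch illustrates the general memory-versus-noise trade-off behind such approaches using a generic random-sketching estimator of a linear layer's weight gradient; it is not necessarily the estimator used in the paper, and all sizes are illustrative assumptions.

    import numpy as np

    # Hedged illustration of the memory-saving idea (a generic random-sketching
    # scheme, not necessarily the paper's exact estimator): for a linear layer
    # Y = X @ W, the weight gradient is dW = X^T @ dY, which normally requires
    # keeping the full activation matrix X (batch x d_in) until the backward
    # pass. Keeping only S @ X (k x d_in, k << batch) plus the seed of S gives
    # an unbiased estimate of dW whose variance shrinks as k grows.

    rng = np.random.default_rng(4)
    batch, d_in, d_out, k = 4096, 64, 64, 512

    X = rng.standard_normal((batch, d_in))            # forward-pass activations
    M = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
    dY = X @ M + 0.1 * rng.standard_normal((batch, d_out))   # backward-pass signal

    S = rng.standard_normal((k, batch)) / np.sqrt(k)  # E[S^T S] = I -> unbiased
    SX = S @ X                                        # stored instead of X
    SdY = S @ dY                                      # formed in the backward pass

    dW_exact = X.T @ dY                               # what exact backprop computes
    dW_sketch = SX.T @ SdY                            # sketched gradient estimate

    rel_err = np.linalg.norm(dW_sketch - dW_exact) / np.linalg.norm(dW_exact)
    print("relative error of sketched gradient:", rel_err)
    print("activation floats stored:", X.size, "(full) vs", SX.size, "(sketch)")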
- From inexact optimization to learning via gradient concentration [22.152317081922437]
In this paper, we investigate the phenomenon in the context of linear models with smooth loss functions.
We propose a proof technique combining ideas from inexact optimization and probability theory, specifically gradient concentration.
arXiv Detail & Related papers (2021-06-09T21:23:29Z)
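To make the term in the entry above concrete, gradient concentration refers to uniform high-probability bounds of the following generic form (an illustrative statement with unspecified rate, not the paper's exact result):

    \[
      \sup_{\|w\| \le R} \bigl\| \nabla \widehat{L}_n(w) - \nabla L(w) \bigr\|
      \;\le\; \varepsilon_n(\delta)
      \qquad \text{with probability at least } 1 - \delta,
    \]

where $\widehat{L}_n$ is the empirical risk on $n$ samples, $L$ the population risk, and $\varepsilon_n(\delta) \to 0$ as $n \to \infty$.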
- Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-perfect solutions to non-convex training problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z)
- Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases [3.198144010381572]
In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits.
It is still unclear why randomly initialized gradient descent algorithms perform so well in this setting.
arXiv Detail & Related papers (2021-02-23T18:17:47Z)
- Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods [41.60125423028092]
We show that any linear estimator can be outperformed by deep learning in the sense of the minimax optimal rate.
The excess risk bounds are so-called fast learning rates, which are faster than the $O(1/\sqrt{n})$ bounds.
arXiv Detail & Related papers (2020-12-06T09:22:16Z)
- On Generalization of Adaptive Methods for Over-parameterized Linear Regression [27.156348760303864]
We aim to characterize the performance of adaptive methods in the over-parameterized linear regression setting.
Our experiments on over-parameterized linear regression and deep neural networks support this theory.
arXiv Detail & Related papers (2020-11-28T04:19:32Z)
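As a minimal illustration of the setting in the entry above (not a reproduction of its experiments): in over-parameterized linear regression, gradient descent initialized at zero converges to the minimum-norm interpolant, the baseline against which adaptive methods are usually compared. Dimensions, step size and iteration count below are illustrative.

    import numpy as np

    # Hedged sketch of the over-parameterized linear regression setting: with
    # more parameters than samples, gradient descent from zero converges to the
    # minimum-norm interpolating solution; adaptive methods generally converge
    # to a different interpolant, which is what generalization comparisons study.

    rng = np.random.default_rng(5)
    n, d = 50, 200                               # fewer samples than parameters
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    w = np.zeros(d)
    eta = 1.0 / np.linalg.norm(X, 2) ** 2        # safe step size for least squares
    for _ in range(5000):
        w -= eta * X.T @ (X @ w - y)             # full-batch gradient descent

    w_min_norm = np.linalg.pinv(X) @ y           # minimum-norm interpolant
    print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))
    print("training residual:", np.linalg.norm(X @ w - y))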
- A Random Matrix Theory Approach to Damping in Deep Learning [0.7614628596146599]
We conjecture that the inherent difference in generalisation between adaptive and non-adaptive gradient methods in deep learning stems from the increased estimation noise.
We develop a novel random matrix theory based damping learner for second-order optimisers, inspired by linear shrinkage estimation.
arXiv Detail & Related papers (2020-11-15T18:19:42Z)
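A hedged sketch of the linear-shrinkage idea behind such damping (an illustration, not the paper's learner): shrink a noisy curvature estimate toward a scaled identity before inverting it in a second-order-style step. The shrinkage intensity rho and all sizes are illustrative assumptions.

    import numpy as np

    # Hedged illustration of linear shrinkage as a damping scheme (not the
    # paper's exact learner): a curvature matrix estimated from few samples is
    # noisy, so it is shrunk toward a scaled identity before being inverted to
    # precondition a gradient. rho is the shrinkage/damping intensity.

    rng = np.random.default_rng(6)
    d, n_samples, rho = 100, 40, 0.3

    # noisy curvature estimate, e.g. an empirical Fisher / Gauss-Newton matrix
    G = rng.standard_normal((n_samples, d))
    C_hat = G.T @ G / n_samples                  # rank-deficient, noisy estimate

    mu = np.trace(C_hat) / d                     # scale of the identity target
    C_damped = (1.0 - rho) * C_hat + rho * mu * np.eye(d)   # linear shrinkage

    grad = rng.standard_normal(d)                # a gradient to precondition
    step = np.linalg.solve(C_damped, grad)       # damped second-order-style step
    print("condition number after damping:", np.linalg.cond(C_damped))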
- Fast OSCAR and OWL Regression via Safe Screening Rules [97.28167655721766]
Ordered weighted $L_1$ (OWL) regularized regression is a new regression analysis for high-dimensional sparse learning.
Proximal gradient methods are used as standard approaches to solve OWL regression.
We propose the first safe screening rule for OWL regression by exploring the order of the primal solution with the unknown order structure.
arXiv Detail & Related papers (2020-06-29T23:35:53Z)
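Since proximal gradient methods are the standard solver mentioned in the last entry, the sketch below shows one way to run them for OWL-regularized least squares, with the OWL proximal operator computed by sorting and a pool-adjacent-violators pass; it illustrates the baseline solver, not the proposed screening rule, and the data and weights are illustrative assumptions.

    import numpy as np

    # Hedged sketch of proximal gradient for OWL-regularized least squares.
    # prox_owl computes the proximal operator of the OWL / sorted-L1 penalty by
    # sorting |v| and running a pool-adjacent-violators (isotonic) pass.

    def prox_owl(v, w):
        """argmin_x 0.5*||x - v||^2 + sum_i w_i |x|_(i), with w non-increasing, w >= 0."""
        absv = np.abs(v)
        order = np.argsort(-absv)                # sort |v| in decreasing order
        u = absv[order] - w                      # shifted sorted magnitudes
        sums, counts = [], []                    # pool-adjacent-violators blocks
        for val in u:
            sums.append(val); counts.append(1)
            while len(sums) > 1 and sums[-1] / counts[-1] > sums[-2] / counts[-2]:
                s, c = sums.pop(), counts.pop()  # merge blocks violating monotonicity
                sums[-1] += s; counts[-1] += c
        z = np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])
        z = np.maximum(z, 0.0)                   # project onto the nonnegative orthant
        out = np.empty_like(absv)
        out[order] = z                           # undo the sort
        return np.sign(v) * out

    rng = np.random.default_rng(7)
    n, d, eta = 100, 50, 0.002
    X = rng.standard_normal((n, d))
    beta_true = np.zeros(d); beta_true[:5] = 3.0          # sparse ground truth
    y = X @ beta_true + 0.5 * rng.standard_normal(n)
    w = np.linspace(20.0, 10.0, d)                        # non-increasing OWL weights

    beta = np.zeros(d)
    for _ in range(500):                                  # proximal gradient iterations
        beta = prox_owl(beta - eta * X.T @ (X @ beta - y), eta * w)
    print("nonzero coefficients:", int(np.count_nonzero(beta)))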
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.