Learning a Single Neuron with Bias Using Gradient Descent
- URL: http://arxiv.org/abs/2106.01101v1
- Date: Wed, 2 Jun 2021 12:09:55 GMT
- Title: Learning a Single Neuron with Bias Using Gradient Descent
- Authors: Gal Vardi, Gilad Yehudai, Ohad Shamir
- Abstract summary: We study the fundamental problem of learning a single neuron with a bias term.
We show that this is a significantly different and more challenging problem than the bias-less case.
- Score: 53.15475693468925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We theoretically study the fundamental problem of learning a single neuron
with a bias term ($\mathbf{x} \mapsto \sigma(\langle\mathbf{w},\mathbf{x}\rangle + b)$) in
the realizable setting with the ReLU activation, using gradient descent.
Perhaps surprisingly, we show that this is a significantly different and more
challenging problem than the bias-less case (which was the focus of previous
works on single neurons), both in terms of the optimization geometry as well as
the ability of gradient methods to succeed in some scenarios. We provide a
detailed study of this problem, characterizing the critical points of the
objective, demonstrating failure cases, and providing positive convergence
guarantees under different sets of assumptions. To prove our results, we
develop some tools which may be of independent interest, and improve previous
results on learning single neurons.
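As an illustrative toy sketch of this setting, the snippet below runs plain gradient descent on the empirical squared loss of a single ReLU neuron with bias, on realizable data. The Gaussian inputs, target parameters, initialization, and step size are all assumptions made for the example rather than the paper's conditions, and, as the paper's failure cases show, gradient descent is not guaranteed to succeed in general.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: realizable labels from a target ReLU neuron with bias,
# Gaussian inputs. This mirrors the problem statement, not the paper's proofs.
d, n = 5, 2000
w_star = rng.normal(size=d)
b_star = 1.0
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_star + b_star, 0.0)        # ReLU(<w*, x> + b*)

# Gradient descent on L(w, b) = (1/2n) sum_i (ReLU(<w, x_i> + b) - y_i)^2.
# Small random init: at w = 0, b = 0 every ReLU is inactive and GD is stuck.
w, b, lr = 0.1 * rng.normal(size=d), 0.1, 0.05
for _ in range(2000):
    z = X @ w + b
    resid = (np.maximum(z, 0.0) - y) * (z > 0)  # ReLU subgradient mask
    w -= lr * (X.T @ resid) / n
    b -= lr * resid.mean()

print("errors:", np.linalg.norm(w - w_star), abs(b - b_star))
```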
Related papers
- On the Hardness of Probabilistic Neurosymbolic Learning [10.180468225166441]
We study the complexity of differentiating probabilistic reasoning in neurosymbolic models.
We introduce WeightME, an unbiased gradient estimator based on model sampling.
Our experiments indicate that the existing biased approximations indeed struggle to optimize even when exact solving is still feasible.
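For flavor, the sketch below shows a generic score-function (REINFORCE-style) estimator, a textbook way to obtain unbiased gradients of an expectation from model samples. It only illustrates the general idea of sampling-based unbiased gradient estimation; it is not the WeightME estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Generic score-function estimator for d/dp E_{z ~ Bern(p)}[f(z)].
# Illustrates unbiased gradients via model sampling; NOT the paper's WeightME.
def f(z):
    # Hypothetical objective over a binary assignment (stand-in for a query).
    return float(z[0] and not z[1])

p = np.array([0.6, 0.3])
n_samples, grads = 100_000, np.zeros(2)
for _ in range(n_samples):
    z = rng.random(2) < p                           # sample a model
    score = np.where(z, 1.0 / p, -1.0 / (1.0 - p))  # d/dp log Bern(z; p)
    grads += f(z) * score
print(grads / n_samples)  # exact gradient is (1 - p1, -p0) = (0.7, -0.6)
```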
arXiv Detail & Related papers (2024-06-06T19:56:33Z)
- Using Linear Regression for Iteratively Training Neural Networks [4.873362301533824]
We present a simple linear regression based approach for learning the weights and biases of a neural network.
The approach is intended to be applicable to larger, more complex architectures.
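A minimal sketch of the core primitive, solving one layer's weights and bias in closed form via least squares, appears below; the shapes and targets are invented for illustration, and the paper's iteration scheme over layers is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)

# Core primitive (sketch): fit a layer's weight matrix W and bias b by
# ordinary least squares, given layer inputs X and target pre-activations T.
# Shapes and data here are hypothetical.
X = rng.normal(size=(200, 8))                  # layer inputs
T = rng.normal(size=(200, 4))                  # target pre-activations
X_aug = np.hstack([X, np.ones((200, 1))])      # 1s column so the bias is fit too
P, *_ = np.linalg.lstsq(X_aug, T, rcond=None)  # argmin ||X_aug @ P - T||_F
W, b = P[:-1], P[-1]
print(W.shape, b.shape)                        # (8, 4) (4,)
```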
arXiv Detail & Related papers (2023-07-11T11:53:25Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Experimental Design for Linear Functionals in Reproducing Kernel Hilbert Spaces [102.08678737900541]
We provide algorithms for constructing bias-aware designs for linear functionals.
We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise.
arXiv Detail & Related papers (2022-05-26T20:56:25Z)
- NeuralEF: Deconstructing Kernels by Deep Neural Networks [47.54733625351363]
Traditional nonparametric solutions based on the Nyström formula suffer from scalability issues.
Recent work has resorted to a parametric approach, i.e., training neural networks to approximate the eigenfunctions.
We show that these problems can be fixed by using a new series of objective functions that generalizes to the space of supervised and unsupervised learning problems.
arXiv Detail & Related papers (2022-04-30T05:31:07Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
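The mechanism is easy to see on a small diagonal quadratic (a finite-dimensional stand-in for the paper's separable-Hilbert-space setting): along an eigendirection with eigenvalue $\lambda$, gradient descent contracts the error by a factor $(1 - \eta\lambda)$ per step, so under a fixed early-stopping budget the learning rate $\eta$ decides which spectral components of the solution get resolved.

```python
import numpy as np

# GD on f(w) = 0.5 * w^T A w - c^T w with diagonal A; a toy stand-in for the
# paper's Hilbert-space analysis. Error along eigenvalue lam shrinks by
# (1 - lr * lam) each step, so the learning rate shapes the early-stopped
# solution's spectral content.
lams = np.array([10.0, 1.0, 0.1])   # eigenvalues of A
c = np.ones(3)
w_opt = c / lams                    # minimizer A^{-1} c

def run_gd(lr, steps=50):
    w = np.zeros(3)
    for _ in range(steps):
        w -= lr * (lams * w - c)    # gradient A w - c
    return w

for lr in (0.01, 0.1):              # both stable: lr < 2 / max(lams) = 0.2
    print(f"lr={lr}:", np.abs(run_gd(lr) - w_opt))
# The larger rate resolves the small-eigenvalue directions far better
# within the same early-stopping budget.
```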
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z)
- Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach [84.29777236590674]
We study the estimation of causal parameters when not all confounders are observed and instead negative controls are available.
Recent work has shown how these can enable identification and efficient estimation via two so-called bridge functions.
arXiv Detail & Related papers (2021-03-25T17:59:19Z)
- Learning a Single Neuron with Gradient Methods [39.291483556116454]
We consider the fundamental problem of learning a single neuron $\mathbf{x} \mapsto \sigma(\mathbf{w}^\top \mathbf{x})$ using standard gradient methods.
We ask whether a more general result is attainable, under milder assumptions.
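For reference, the population objective in this bias-less setting takes the standard form $F(\mathbf{w}) = \frac{1}{2}\,\mathbb{E}_{\mathbf{x}}\big[(\sigma(\mathbf{w}^\top \mathbf{x}) - \sigma(\mathbf{v}^\top \mathbf{x}))^2\big]$ for a target neuron $\mathbf{v}$, with (almost-everywhere) gradient $\nabla F(\mathbf{w}) = \mathbb{E}_{\mathbf{x}}\big[(\sigma(\mathbf{w}^\top \mathbf{x}) - \sigma(\mathbf{v}^\top \mathbf{x}))\,\sigma'(\mathbf{w}^\top \mathbf{x})\,\mathbf{x}\big]$; the precise distributional assumptions on $\mathbf{x}$ are those of the paper.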
arXiv Detail & Related papers (2020-01-15T10:02:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.