Agnostic Learning of General ReLU Activation Using Gradient Descent
- URL: http://arxiv.org/abs/2208.02711v1
- Date: Thu, 4 Aug 2022 15:14:05 GMT
- Title: Agnostic Learning of General ReLU Activation Using Gradient Descent
- Authors: Pranjal Awasthi, Alex Tang, Aravindan Vijayaraghavan
- Abstract summary: We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions.
Our main result establishes that, starting from random initialization, gradient descent outputs in a polynomial number of iterations, with high probability, a ReLU function that achieves a competitive error guarantee.
- Score: 24.73731167081234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide a convergence analysis of gradient descent for the problem of
agnostically learning a single ReLU function under Gaussian distributions.
Unlike prior work that studies the setting of zero bias, we consider the more
challenging scenario when the bias of the ReLU function is non-zero. Our main
result establishes that starting from random initialization, in a polynomial
number of iterations gradient descent outputs, with high probability, a ReLU
function that achieves a competitive error guarantee when compared to the error
of the best ReLU function. We also provide finite sample guarantees, and these
techniques generalize to a broader class of marginal distributions beyond
Gaussians.
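To make the setting concrete, here is a minimal sketch of the problem the abstract describes: gradient descent on the empirical square loss for a single ReLU with a non-zero bias, with inputs drawn from a Gaussian marginal and noisy labels standing in for the agnostic setting. The target parameters, noise model, step size, and iteration count are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Assumed ground truth for illustration: x -> relu(w*.x + b*) with b* != 0,
# plus label noise so the best ReLU does not fit the data exactly.
d, n = 5, 20000
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
b_star = 0.5
X = rng.standard_normal((n, d))                                # Gaussian marginal
y = relu(X @ w_star + b_star) + 0.1 * rng.standard_normal(n)   # noisy labels

# Gradient descent on the empirical square loss from random initialization.
w = rng.standard_normal(d) / np.sqrt(d)
b = 0.0
lr = 0.5
for _ in range(500):
    pre = X @ w + b
    err = relu(pre) - y
    mask = (pre > 0).astype(float)        # ReLU (sub)gradient indicator
    w -= lr * (X.T @ (err * mask)) / n
    b -= lr * np.mean(err * mask)

final_loss = np.mean((relu(X @ w + b) - y) ** 2)
```

With the noise variance at 0.01, a competitive solution drives the empirical square loss close to that floor; the paper's contribution is proving such convergence holds with high probability despite the non-convexity introduced by the bias term.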
Related papers
- Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum [6.749750044497731]
We prove the phenomena of tempered overfitting and catastrophic overfitting under the sub-Gaussian design assumption.
We also identify that the independence of the features plays an important role in guaranteeing tempered overfitting.
arXiv Detail & Related papers (2024-02-02T10:36:53Z) - Uniform Convergence with Square-Root Lipschitz Loss [35.52581372892922]
We show how these guarantees substantially generalize previous results based on smoothness (Lipschitz constant of the derivative)
This also yields a better understanding of "optimistic rate" learning guarantees.
arXiv Detail & Related papers (2023-06-22T20:14:08Z) - Experimental Design for Linear Functionals in Reproducing Kernel Hilbert
Spaces [102.08678737900541]
We provide algorithms for constructing bias-aware designs for linear functionals.
We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise.
arXiv Detail & Related papers (2022-05-26T20:56:25Z) - On the inability of Gaussian process regression to optimally learn
compositional functions [3.6525095710982916]
Deep Gaussian process priors can outperform Gaussian process priors if the target function has a compositional structure.
We show that if the true function is a generalized additive function, then the posterior based on any mean-zero Gaussian process can only recover the truth at a rate that is strictly slower than the minimax rate.
arXiv Detail & Related papers (2022-05-16T15:42:25Z) - Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to the training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z) - Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient
for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z) - On the existence of global minima and convergence analyses for gradient
descent methods in the training of deep neural networks [3.198144010381572]
We study feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers.
We prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs.
We also study solutions of gradient flow differential equations.
arXiv Detail & Related papers (2021-12-17T18:55:40Z) - A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems, yet the theoretical understanding of "vanilla" PG remains limited.
arXiv Detail & Related papers (2021-07-23T19:38:17Z) - Agnostic Proper Learning of Halfspaces under Gaussian Marginals [56.01192577666607]
We study the problem of agnostically learning halfspaces under the Gaussian.
Our main result is the first proper learning algorithm for this problem.
arXiv Detail & Related papers (2021-02-10T18:40:44Z) - Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best-fitting ReLU with respect to square loss, given draws from some unknown distribution.
arXiv Detail & Related papers (2020-05-26T16:26:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.