The effect of Leaky ReLUs on the training and generalization of
overparameterized networks
- URL: http://arxiv.org/abs/2402.11942v3
- Date: Sun, 25 Feb 2024 14:46:07 GMT
- Title: The effect of Leaky ReLUs on the training and generalization of
overparameterized networks
- Authors: Yinglong Guo, Shaohan Li, Gilad Lerman
- Abstract summary: We show that $\alpha = -1$, which corresponds to the absolute value activation function, is optimal for the training error bound.
Numerical experiments empirically support the practical choices guided by the theory.
- Score: 12.630316710142413
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the training and generalization errors of overparameterized
neural networks (NNs) with a wide class of leaky rectified linear unit (ReLU)
functions. More specifically, we carefully upper bound both the convergence
rate of the training error and the generalization error of such NNs and
investigate the dependence of these bounds on the Leaky ReLU parameter,
$\alpha$. We show that $\alpha =-1$, which corresponds to the absolute value
activation function, is optimal for the training error bound. Furthermore, in
special settings, it is also optimal for the generalization error bound.
Numerical experiments empirically support the practical choices guided by the
theory.
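For concreteness, here is a minimal NumPy sketch of the activation family the abstract parameterizes by $\alpha$; the function name is ours, only the parameterization is from the paper:

```python
import numpy as np

def leaky_relu(x, alpha):
    """Leaky ReLU with slope `alpha` on the negative half-line.

    alpha = 0 gives the standard ReLU, alpha = 1 the identity, and
    alpha = -1 the absolute value activation |x|, which the paper
    identifies as optimal for the training error bound.
    """
    return np.where(x >= 0, x, alpha * x)

x = np.linspace(-2.0, 2.0, 9)
assert np.allclose(leaky_relu(x, -1.0), np.abs(x))  # alpha = -1 recovers |x|
```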
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
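The paper's algorithms and assumptions are more involved; the following is only an illustrative single-worker sketch of the two ingredients named above, error feedback around a compressor plus a normalized update, with top-$k$ as a stand-in compressor (all names are ours):

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries (a standard contractive compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def normalized_ef_step(w, grad, err, lr, k):
    """One step of error-feedback SGD with a normalized update.

    Whatever the compressor drops is accumulated in `err` and fed back
    into the next message; normalizing the applied update is what permits
    the larger stepsizes the summary refers to.
    """
    msg = top_k(grad + err, k)                        # compress gradient + residual error
    err = err + grad - msg                            # remember what was dropped
    w = w - lr * msg / (np.linalg.norm(msg) + 1e-12)  # normalized step
    return w, err
```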
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
- Equidistribution-based training of Free Knot Splines and ReLU Neural Networks
We show that the $L_2$-based approximation problem is ill-conditioned when using shallow neural networks (NNs) with a rectified linear unit (ReLU) activation function.
We propose a two-level procedure for training the FKS by first solving the nonlinear problem of finding the optimal knot locations.
We then determine the optimal weights and knots of the FKS by solving a nearly linear, well-conditioned problem.
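As a concrete illustration of the second, well-conditioned level: once the knot locations are fixed, the free knot spline (FKS) weights solve a linear least-squares problem in the nodal (hat-function) basis. A minimal sketch under that reading; the function names are ours:

```python
import numpy as np

def fit_fks_weights(x, y, knots):
    """With knot locations fixed, fit the spline values at the knots by
    linear least squares in the hat-function basis."""
    n = len(knots)
    # column j = j-th hat function at the samples: piecewise-linear interp of e_j
    B = np.column_stack([np.interp(x, knots, np.eye(n)[j]) for j in range(n)])
    w, *_ = np.linalg.lstsq(B, y, rcond=None)
    return w

# usage: fit sin(2*pi*x) on nine uniformly spaced knots
x = np.linspace(0.0, 1.0, 200)
w = fit_fks_weights(x, np.sin(2 * np.pi * x), np.linspace(0.0, 1.0, 9))
```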
arXiv Detail & Related papers (2024-07-02T10:51:36Z)
- Modify Training Directions in Function Space to Reduce Generalization Error [9.821059922409091]
We propose a modified natural gradient descent method in the neural network function space based on the eigendecompositions of neural tangent kernel and Fisher information matrix.
We explicitly derive the generalization error of the learned neural network function using eigendecomposition and results from statistical theory.
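The paper defines its update via eigendecompositions of the neural tangent kernel and the Fisher information matrix; the toy sketch below only illustrates the general idea of modifying a training direction in an eigenbasis (the matrix $F$, the truncation rule, and all names are our assumptions, not the paper's method):

```python
import numpy as np

def modified_direction(F, grad, top_m):
    """Rescale the gradient along the leading eigendirections of F and
    drop the rest, yielding a modified (natural-gradient-like) direction."""
    vals, vecs = np.linalg.eigh(F)               # eigenvalues in ascending order
    V, lam = vecs[:, -top_m:], vals[-top_m:]     # leading eigendirections
    coeffs = V.T @ grad                          # gradient in the eigenbasis
    return V @ (coeffs / lam)                    # precondition by 1/eigenvalue

rng = np.random.default_rng(0)
J = rng.normal(size=(32, 10))                    # stand-in Jacobian
step = modified_direction(J.T @ J / 32, rng.normal(size=10), top_m=4)
```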
arXiv Detail & Related papers (2023-07-25T07:11:30Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Generalization Analysis for Contrastive Representation Learning [80.89690821916653]
Existing generalization error bounds depend linearly on the number $k$ of negative examples.
We establish novel generalization bounds for contrastive learning which do not depend on $k$, up to logarithmic terms.
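To make the role of $k$ concrete, here is a standard contrastive (logistic, InfoNCE-style) loss with $k$ negatives; this is generic background rather than the paper's specific setup:

```python
import numpy as np

def contrastive_loss(anchor, positive, negatives):
    """Logistic contrastive loss; k = negatives.shape[0] is the number of
    negative examples that earlier bounds scale linearly with."""
    pos = anchor @ positive              # similarity with the positive
    neg = negatives @ anchor             # similarities with the k negatives
    return np.log1p(np.sum(np.exp(neg - pos)))

rng = np.random.default_rng(0)
a, p = rng.normal(size=8), rng.normal(size=8)
loss = contrastive_loss(a, p, rng.normal(size=(16, 8)))  # k = 16 negatives
```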
arXiv Detail & Related papers (2023-02-24T01:03:56Z)
- Do highly over-parameterized neural networks generalize since bad solutions are rare? [0.0]
In such highly over-parameterized networks, Empirical Risk Minimization (ERM) for learning leads to zero training error.
We show that under certain conditions the fraction of "bad" global minima, whose true error exceeds $\epsilon$, decays to zero exponentially fast with the number of training data $n$.
arXiv Detail & Related papers (2022-11-07T14:02:07Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
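As a rough illustration of the quantity these bounds depend on, the local Lipschitz regularity of a learned predictor can be estimated by sampling difference quotients; this Monte-Carlo estimator and its names are ours, not the paper's construction:

```python
import numpy as np

def local_lipschitz_estimate(f, x, radius=0.1, n_samples=256, seed=0):
    """Estimate the Lipschitz constant of f near x by sampling difference
    quotients over perturbations of a fixed radius."""
    rng = np.random.default_rng(seed)
    quotients = []
    for _ in range(n_samples):
        d = rng.normal(size=x.shape)
        d *= radius / np.linalg.norm(d)   # perturbation on the radius-sphere
        quotients.append(abs(f(x + d) - f(x)) / radius)
    return max(quotients)

# usage: a smooth scalar function has a small local Lipschitz constant
est = local_lipschitz_estimate(lambda z: np.sin(z).sum(), np.zeros(4))
```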
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
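A small sketch of the first point: for a ReLU network, the on/off pattern of the units identifies which linear subfunction an input falls in. The helper below is ours, not the paper's code:

```python
import numpy as np

def activation_pattern(weights, biases, x):
    """Return the per-layer on/off pattern of a ReLU network at input x.
    Inputs sharing a pattern are handled by the same linear subfunction."""
    pattern = []
    h = x
    for W, b in zip(weights, biases):
        z = W @ h + b
        pattern.append(tuple(z > 0))      # which units fire at this layer
        h = np.maximum(z, 0.0)
    return tuple(pattern)

rng = np.random.default_rng(1)
Ws = [rng.normal(size=(5, 3)), rng.normal(size=(4, 5))]
bs = [rng.normal(size=5), rng.normal(size=4)]
p = activation_pattern(Ws, bs, np.ones(3))  # identifies the local subfunction
```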
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- Double Descent and Other Interpolation Phenomena in GANs [2.7007335372861974]
We study the generalization error as a function of latent space dimension in generative adversarial networks (GANs).
We develop a novel pseudo-supervised learning approach for GANs where the training utilizes pairs of fabricated (noise) inputs in conjunction with real output samples.
While our analysis focuses mostly on linear models, we also derive important insights for improving the generalization of nonlinear, multilayer GANs.
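In the linear case the pseudo-supervised idea admits a short sketch: draw noise inputs, pair them with real samples, and fit the generator by least squares. This is our illustrative reading of the summary, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_latent, d_data = 200, 8, 16
Z = rng.normal(size=(n, d_latent))          # fabricated (noise) inputs
X = rng.normal(size=(n, d_data))            # stand-in for real output samples
G, *_ = np.linalg.lstsq(Z, X, rcond=None)   # linear "generator": Z @ G ~ X
```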
arXiv Detail & Related papers (2021-06-07T23:07:57Z)
- Understanding and Mitigating the Tradeoff Between Robustness and Accuracy [88.51943635427709]
Adversarial training augments the training set with perturbations to improve the robust error.
We show that the standard error can increase even when the augmented perturbations come with noiseless observations from the optimal linear predictor.
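For reference, the augmentation step the first sentence describes, in its simplest (FGSM-style) form; this is generic background rather than the paper's setting:

```python
import numpy as np

def fgsm_augment(x, grad_wrt_x, eps):
    """Create an adversarially perturbed copy of x; adversarial training
    adds such points (with their labels) to the training set."""
    return x + eps * np.sign(grad_wrt_x)
```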
arXiv Detail & Related papers (2020-02-25T08:03:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.