On the existence of minimizers in shallow residual ReLU neural network optimization landscapes
- URL: http://arxiv.org/abs/2302.14690v2
- Date: Tue, 19 Nov 2024 14:16:38 GMT
- Title: On the existence of minimizers in shallow residual ReLU neural network optimization landscapes
- Authors: Steffen Dereich, Arnulf Jentzen, Sebastian Kassing
- Abstract summary: We show existence of minimizers in the loss landscape for residual artificial neural networks (ANNs) with multi-dimensional input layer and one hidden layer with ReLU activation.
- Score: 3.6185342807265415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article, we show existence of minimizers in the loss landscape for residual artificial neural networks (ANNs) with multi-dimensional input layer and one hidden layer with ReLU activation. Our work contrasts earlier results in [D. Gallon, A. Jentzen, and F. Lindner, preprint, arXiv:2211.15641, 2022] and [P. Petersen, M. Raslan, and F. Voigtlaender, Found. Comput. Math., 21 (2021), pp. 375-444] which showed that in many situations minimizers do not exist for common smooth activation functions even in the case where the target functions are polynomials. The proof of the existence property makes use of a closure of the search space containing all functions generated by ANNs and additional discontinuous generalized responses. As we will show, the additional generalized responses in this larger space are suboptimal so that the minimum is attained in the original function class.
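To fix ideas, here is a minimal sketch of the kind of object the abstract refers to: a one-hidden-layer ReLU network with a linear skip (residual) term, together with a squared-error loss whose minimizers are in question. The particular parameterization, width, target function, and sampling below are illustrative assumptions, not the construction used in the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_shallow_relu(x, W1, b1, W2, b2, A):
    """One plausible shallow residual ReLU realization (an assumption, not
    necessarily the paper's exact architecture):
    f(x) = W2 @ relu(W1 @ x + b1) + b2 + A @ x, with a linear skip term A @ x."""
    return W2 @ relu(W1 @ x + b1) + b2 + A @ x

def empirical_l2_loss(params, xs, target):
    """Mean squared error of the network response against a target function
    over sample points xs (each row of xs is one input)."""
    W1, b1, W2, b2, A = params
    residuals = [residual_shallow_relu(x, W1, b1, W2, b2, A) - target(x) for x in xs]
    return float(np.mean([r @ r for r in residuals]))

# Hypothetical example: d-dimensional input, width-m hidden layer, scalar output.
rng = np.random.default_rng(0)
d, m, out = 3, 5, 1
params = (rng.normal(size=(m, d)), rng.normal(size=m),
          rng.normal(size=(out, m)), rng.normal(size=out),
          rng.normal(size=(out, d)))
xs = rng.uniform(-1.0, 1.0, size=(200, d))
target = lambda x: np.array([np.sin(x.sum())])  # placeholder target function
print(empirical_l2_loss(params, xs, target))
```

The existence question is whether the infimum of such a loss over all parameter choices is actually attained; the paper's argument works in a closure of the set of realizable responses that also contains discontinuous generalized responses and shows those extra responses are suboptimal.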
Related papers
- A Near Complete Nonasymptotic Generalization Theory For Multilayer Neural Networks: Beyond the Bias-Variance Tradeoff [57.25901375384457]
We propose a nonasymptotic generalization theory for multilayer neural networks with arbitrary Lipschitz activations and general Lipschitz loss functions.
In particular, it doesn't require boundedness of the loss function, as is commonly assumed in the literature.
We show the near minimax optimality of our theory for multilayer ReLU networks for regression problems.
arXiv Detail & Related papers (2025-03-03T23:34:12Z) - Novel Quadratic Constraints for Extending LipSDP beyond Slope-Restricted Activations [52.031701581294804]
Lipschitz upper bounds for neural networks can be computed with polynomial time guarantees.
Our paper bridges the gap and extends LipSDP beyond slope-restricted activation functions.
Our proposed analysis is general and provides a unified approach for estimating $\ell_2$ and $\ell_\infty$ Lipschitz bounds.
arXiv Detail & Related papers (2024-01-25T09:23:31Z) - The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
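For orientation, the training setup referenced above, a single-hidden-layer ReLU network fit to data with the quadratic loss by plain gradient descent, can be written in a few lines. The dimensions, learning rate, and target values below are hypothetical; this is an illustrative sketch, not code from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: n samples of d-dimensional inputs, a single hidden
# layer of width m with ReLU activation, scalar output, trained by plain
# gradient descent on the quadratic (mean squared) loss.
n, d, m = 64, 2, 20
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(X @ np.ones(d))                     # placeholder target values

W = rng.normal(scale=1.0 / np.sqrt(d), size=(m, d))
b = np.zeros(m)
v = rng.normal(scale=1.0 / np.sqrt(m), size=m)
c = 0.0

lr = 1e-2
for step in range(2000):
    pre = X @ W.T + b                          # (n, m) pre-activations
    h = np.maximum(pre, 0.0)                   # ReLU features
    pred = h @ v + c                           # (n,) network outputs
    err = pred - y
    loss = np.mean(err ** 2)                   # quadratic loss

    # Gradients of the mean squared error (sub-gradient 0 at the ReLU kink).
    g_pred = 2.0 * err / n                     # (n,)
    grad_v = h.T @ g_pred
    grad_c = g_pred.sum()
    g_h = np.outer(g_pred, v) * (pre > 0)      # backprop through the ReLU
    grad_W = g_h.T @ X
    grad_b = g_h.sum(axis=0)

    W -= lr * grad_W; b -= lr * grad_b; v -= lr * grad_v; c -= lr * grad_c

print(f"final quadratic loss: {loss:.4f}")
```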
arXiv Detail & Related papers (2023-06-30T09:17:39Z) - On the existence of optimal shallow feedforward networks with ReLU activation [0.0]
We prove existence of global minima in the loss landscape for the approximation of continuous target functions using shallow feedforward artificial neural networks with ReLU activation.
We propose a kind of closure of the search space so that in the extended space minimizers exist.
arXiv Detail & Related papers (2023-03-06T13:35:46Z) - When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work [59.29606307518154]
We show that as long as the width satisfies $m \geq 2n/d$ (where $d$ is the input dimension), the network's expressivity is strong, i.e., there exists at least one global minimizer with zero training loss.
We also consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.
arXiv Detail & Related papers (2022-10-21T14:41:26Z) - On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems [0.0]
We study the loss landscape of training problems for deep artificial neural networks with a one-dimensional real output.
It is shown that such problems possess a continuum of spurious (i.e., not globally optimal) local minima for all target functions that are not affine.
arXiv Detail & Related papers (2022-02-23T14:41:54Z) - Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks [19.216784367141972]
We study the problem of estimating an unknown function from noisy data using shallow (single-hidden layer) ReLU neural networks.
We quantify the performance of these neural network estimators when the data-generating function belongs to the space of functions of second-order bounded variation in the Radon domain.
arXiv Detail & Related papers (2021-09-18T05:56:06Z) - Achieving Small Test Error in Mildly Overparameterized Neural Networks [30.664282759625948]
We show an algorithm which finds one of these points in polynomial time.
In addition, we prove that for a fully connected neural net, with an additional assumption on the data distribution, there is a polynomial-time algorithm.
arXiv Detail & Related papers (2021-04-24T06:47:20Z) - On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - Interval Universal Approximation for Neural Networks [47.767793120249095]
We introduce the interval universal approximation (IUA) theorem.
IUA shows that neural networks can approximate any continuous function $f$, as has been known for decades.
We study the computational complexity of constructing neural networks that are amenable to precise interval analysis.
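As an illustration of what precise interval analysis means in practice, the following sketch (illustrative, not taken from the cited paper) propagates an axis-aligned input box through an affine layer and a ReLU using standard interval arithmetic, yielding guaranteed output bounds; the weights and the input box are hypothetical.

```python
import numpy as np

def affine_interval(lo, hi, W, b):
    """Propagate an axis-aligned box [lo, hi] through x -> W @ x + b.
    Standard interval arithmetic: split W into positive and negative parts."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def relu_interval(lo, hi):
    """ReLU is monotone, so it maps interval bounds to interval bounds."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# Hypothetical one-hidden-layer ReLU network and input box.
rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

lo, hi = -0.1 * np.ones(3), 0.1 * np.ones(3)   # input box around the origin
lo, hi = affine_interval(lo, hi, W1, b1)
lo, hi = relu_interval(lo, hi)
lo, hi = affine_interval(lo, hi, W2, b2)
print("certified output range:", lo, hi)       # sound (possibly loose) bounds
```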
arXiv Detail & Related papers (2020-07-12T20:43:56Z) - Evolving Normalization-Activation Layers [100.82879448303805]
We develop efficient rejection protocols to quickly filter out candidate layers that do not work well.
Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures.
Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets and EfficientNets.
arXiv Detail & Related papers (2020-04-06T19:52:48Z) - On Sharpness of Error Bounds for Multivariate Neural Network Approximation [0.0]
The paper deals with best non-linear approximation by sums of ridge functions.
Error bounds are presented in terms of moduli of smoothness.
arXiv Detail & Related papers (2020-04-05T14:00:52Z)