Large Deviations of Gaussian Neural Networks with ReLU activation
- URL: http://arxiv.org/abs/2405.16958v2
- Date: Fri, 01 Aug 2025 15:53:30 GMT
- Title: Large Deviations of Gaussian Neural Networks with ReLU activation
- Authors: Quirin Vogel
- Abstract summary: We prove a large deviation principle for deep neural networks with Gaussian weights and at most linearly growing activation functions, such as ReLU. We simplify previous expressions for the rate function and provide a power-series expansion for the ReLU case.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We prove a large deviation principle for deep neural networks with Gaussian weights and at most linearly growing activation functions, such as ReLU. This generalises earlier work, in which bounded and continuous activation functions were considered. In practice, linearly growing activation functions such as ReLU are most commonly used. We furthermore simplify previous expressions for the rate function and provide a power-series expansion for the ReLU case.
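For orientation, a hedged sketch of the setting in generic notation (the scaling and conventions below are assumptions, not necessarily the paper's): a fully connected network with i.i.d. Gaussian weights and ReLU activation, for which a large deviation principle in the width takes the usual exponential form.

```latex
% Generic sketch of the setting; scaling and notation are assumptions, not the paper's.
\begin{align*}
  h^{(1)}(x) &= W^{(1)} x, \qquad
  h^{(\ell+1)}(x) = W^{(\ell+1)}\,\mathrm{ReLU}\bigl(h^{(\ell)}(x)\bigr), \qquad
  W^{(\ell)}_{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{N}\!\left(0, \tfrac{1}{n}\right), \\
  \mathbb{P}\bigl(h^{(L)}(x) \in A\bigr) &\asymp \exp\Bigl(-n \inf_{y \in A} I(y)\Bigr)
  \quad \text{as the width } n \to \infty,
\end{align*}
% with rate function I; the paper simplifies I and gives a power-series
% expansion of it in the ReLU case.
```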
Related papers
- Beyond ReLU: How Activations Affect Neural Kernels and Random Wide Networks [6.1003048508889535]
We provide a more general characterization of the RKHS for typical activation functions whose only non-smoothness is at zero. Our results show that a broad class of activations that are not infinitely smooth generate equivalent tangent-kernel RKHSs at different network depths, while infinitely smooth activations generate non-equivalent RKHSs.
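For context, a hedged reminder of the standard wide-network kernel recursion for ReLU (the Cho-Saul arc-cosine kernel of degree one), which underlies this kind of RKHS comparison; the paper's characterization is more general, and the notation below is generic rather than the paper's.

```latex
% Standard ReLU kernel recursion (arc-cosine kernel, degree 1); generic notation.
\begin{align*}
  \Sigma^{(1)}(x, x') &= \tfrac{1}{d}\, x^\top x', \\
  \Sigma^{(\ell+1)}(x, x') &= \frac{\sqrt{\Sigma^{(\ell)}(x,x)\,\Sigma^{(\ell)}(x',x')}}{2\pi}
      \Bigl( \sin\theta^{(\ell)} + (\pi - \theta^{(\ell)}) \cos\theta^{(\ell)} \Bigr), \\
  \cos\theta^{(\ell)} &= \frac{\Sigma^{(\ell)}(x,x')}{\sqrt{\Sigma^{(\ell)}(x,x)\,\Sigma^{(\ell)}(x',x')}}.
\end{align*}
```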
arXiv Detail & Related papers (2025-06-27T17:56:09Z) - A Near Complete Nonasymptotic Generalization Theory For Multilayer Neural Networks: Beyond the Bias-Variance Tradeoff [57.25901375384457]
We propose a nonasymptotic generalization theory for multilayer neural networks with arbitrary Lipschitz activations and general Lipschitz loss functions.
In particular, it does not require boundedness of the loss function, as is commonly assumed in the literature.
We show the near minimax optimality of our theory for multilayer ReLU networks for regression problems.
arXiv Detail & Related papers (2025-03-03T23:34:12Z) - Improving the Expressive Power of Deep Neural Networks through Integral Activation Transform [12.36064367319084]
We generalize the traditional fully connected deep neural network (DNN) through the concept of continuous width.
We show that IAT-ReLU exhibits a continuous activation pattern when continuous basis functions are employed.
Our numerical experiments demonstrate that IAT-ReLU outperforms regular ReLU in terms of trainability and smoothness.
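A hedged numerical sketch of the continuous-width idea (a hidden layer indexed by a continuous variable and evaluated by quadrature); this is an illustration only, not the paper's exact Integral Activation Transform, and the weight functions and grid below are invented.

```python
import numpy as np

# Generic illustration of a "continuous-width" hidden layer discretized by
# quadrature. Sketch of the idea only, NOT the paper's exact IAT operator;
# the weight/bias functions of s and the grid are illustrative choices.
def continuous_width_layer(x, n_quad=256, seed=0):
    r"""Approximate y = \int_0^1 a(s) * relu(w(s).x + b(s)) ds by the midpoint rule."""
    rng = np.random.default_rng(seed)
    s = (np.arange(n_quad) + 0.5) / n_quad                       # quadrature nodes in (0, 1)
    d = x.shape[-1]
    w = np.sin(2.0 * np.pi * np.outer(s, np.arange(1, d + 1)))   # smooth weight functions w(s)
    b = np.cos(2.0 * np.pi * s)                                  # smooth bias function b(s)
    a = rng.standard_normal(n_quad)                              # output weight function a(s)
    pre = w @ x + b                                              # pre-activation over the index s
    return np.mean(a * np.maximum(pre, 0.0))                     # midpoint rule on [0, 1]

print(continuous_width_layer(np.ones(4)))
```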
arXiv Detail & Related papers (2023-12-19T20:23:33Z) - Generalized Activation via Multivariate Projection [46.837481855573145]
Activation functions are essential to introduce nonlinearity into neural networks.
We consider ReLU as a projection from R onto the nonnegative half-line R+.
We extend ReLU by substituting it with a generalized projection operator onto a convex cone, such as the Second-Order Cone (SOC) projection.
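A small sketch of the projection view using standard formulas (not code from the paper): ReLU is the Euclidean projection onto the nonnegative half-line, and the closed-form second-order-cone projection below is one way such a generalization looks.

```python
import numpy as np

# ReLU as the Euclidean projection onto the nonnegative half-line R+:
# argmin_{y >= 0} (y - x)^2 = max(0, x), applied elementwise.
def relu(x):
    return np.maximum(x, 0.0)

# Euclidean projection onto the second-order cone
# K = {(z, t) in R^{n-1} x R : ||z||_2 <= t}  (standard closed-form formula).
def soc_projection(z, t):
    nz = np.linalg.norm(z)
    if nz <= t:             # already inside the cone
        return z, t
    if nz <= -t:            # inside the polar cone: project to the origin
        return np.zeros_like(z), 0.0
    alpha = (nz + t) / 2.0  # otherwise project onto the cone's boundary
    return alpha * z / nz, alpha

# In one dimension the cone is R+ and the projection reduces to ReLU.
print(relu(np.array([-1.5, 0.3])))               # [0.  0.3]
print(soc_projection(np.array([3.0, 4.0]), 1.0))  # ([1.8, 2.4], 3.0)
```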
arXiv Detail & Related papers (2023-09-29T12:44:27Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
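As a hedged reminder in generic notation (the standard formulation, not anything specific to this paper), the linearized Laplace approximation expands the network to first order around the MAP weights and uses a Gaussian posterior over that linear model:

```latex
% Standard linearized-Laplace predictive (generic notation, not the paper's):
\begin{align*}
  f_{\mathrm{lin}}(x;\theta) &= f(x;\theta_{\mathrm{MAP}})
      + J(x)\,(\theta - \theta_{\mathrm{MAP}}),
  \qquad J(x) = \nabla_\theta f(x;\theta)\big|_{\theta_{\mathrm{MAP}}}, \\
  \theta \mid \mathcal{D} &\approx \mathcal{N}\bigl(\theta_{\mathrm{MAP}},\, \Sigma\bigr),
  \qquad \Sigma^{-1} \approx \nabla^2_\theta \mathcal{L}(\theta_{\mathrm{MAP}}) + \lambda I, \\
  f(x_\ast) \mid \mathcal{D} &\approx \mathcal{N}\bigl(f(x_\ast;\theta_{\mathrm{MAP}}),\;
      J(x_\ast)\, \Sigma\, J(x_\ast)^\top \bigr),
\end{align*}
% which supplies the predictive uncertainty used by the surrogate in
% Bayesian optimization.
```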
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer [2.1485350418225244]
We give an exact macroscopic characterization of the generalization behavior of randomized, shallow NNs with ReLU activation.
We show that RSNs correspond to a generalized additive model (GAM)-type regression in which infinitely many directions are considered.
arXiv Detail & Related papers (2023-03-20T21:05:47Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - On the existence of optimal shallow feedforward networks with ReLU activation [0.0]
We prove existence of global minima in the loss landscape for the approximation of continuous target functions using shallow feedforward artificial neural networks with ReLU activation.
We propose a kind of closure of the search space so that in the extended space minimizers exist.
arXiv Detail & Related papers (2023-03-06T13:35:46Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
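For reference, the standard (empirical) neural tangent kernel is the Gram matrix of parameter gradients; the bias-generalized kernel studied in the paper is a variant of this limiting object whose exact form is not reproduced here.

```latex
% Standard empirical NTK (generic notation); the paper's bias-generalized
% kernel modifies the corresponding infinite-width limit.
\[
  \Theta(x, x') \;=\; \bigl\langle \nabla_\theta f(x;\theta),\; \nabla_\theta f(x';\theta) \bigr\rangle .
\]
```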
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - On the Activation Function Dependence of the Spectral Bias of Neural Networks [0.0]
We study the phenomenon from the point of view of the spectral bias of neural networks.
We provide a theoretical explanation for the spectral bias of ReLU neural networks by leveraging connections with the theory of finite element methods.
We show that neural networks with the Hat activation function are trained significantly faster using gradient descent and ADAM.
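A small sketch of a hat (finite-element) activation for comparison with ReLU; the exact support and normalization used in the paper may differ, so the definition below is an illustrative assumption.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# A standard finite-element "hat" basis function on [0, 2], peaking at x = 1,
# written as a combination of ReLUs. The paper's exact Hat definition
# (support/normalization) may differ -- this is an illustrative assumption.
def hat(x):
    return relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)

x = np.linspace(-1.0, 3.0, 9)
print(hat(x))   # compactly supported, piecewise-linear "tent" profile
```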
arXiv Detail & Related papers (2022-08-09T17:40:57Z) - Uniform Approximation with Quadratic Neural Networks [0.0]
We show that deep neural networks with ReQU activation can approximate any function in the class of $R$-Hölder-regular functions.
Results can be straightforwardly generalized to any Rectified Power Unit (RePU) activation function of the form $\max(0,x)^p$ for $p \geq 2$.
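A minimal definition of the ReQU/RePU activations as described, i.e. $\max(0,x)^p$ with $p \geq 2$:

```python
import numpy as np

def repu(x, p=2):
    """Rectified Power Unit: max(0, x)**p; p = 2 gives ReQU."""
    assert p >= 2, "RePU is considered here for powers p >= 2"
    return np.maximum(x, 0.0) ** p

print(repu(np.array([-1.0, 0.5, 2.0])))        # ReQU: [0.   0.25 4.  ]
print(repu(np.array([-1.0, 0.5, 2.0]), p=3))   # cubic RePU
```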
arXiv Detail & Related papers (2022-01-11T02:26:55Z) - Adaptive Rational Activations to Boost Deep Reinforcement Learning [68.10769262901003]
We motivate why rationals are suitable for adaptable activation functions and why their inclusion into neural networks is crucial.
We demonstrate that equipping popular algorithms with (recurrent-)rational activations leads to consistent improvements on Atari games.
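A hedged sketch of a learnable rational activation (a ratio of polynomials with trainable coefficients); the degrees and the safeguarded denominator below are illustrative assumptions, not necessarily the paper's exact parameterization.

```python
import numpy as np

# Rational activation R(x) = P(x) / Q(x) with trainable coefficients.
# The degrees and the |.|-safeguarded denominator (which keeps Q >= 1 and so
# avoids poles) are illustrative assumptions, not the paper's exact choice.
def rational_activation(x, num_coeffs, den_coeffs):
    p = np.polyval(num_coeffs, x)                                 # numerator P(x)
    q = 1.0 + np.abs(np.polyval(np.append(den_coeffs, 0.0), x))   # Q(x) = 1 + |b_1 x + b_2 x^2 + ...|
    return p / q

num = np.array([0.02, 0.5, 1.0, 0.0])   # highest-degree coefficient first (np.polyval order)
den = np.array([0.3, 0.1])
x = np.linspace(-2.0, 2.0, 5)
print(rational_activation(x, num, den))
```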
arXiv Detail & Related papers (2021-02-18T14:53:12Z) - Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z) - On the asymptotics of wide networks with polynomial activations [12.509746979383701]
We consider an existing conjecture addressing the behavior of neural networks in the large width limit.
We prove the conjecture for deep networks with polynomial activation functions.
We point out a difference in the behavior of networks with analytic (and non-linear) activation functions and those with piecewise-linear activations such as ReLU.
arXiv Detail & Related papers (2020-06-11T18:00:01Z) - Dynamic ReLU [74.973224160508]
We propose dynamic ReLU (DY-ReLU), a dynamic rectifier whose parameters are generated by a hyper function over all input elements.
Compared to its static counterpart, DY-ReLU has negligible extra computational cost, but significantly more representation capability.
By simply using DY-ReLU for MobileNetV2, the top-1 accuracy on ImageNet classification is boosted from 72.0% to 76.2% with only 5% additional FLOPs.
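A hedged numpy sketch of the DY-ReLU idea as described above (a piecewise-linear function whose slopes and intercepts are produced by a small hyper function of the whole input); the pooling, number of pieces, and hyper-network shape below are assumptions, not the paper's architecture.

```python
import numpy as np

# Sketch of dynamic ReLU: y_c = max_k ( a_{k,c}(x) * x_c + b_{k,c}(x) ), where
# the coefficients a, b come from a hyper function of the whole input (here a
# crude global average pool followed by one linear map). The number of pieces
# K, pooling, and hyper-network shape are illustrative assumptions.
def dy_relu(x, W_hyper, K=2):
    C = x.shape[-1]
    context = x.mean(axis=-1, keepdims=True)        # global pooling of the input
    coeffs = (context * W_hyper).reshape(2, K, C)   # hyper-function output
    a = 1.0 + 0.1 * coeffs[0]                       # slopes, initialized near identity
    b = 0.1 * coeffs[1]                             # small intercepts
    return np.max(a * x + b, axis=0)                # elementwise max over the K pieces

rng = np.random.default_rng(0)
C, K = 4, 2
x = rng.standard_normal(C)
W_hyper = rng.standard_normal(2 * K * C)
print(dy_relu(x, W_hyper, K=K))
```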
arXiv Detail & Related papers (2020-03-22T23:45:35Z) - Deep Neural Networks with Trainable Activations and Controlled Lipschitz Constant [26.22495169129119]
We introduce a variational framework to learn the activation functions of deep neural networks.
Our aim is to increase the capacity of the network while controlling an upper bound on its Lipschitz constant.
We numerically compare our scheme with standard ReLU networks and their variants, PReLU and LeakyReLU.
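For context, the standard composition bound that makes layer-wise control of activations and weights meaningful (generic notation; the paper's precise framework may differ):

```latex
% Standard composition bound used to control a network's Lipschitz constant:
\[
  \mathrm{Lip}\bigl(f_L \circ \cdots \circ f_1\bigr)
  \;\le\; \prod_{\ell=1}^{L} \mathrm{Lip}(f_\ell)
  \;\le\; \prod_{\ell=1}^{L} \|W_\ell\|_2 \,\mathrm{Lip}(\sigma_\ell),
\]
% so bounding each learned activation's Lipschitz constant (and each weight
% matrix's spectral norm) yields an upper bound for the whole network.
```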
arXiv Detail & Related papers (2020-01-17T12:32:55Z)