How (Implicit) Regularization of ReLU Neural Networks Characterizes the
Learned Function -- Part II: the Multi-D Case of Two Layers with Random First
Layer
- URL: http://arxiv.org/abs/2303.11454v1
- Date: Mon, 20 Mar 2023 21:05:47 GMT
- Title: How (Implicit) Regularization of ReLU Neural Networks Characterizes the
Learned Function -- Part II: the Multi-D Case of Two Layers with Random First
Layer
- Authors: Jakob Heiss, Josef Teichmann, Hanna Wutte
- Abstract summary: We give an exact macroscopic characterization of the generalization behavior of randomized, shallow NNs with ReLU activation.
We show that RSNs correspond to a generalized additive model (GAM)-type regression in which infinitely many directions are considered.
- Score: 2.1485350418225244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Randomized neural networks (randomized NNs), in which only the terminal
layer's weights are optimized, constitute a powerful model class for reducing the
computational time of training neural networks. At the same time, these models
generalize surprisingly well in various regression and classification tasks. In
this paper, we give an exact macroscopic characterization (i.e., a
characterization in function space) of the generalization behavior of
randomized, shallow NNs with ReLU activation (RSNs). We show that RSNs
correspond to a generalized additive model (GAM)-type regression in which
infinitely many directions are considered: the infinite generalized additive
model (IGAM). The IGAM is formalized as the solution to an optimization problem in
function space for a specific regularization functional and a fairly general
loss. This work extends our prior work to multivariate NNs; there we showed that,
under certain conditions, wide RSNs with ReLU activation and one-dimensional input
behave like spline regression.
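To make the setting concrete, below is a minimal sketch of an RSN in the sense of the abstract: a shallow ReLU network whose first-layer weights and biases are drawn at random and frozen, with only the terminal linear layer fit by L2-regularized (ridge) least squares. The Gaussian sampling, width, and ridge strength are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Minimal sketch of a randomized shallow ReLU network (RSN): the first-layer
# weights and biases are drawn at random and frozen, and only the terminal
# linear layer is trained, here with an L2 (ridge) penalty.
# Width, sampling distributions, and ridge strength are illustrative choices.

rng = np.random.default_rng(0)

def random_first_layer(d, n_hidden):
    """Draw a random, frozen first layer for d-dimensional inputs."""
    W = rng.normal(size=(d, n_hidden))   # random weights, never updated
    b = rng.normal(size=n_hidden)        # random biases, never updated
    return W, b

def relu_features(X, W, b):
    """Hidden-layer activations phi(X) = ReLU(X W + b)."""
    return np.maximum(X @ W + b, 0.0)

def fit_readout(Phi, y, ridge=1e-2):
    """L2-regularized least squares for the terminal layer's weights."""
    n_hidden = Phi.shape[1]
    A = Phi.T @ Phi + ridge * np.eye(n_hidden)
    return np.linalg.solve(A, Phi.T @ y)

# Toy multivariate regression task.
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(np.pi * X[:, 0]) + 0.5 * X[:, 1] * X[:, 2] + 0.05 * rng.normal(size=200)

W, b = random_first_layer(d=3, n_hidden=2048)
theta = fit_readout(relu_features(X, W, b), y)

X_new = rng.uniform(-1.0, 1.0, size=(5, 3))
y_hat = relu_features(X_new, W, b) @ theta   # prediction of the trained RSN
```

Because the first layer is never updated, training reduces to a linear solve in the random-feature representation; it is this trained readout whose function-space behavior the paper characterizes as an IGAM.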
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax
Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes [7.433327915285969]
We prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss.
We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence.
arXiv Detail & Related papers (2024-01-08T23:54:46Z) - Differentially Private Non-convex Learning for Multi-layer Neural
Networks [35.24835396398768]
This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node.
By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk bounds when both the sample size and the width of the network are sufficiently large.
arXiv Detail & Related papers (2023-10-12T15:48:14Z) - Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Smooth Mathematical Function from Compact Neural Networks [0.0]
We obtain NNs that generate highly accurate and highly smooth functions using only a few weight parameters.
A new activation function, a meta-batch method, features of numerical data, and meta-augmentation with meta-parameters are presented.
arXiv Detail & Related papers (2022-12-31T11:33:24Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation functions.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z) - How Implicit Regularization of ReLU Neural Networks Characterizes the
Learned Function -- Part I: the 1-D Case of Two Layers with Random First
Layer [5.969858080492586]
We consider one-dimensional (shallow) ReLU neural networks in which the weights are chosen randomly and only the terminal layer is trained.
We show that for such networks L2-regularized regression corresponds in function space to regularizing the estimate's second derivative for fairly general loss functionals.
arXiv Detail & Related papers (2019-11-07T13:48:15Z)
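Schematically, that 1-D result says that L2-regularizing the terminal layer acts, in function space, like a weighted penalty on the estimate's second derivative; the weight (written g below) is determined by the distribution of the random first layer, and the precise functional is the one stated in that paper. A rough sketch of the form:

```latex
% Schematic only: g encodes the law of the random first layer (kink positions
% and weight magnitudes); lambda is the strength of the L2 penalty on the
% terminal layer. See Part I for the exact functional.
\hat{f} \;\in\; \arg\min_{f}\;
  \sum_{i=1}^{N} L\bigl(f(x_i),\, y_i\bigr)
  \;+\; \lambda \int \frac{\bigl(f''(x)\bigr)^{2}}{g(x)}\, dx
```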