On the Banach spaces associated with multi-layer ReLU networks: Function
representation, approximation theory and gradient descent dynamics
- URL: http://arxiv.org/abs/2007.15623v1
- Date: Thu, 30 Jul 2020 17:47:05 GMT
- Title: On the Banach spaces associated with multi-layer ReLU networks: Function
representation, approximation theory and gradient descent dynamics
- Authors: Weinan E and Stephan Wojtowytsch
- Abstract summary: We develop Banach spaces for ReLU neural networks of finite depth $L$ and infinite width.
The spaces contain all finite fully connected $L$-layer networks and their $L^2$-limiting objects under bounds on the natural path-norm.
Under this norm, the unit ball in the space for $L$-layer networks has low Rademacher complexity and thus favorable generalization properties.
- Score: 8.160343645537106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop Banach spaces for ReLU neural networks of finite depth $L$ and
infinite width. The spaces contain all finite fully connected $L$-layer
networks and their $L^2$-limiting objects under bounds on the natural
path-norm. Under this norm, the unit ball in the space for $L$-layer networks
has low Rademacher complexity and thus favorable generalization properties.
Functions in these spaces can be approximated by multi-layer neural networks
with dimension-independent convergence rates.
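The low-Rademacher-complexity claim above can be illustrated numerically. The following is an illustrative sketch (not the paper's construction): it Monte Carlo estimates the empirical Rademacher complexity of a small finite family of one-hidden-layer ReLU networks whose path-norm is normalized to 1, a crude stand-in for the path-norm unit ball the paper analyzes. All function and variable names are assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def empirical_rademacher(fn_values, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma sup_f (1/n) sum_i sigma_i f(x_i).

    fn_values: (num_functions, n_samples) array of precomputed f(x_i)."""
    rng = np.random.default_rng(seed)
    n = fn_values.shape[1]
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)   # Rademacher signs
        total += np.max(fn_values @ sigma) / n    # sup over the finite class
    return total / n_draws

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))                      # 50 sample points in R^3
funcs = []
for _ in range(20):
    W = rng.normal(size=(8, 3))
    a = rng.normal(size=8)
    # normalize so the path-norm sum_j |a_j| * ||w_j||_1 equals 1
    a /= np.sum(np.abs(a) * np.sum(np.abs(W), axis=1))
    funcs.append(relu(X @ W.T) @ a)

print(empirical_rademacher(np.array(funcs)))
```

Here the normalization step is the point: with the path-norm pinned to 1, the resulting complexity estimate stays small regardless of the ambient input dimension.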
The key to this work is a new way of representing functions in some form of
expectations, motivated by multi-layer neural networks. This representation
allows us to define a new class of continuous models for machine learning. We
show that the gradient flow defined this way is the natural continuous analog
of the gradient descent dynamics for the associated multi-layer neural
networks. We show that the path-norm increases at most polynomially under this
continuous gradient flow dynamics.
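The polynomial-growth claim for the path-norm along the gradient flow can be sketched empirically. The code below is an illustrative toy (the setup, names, and step sizes are assumptions, not the paper's experiments): it runs plain gradient descent on a least-squares loss for a small two-layer ReLU network and records the path-norm at every step.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def path_norm(W1, a):
    # sum over input->output paths of |a_j| * |W1[j, k]|
    return float(np.sum(np.abs(a)[:, None] * np.abs(W1)))

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 2))
y = np.sin(X[:, 0])                      # toy regression target
W1 = 0.5 * rng.normal(size=(8, 2))       # hidden-layer weights
a = 0.5 * rng.normal(size=8)             # output-layer weights
lr = 1e-2

norms = []
for step in range(200):
    h = relu(X @ W1.T)                   # hidden activations, (32, 8)
    err = h @ a - y                      # residuals
    grad_a = h.T @ err / len(y)
    # dL/dW1[j,k] = mean_i err_i * a_j * 1[w_j . x_i > 0] * x_ik
    grad_W1 = ((err[:, None] * (h > 0)) * a).T @ X / len(y)
    a -= lr * grad_a
    W1 -= lr * grad_W1
    norms.append(path_norm(W1, a))

print(norms[0], norms[-1])
```

In this discrete-time surrogate for the continuous flow, one can plot `norms` against the step count to check that the path-norm grows slowly rather than blowing up.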
Related papers
- Piecewise Linear Functions Representable with Infinite Width Shallow
ReLU Neural Networks [0.0]
We prove a conjecture of Ongie et al. that every continuous piecewise linear function expressible with this kind of infinite width neural network is expressible as a finite width shallow ReLU neural network.
arXiv Detail & Related papers (2023-07-25T15:38:18Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer
Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z) - Neural Network Approximation of Continuous Functions in High Dimensions
with Applications to Inverse Problems [6.84380898679299]
Current theory predicts that networks should scale exponentially in the dimension of the problem.
We provide a general method for bounding the complexity required for a neural network to approximate a Hölder (or uniformly) continuous function.
arXiv Detail & Related papers (2022-08-28T22:44:07Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - A global convergence theory for deep ReLU implicit networks via
over-parameterization [26.19122384935622]
Implicit deep learning has received increasing attention recently.
This paper analyzes the gradient flow of Rectified Linear Unit (ReLU) activated implicit neural networks.
arXiv Detail & Related papers (2021-10-11T23:22:50Z) - Deep neural network approximation of analytic functions [91.3755431537592]
We establish an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z) - Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z) - Theory of Deep Convolutional Neural Networks II: Spherical Analysis [9.099589602551573]
We consider a family of deep convolutional neural networks applied to approximate functions on the unit sphere $\mathbb{S}^{d-1}$ of $\mathbb{R}^d$.
Our analysis presents rates of uniform approximation when the approximated function lies in the Sobolev space $W^r_\infty(\mathbb{S}^{d-1})$ with $r>0$ or takes an additive ridge form.
arXiv Detail & Related papers (2020-07-28T14:54:30Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.