Kolmogorov Width Decay and Poor Approximators in Machine Learning:
Shallow Neural Networks, Random Feature Models and Neural Tangent Kernels
- URL: http://arxiv.org/abs/2005.10807v2
- Date: Fri, 2 Oct 2020 05:33:48 GMT
- Title: Kolmogorov Width Decay and Poor Approximators in Machine Learning:
Shallow Neural Networks, Random Feature Models and Neural Tangent Kernels
- Authors: Weinan E and Stephan Wojtowytsch
- Abstract summary: We establish a scale separation of Kolmogorov width type between subspaces of a given Banach space.
We show that reproducing kernel Hilbert spaces are poor $L^2$-approximators for the class of two-layer neural networks in high dimension.
- Score: 8.160343645537106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We establish a scale separation of Kolmogorov width type between subspaces of
a given Banach space under the condition that a sequence of linear maps
converges much faster on one of the subspaces. The general technique is then
applied to show that reproducing kernel Hilbert spaces are poor
$L^2$-approximators for the class of two-layer neural networks in high
dimension, and that multi-layer networks with small path norm are poor
approximators for certain Lipschitz functions, also in the $L^2$-topology.
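For orientation, the classical Kolmogorov $n$-width of a set $A$ in a Banach space $X$ measures how well $A$ is approximated by the best $n$-dimensional linear subspace. This is a standard definition stated here only for reference; the paper proves a width-type scale separation between two subspaces, so the exact quantities it compares may differ:
$$ d_n(A; X) \;=\; \inf_{\substack{V \subset X \\ \dim V \le n}} \; \sup_{f \in A} \; \inf_{g \in V} \; \| f - g \|_X . $$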
Related papers
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural networks and their training, applicable to networks of arbitrary width, depth and topology.
We also present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
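As a reference point for such function spaces, a common formulation from the Barron-space literature represents an infinite-width two-layer ReLU network by a measure over neurons, with a variation-type norm. This is an assumption about the general setting, not a quote of that paper's definitions:
$$ f(x) \;=\; \int a\, \sigma(w^\top x + b)\, d\mu(a, w, b), \qquad \|f\| \;=\; \inf_{\mu \,\text{representing}\, f} \int |a|\,\big(\|w\|_1 + |b|\big)\, d\mu(a, w, b), $$
where $\sigma$ denotes the ReLU activation.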
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set.
This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure.
We prove the universal approximation property of three-layer ReLU networks using our topological approach.
arXiv Detail & Related papers (2023-05-25T14:17:15Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
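A minimal NumPy sketch of a random two-layer ReLU feature map in the spirit of that abstract: hidden weights are standard Gaussian, biases are uniform, and only a linear readout is fitted. The width, bias range, ridge parameter and toy data below are illustrative assumptions, not the paper's construction.
```python
import numpy as np

def random_relu_features(X, width=256, bias_range=2.0, seed=0):
    """Map inputs X of shape (n, d) to random ReLU features of shape (n, width).

    Hidden weights are standard Gaussian and biases are uniform on
    [-bias_range, bias_range]; both are drawn once and then frozen.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, width))               # frozen Gaussian weights
    b = rng.uniform(-bias_range, bias_range, width)   # frozen uniform biases
    return np.maximum(X @ W + b, 0.0)                 # ReLU activation

def fit_linear_readout(features, y, ridge=1e-3):
    """Ridge-regularized least-squares readout on top of the frozen features."""
    k = features.shape[1]
    A = features.T @ features + ridge * np.eye(k)
    return np.linalg.solve(A, features.T @ y)

# Toy usage: separate two Gaussian blobs with the random-feature classifier.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 5)), rng.normal(2.0, 1.0, (50, 5))])
y = np.hstack([-np.ones(50), np.ones(50)])
Phi = random_relu_features(X)
w = fit_linear_readout(Phi, y)
print("training accuracy:", np.mean(np.sign(Phi @ w) == y))
```
Freezing the first layer and training only the readout is exactly the random-feature setting named in the title of the main paper above.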
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Two-layer neural networks with values in a Banach space [1.90365714903665]
We study two-layer neural networks whose domain and range are Banach spaces with separable preduals.
As the nonlinearity we choose the lattice operation of taking the positive part; in the case of $\mathbb{R}^d$-valued neural networks this corresponds to the ReLU activation function.
arXiv Detail & Related papers (2021-05-05T14:54:24Z) - On the Banach spaces associated with multi-layer ReLU networks: Function
representation, approximation theory and gradient descent dynamics [8.160343645537106]
We develop Banach spaces for ReLU neural networks of finite depth $L$ and infinite width.
The spaces contain all finite fully connected $L$-layer networks and their $L^2$-limiting objects under the natural path-norm.
Under this norm, the unit ball in the space for $L$-layer networks has low Rademacher complexity and thus favorable properties.
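For a finite fully connected ReLU network with weight matrices $W^{(1)}, \ldots, W^{(L)}$, the path norm is commonly defined as the sum over all input-output paths of the products of absolute weights along the path. This standard definition is given here for reference only; the paper's natural path-norm may in addition account for bias terms:
$$ \|f\|_{\mathrm{path}} \;=\; \sum_{i_0, i_1, \ldots, i_L} \big| W^{(L)}_{i_L i_{L-1}} \big| \cdots \big| W^{(2)}_{i_2 i_1} \big| \, \big| W^{(1)}_{i_1 i_0} \big| . $$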
arXiv Detail & Related papers (2020-07-30T17:47:05Z) - Theory of Deep Convolutional Neural Networks II: Spherical Analysis [9.099589602551573]
We consider a family of deep convolutional neural networks applied to approximate functions on the unit sphere $\mathbb{S}^{d-1}$ of $\mathbb{R}^d$.
Our analysis presents rates of uniform approximation when the approximated function lies in the Sobolev space $W^r_\infty(\mathbb{S}^{d-1})$ with $r>0$ or takes an additive ridge form.
arXiv Detail & Related papers (2020-07-28T14:54:30Z) - Neural Networks are Convex Regularizers: Exact Polynomial-time Convex
Optimization Formulations for Two-layer Networks [70.15611146583068]
We develop exact representations of training two-layer neural networks with rectified linear units (ReLUs).
Our theory utilizes semi-infinite duality and minimum norm regularization.
arXiv Detail & Related papers (2020-02-24T21:32:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.