Deep Equals Shallow for ReLU Networks in Kernel Regimes
- URL: http://arxiv.org/abs/2009.14397v4
- Date: Thu, 26 Aug 2021 15:49:40 GMT
- Title: Deep Equals Shallow for ReLU Networks in Kernel Regimes
- Authors: Alberto Bietti, Francis Bach
- Abstract summary: We show that for ReLU activations, the kernels derived from deep fully-connected networks have essentially the same approximation properties as their shallow two-layer counterpart.
Our main theoretical result relies on characterizing such eigenvalue decays through differentiability properties of the kernel function.
- Score: 13.909388235627791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep networks are often considered to be more expressive than shallow ones in
terms of approximation. Indeed, certain functions can be approximated by deep
networks provably more efficiently than by shallow ones, however, no tractable
algorithms are known for learning such deep models. Separately, a recent line
of work has shown that deep networks trained with gradient descent may behave
like (tractable) kernel methods in a certain over-parameterized regime, where
the kernel is determined by the architecture and initialization, and this paper
focuses on approximation for such kernels. We show that for ReLU activations,
the kernels derived from deep fully-connected networks have essentially the
same approximation properties as their shallow two-layer counterpart, namely
the same eigenvalue decay for the corresponding integral operator. This
highlights the limitations of the kernel framework for understanding the
benefits of such deep architectures. Our main theoretical result relies on
characterizing such eigenvalue decays through differentiability properties of
the kernel function, which also easily applies to the study of other kernels
defined on the sphere.
Related papers
- Approximation by non-symmetric networks for cross-domain learning [0.0]
We study the approximation capabilities of kernel based networks using non-symmetric kernels.
We obtain estimates on the accuracy of uniform approximation of functions in a Sobolev class by ReLU$r$ networks when $r$ is not necessarily an integer.
arXiv Detail & Related papers (2023-05-06T01:33:26Z) - On the Eigenvalue Decay Rates of a Class of Neural-Network Related
Kernel Functions Defined on General Domains [10.360517127652185]
We provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain.
This class of kernel functions include but are not limited to the neural tangent kernel associated with neural networks with different depths and various activation functions.
arXiv Detail & Related papers (2023-05-04T08:54:40Z) - Unified Field Theory for Deep and Recurrent Neural Networks [56.735884560668985]
We present a unified and systematic derivation of the mean-field theory for both recurrent and deep networks.
We find that convergence towards the mean-field theory is typically slower for recurrent networks than for deep networks.
Our method exposes that Gaussian processes are but the lowest order of a systematic expansion in $1/n$.
arXiv Detail & Related papers (2021-12-10T15:06:11Z) - Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z) - Universality and Optimality of Structured Deep Kernel Networks [0.0]
Kernel based methods yield approximation models that are flexible, efficient and powerful.
Recent success of machine learning methods has been driven by deep neural networks (NNs)
In this paper, we show that the use of special types of kernels yield models reminiscent of neural networks.
arXiv Detail & Related papers (2021-05-15T14:10:35Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of fully-connected ReLU network.
We show that dimension of the resulting features is much smaller than other baseline feature map constructions to achieve comparable error bounds both in theory and practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - On Approximation in Deep Convolutional Networks: a Kernel Perspective [12.284934135116515]
We study the success of deep convolutional networks on tasks involving high-dimensional data such as images or audio.
We study this theoretically and empirically through the lens of kernel methods, by considering multi-layer convolutional kernels.
We find that while expressive kernels operating on input patches are important at the first layer, simpler kernels can suffice in higher layers for good performance.
arXiv Detail & Related papers (2021-02-19T17:03:42Z) - The Connection Between Approximation, Depth Separation and Learnability
in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z) - Kernelized Classification in Deep Networks [49.47339560731506]
We propose a kernelized classification layer for deep networks.
We advocate a nonlinear classification layer by using the kernel trick on the softmax cross-entropy loss function during training.
We show the usefulness of the proposed nonlinear classification layer on several datasets and tasks.
arXiv Detail & Related papers (2020-12-08T21:43:19Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z) - Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite
Networks [12.692279981822011]
We derive the covariance functions of multi-layer perceptrons with exponential linear units (ELU) and Gaussian error linear units (GELU)
We analyse the fixed-point dynamics of iterated kernels corresponding to a broad range of activation functions.
We find that unlike some previously studied neural network kernels, these new kernels exhibit non-trivial fixed-point dynamics.
arXiv Detail & Related papers (2020-02-20T01:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.