Universality and Optimality of Structured Deep Kernel Networks
- URL: http://arxiv.org/abs/2105.07228v1
- Date: Sat, 15 May 2021 14:10:35 GMT
- Title: Universality and Optimality of Structured Deep Kernel Networks
- Authors: Tizian Wenzel, Gabriele Santin, Bernard Haasdonk
- Abstract summary: Kernel based methods yield approximation models that are flexible, efficient and powerful.
Recent success of machine learning methods has been driven by deep neural networks (NNs)
In this paper, we show that the use of special types of kernels yield models reminiscent of neural networks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernel based methods yield approximation models that are flexible, efficient
and powerful. In particular, they utilize fixed feature maps of the data, being
often associated to strong analytical results that prove their accuracy. On the
other hand, the recent success of machine learning methods has been driven by
deep neural networks (NNs). They achieve a significant accuracy on very
high-dimensional data, in that they are able to learn also efficient data
representations or data-based feature maps. In this paper, we leverage a recent
deep kernel representer theorem to connect the two approaches and understand
their interplay. In particular, we show that the use of special types of
kernels yield models reminiscent of neural networks that are founded in the
same theoretical framework of classical kernel methods, while enjoying many
computational properties of deep neural networks. Especially the introduced
Structured Deep Kernel Networks (SDKNs) can be viewed as neural networks with
optimizable activation functions obeying a representer theorem. Analytic
properties show their universal approximation properties in different
asymptotic regimes of unbounded number of centers, width and depth. Especially
in the case of unbounded depth, the constructions is asymptotically better than
corresponding constructions for ReLU neural networks, which is made possible by
the flexibility of kernel approximation
Related papers
- Convergence Analysis for Deep Sparse Coding via Convolutional Neural Networks [7.956678963695681]
We introduce a novel class of Deep Sparse Coding (DSC) models.
We derive convergence rates for CNNs in their ability to extract sparse features.
Inspired by the strong connection between sparse coding and CNNs, we explore training strategies to encourage neural networks to learn more sparse features.
arXiv Detail & Related papers (2024-08-10T12:43:55Z) - On the Approximation and Complexity of Deep Neural Networks to Invariant
Functions [0.0]
We study the approximation and complexity of deep neural networks to invariant functions.
We show that a broad range of invariant functions can be approximated by various types of neural network models.
We provide a feasible application that connects the parameter estimation and forecasting of high-resolution signals with our theoretical conclusions.
arXiv Detail & Related papers (2022-10-27T09:19:19Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study on NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp)
In this work, we derive the finite-width-K formulation for a special class of NNs-Hp, i.e., neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Incorporating Prior Knowledge into Neural Networks through an Implicit
Composite Kernel [1.6383321867266318]
Implicit Composite Kernel (ICK) is a kernel that combines a kernel implicitly defined by a neural network with a second kernel function chosen to model known properties.
We demonstrate ICK's superior performance and flexibility on both synthetic and real-world data sets.
arXiv Detail & Related papers (2022-05-15T21:32:44Z) - Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of fully-connected ReLU network.
We show that dimension of the resulting features is much smaller than other baseline feature map constructions to achieve comparable error bounds both in theory and practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Deep Equals Shallow for ReLU Networks in Kernel Regimes [13.909388235627791]
We show that for ReLU activations, the kernels derived from deep fully-connected networks have essentially the same approximation properties as their shallow two-layer counterpart.
Our main theoretical result relies on characterizing such eigenvalue decays through differentiability properties of the kernel function.
arXiv Detail & Related papers (2020-09-30T02:37:43Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.