What Kinds of Functions do Deep Neural Networks Learn? Insights from
Variational Spline Theory
- URL: http://arxiv.org/abs/2105.03361v1
- Date: Fri, 7 May 2021 16:18:22 GMT
- Title: What Kinds of Functions do Deep Neural Networks Learn? Insights from
Variational Spline Theory
- Authors: Rahul Parhi, Robert D. Nowak
- Abstract summary: We develop a variational framework to understand the properties of functions learned by deep neural networks with ReLU activation functions fit to data.
We derive a representer theorem showing that deep ReLU networks are solutions to regularized data fitting problems in this function space.
- Score: 19.216784367141972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a variational framework to understand the properties of functions
learned by deep neural networks with ReLU activation functions fit to data. We
propose a new function space, which is reminiscent of classical bounded
variation spaces, that captures the compositional structure associated with
deep neural networks. We derive a representer theorem showing that deep ReLU
networks are solutions to regularized data fitting problems in this function
space. The function space consists of compositions of functions from the
(non-reflexive) Banach spaces of second-order bounded variation in the Radon
domain. These are Banach spaces with sparsity-promoting norms, giving insight
into the role of sparsity in deep neural networks. The neural network solutions
have skip connections and rank bounded weight matrices, providing new
theoretical support for these common architectural choices. The variational
problem we study can be recast as a finite-dimensional neural network training
problem with regularization schemes related to the notions of weight decay and
path-norm regularization. Finally, our analysis builds on techniques from
variational spline theory, providing new connections between deep neural
networks and splines.
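To make the statement of the representer theorem concrete, the following is a schematic sketch with the paper's notation heavily simplified; the symbols $\mathcal{F}$, $\rho$, $K_\ell$, and $A^{(\ell)}$ are placeholders introduced here for illustration rather than the paper's exact definitions.

```latex
% Schematic form of the regularized data-fitting problem (notation simplified;
% \mathcal{F} stands in for the proposed compositional function space and
% \|\cdot\|_{\mathcal{F}} for its sparsity-promoting norm built from
% second-order bounded variation in the Radon domain):
\min_{f \in \mathcal{F}} \;\; \sum_{i=1}^{N} \mathcal{L}\bigl(y_i, f(x_i)\bigr)
  \;+\; \lambda \, \|f\|_{\mathcal{F}}, \qquad \lambda > 0.
% The representer theorem says a solution can be taken to be a deep ReLU
% network, i.e., a composition f = f^{(L)} \circ \cdots \circ f^{(1)} whose
% layers are finite-width shallow ReLU maps with a linear skip connection,
f^{(\ell)}(x) \;=\; \sum_{k=1}^{K_\ell} v^{(\ell)}_{k}\,
  \rho\!\bigl(\langle w^{(\ell)}_{k}, x\rangle - b^{(\ell)}_{k}\bigr)
  \;+\; A^{(\ell)} x + c^{(\ell)},
\qquad \rho(t) = \max\{0, t\},
% with the widths K_\ell bounded in terms of the data and rank-bounded weight
% matrices, per the abstract.
```

The abstract also notes that this variational problem can be recast as a finite-dimensional network training problem with weight-decay- and path-norm-style regularization. Below is a minimal, hypothetical PyTorch sketch of such a training loop, assuming layers of the form V relu(Wx + b) + Ax + c; the layer widths, toy data, penalty weight `lam`, and helper names (`ReLULayerWithSkip`, `path_norm_like_penalty`) are illustrative choices, not the authors' implementation.

```python
# Illustrative sketch only (not the authors' code): training a deep ReLU
# network with linear skip connections under a weight-decay-style penalty and
# a path-norm-style penalty, as described informally in the abstract.
import torch
import torch.nn as nn


class ReLULayerWithSkip(nn.Module):
    """One layer of the form  x -> V relu(W x + b) + A x + c."""

    def __init__(self, d_in: int, d_out: int, width: int):
        super().__init__()
        self.inner = nn.Linear(d_in, width)             # W, b
        self.outer = nn.Linear(width, d_out)            # V, c
        self.skip = nn.Linear(d_in, d_out, bias=False)  # A (linear skip path)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.outer(torch.relu(self.inner(x))) + self.skip(x)


def weight_decay_penalty(net: nn.Module) -> torch.Tensor:
    """Classical weight decay: sum of squared entries of all parameters."""
    return sum((p ** 2).sum() for p in net.parameters())


def path_norm_like_penalty(layers: nn.ModuleList) -> torch.Tensor:
    """Path-norm-flavoured penalty: per ReLU unit, |outgoing| * ||incoming||."""
    total = torch.zeros(())
    for layer in layers:
        w = layer.inner.weight   # shape (width, d_in)
        v = layer.outer.weight   # shape (d_out, width)
        total = total + (v.abs().sum(dim=0) * w.norm(dim=1)).sum()
    return total


if __name__ == "__main__":
    torch.manual_seed(0)
    X, y = torch.randn(64, 3), torch.randn(64, 1)       # toy data
    layers = nn.ModuleList([ReLULayerWithSkip(3, 3, 16),
                            ReLULayerWithSkip(3, 1, 16)])
    net = nn.Sequential(*layers)
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    lam = 1e-3                                          # regularization weight
    for _ in range(200):
        opt.zero_grad()
        data_fit = ((net(X) - y) ** 2).mean()
        # Either penalty (or a combination) plays the role of the regularizer.
        loss = data_fit + lam * path_norm_like_penalty(layers)
        loss.backward()
        opt.step()
    print("final training loss:", float(loss))
```

In this sketch the path-norm-style term couples each hidden unit's incoming and outgoing weights, which is the flavour of sparsity-promoting regularization the abstract alludes to; the exact regularizer in the paper is defined through the function-space norm itself.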
Related papers
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural networks and their training, applicable to neural networks of arbitrary width, depth, and topology.
We also present a novel exact representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Variation Spaces for Multi-Output Neural Networks: Insights on Multi-Task Learning and Network Compression [28.851519959657466]
This paper introduces a novel theoretical framework for the analysis of vector-valued neural networks.
A key contribution of this work is the development of a representer theorem for the vector-valued variation spaces.
This observation reveals that the norm associated with these vector-valued variation spaces encourages the learning of features that are useful for multiple tasks.
arXiv Detail & Related papers (2023-05-25T23:32:10Z)
- Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
arXiv Detail & Related papers (2023-02-27T18:52:38Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study [55.12108376616355]
The study of the NTK has been devoted to typical neural network architectures, but it is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of the NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z)
- Approximation Power of Deep Neural Networks: an explanatory mathematical survey [0.0]
The goal of this survey is to present an explanatory review of the approximation properties of deep neural networks.
We aim at understanding how and why deep neural networks outperform other classical linear and nonlinear approximation methods.
arXiv Detail & Related papers (2022-07-19T18:47:44Z)
- Nonlocal Kernel Network (NKN): a Stable and Resolution-Independent Deep Neural Network [23.465930256410722]
The nonlocal kernel network (NKN) is resolution independent and characterized by deep neural networks.
NKN is capable of handling a variety of tasks such as learning governing equations and classifying images.
arXiv Detail & Related papers (2022-01-06T19:19:35Z)
- Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks [19.216784367141972]
We study the problem of estimating an unknown function from noisy data using shallow (single-hidden layer) ReLU neural networks.
We quantify the performance of these neural network estimators when the data-generating function belongs to the space of functions of second-order bounded variation in the Radon domain.
arXiv Detail & Related papers (2021-09-18T05:56:06Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Banach Space Representer Theorems for Neural Networks and Ridge Splines [17.12783792226575]
We develop a variational framework to understand the properties of the functions learned by neural networks fit to data.
We derive a representer theorem showing that finite-width, single-hidden layer neural networks are solutions to inverse problems.
arXiv Detail & Related papers (2020-06-10T02:57:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.