Function-Space Optimality of Neural Architectures With Multivariate
Nonlinearities
- URL: http://arxiv.org/abs/2310.03696v2
- Date: Wed, 6 Dec 2023 12:30:45 GMT
- Title: Function-Space Optimality of Neural Architectures With Multivariate
Nonlinearities
- Authors: Rahul Parhi and Michael Unser
- Abstract summary: We prove a representer theorem stating that the solution sets of learning problems posed over a family of Banach spaces are completely characterized by neural architectures with multivariate nonlinearities.
Our results shed light on the regularity of functions learned by neural networks trained on data, and provide new theoretical motivation for several architectural choices found in practice.
- Score: 30.762063524541638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the function-space optimality (specifically, the Banach-space
optimality) of a large class of shallow neural architectures with multivariate
nonlinearities/activation functions. To that end, we construct a new family of
Banach spaces defined via a regularization operator, the $k$-plane transform,
and a sparsity-promoting norm. We prove a representer theorem that states that
the solution sets to learning problems posed over these Banach spaces are
completely characterized by neural architectures with multivariate
nonlinearities. These optimal architectures have skip connections and are
tightly connected to orthogonal weight normalization and multi-index models,
both of which have received recent interest in the neural network community.
Our framework is compatible with a number of classical nonlinearities including
the rectified linear unit (ReLU) activation function, the norm activation
function, and the radial basis functions found in the theory of
thin-plate/polyharmonic splines. We also show that the underlying spaces are
special instances of reproducing kernel Banach spaces and variation spaces. Our
results shed light on the regularity of functions learned by neural networks
trained on data, particularly with multivariate nonlinearities, and provide new
theoretical motivation for several architectural choices found in practice.
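To make the statement concrete, here is a minimal numerical sketch (assumed shapes and random weights, not the paper's exact parametrization) of the architecture family the theorem points to: units of the form rho(A_i x - b_i) with row-orthonormal projection matrices A_i (the multi-index / orthogonal-weight-normalization structure), a multivariate nonlinearity such as the Euclidean norm, and an affine skip connection.
```python
import numpy as np

def orthonormal_rows(rng, k, d):
    """Random k x d matrix with orthonormal rows (k <= d), built via a QR factorization."""
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q.T

def shallow_multi_index_net(x, A, b, c, skip_w, skip_b, rho=np.linalg.norm):
    """Evaluate f(x) = sum_i c_i * rho(A_i @ x - b_i) + skip_w @ x + skip_b.

    A: (N, k, d) projections with orthonormal rows, b: (N, k), c: (N,),
    rho: a multivariate nonlinearity mapping R^k -> R (here the Euclidean norm).
    """
    units = np.array([rho(Ai @ x - bi) for Ai, bi in zip(A, b)])
    return float(c @ units + skip_w @ x + skip_b)

rng = np.random.default_rng(0)
d, k, N = 5, 2, 8                                   # input dim, projection dim, width
A = np.stack([orthonormal_rows(rng, k, d) for _ in range(N)])
b, c = rng.standard_normal((N, k)), rng.standard_normal(N)
skip_w, skip_b = rng.standard_normal(d), 0.0        # affine skip connection
print(shallow_multi_index_net(rng.standard_normal(d), A, b, c, skip_w, skip_b))
```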
Related papers
- An Adaptive Tangent Feature Perspective of Neural Networks [4.900298402690262]
We consider linear transformations of features, resulting in a joint optimization over parameters and transformations with a bilinear constraint.
Specializing to neural network structure, we gain insights into how the features and thus the kernel function change.
We verify our theoretical observations in the kernel alignment of real neural networks.
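A toy sketch of the tangent-feature (NTK-style) kernel and a kernel-target alignment score for a small ReLU model; the bilinearly constrained joint optimization studied in the paper is not reproduced here, and the model, shapes, and alignment formula are illustrative assumptions.
```python
import numpy as np

def tangent_features(x, W, v):
    """Gradient of f(x) = v @ relu(W @ x) with respect to (W, v), flattened into one vector."""
    pre = W @ x
    act = np.maximum(pre, 0.0)
    dW = np.outer(v * (pre > 0), x)     # d f / d W
    dv = act                            # d f / d v
    return np.concatenate([dW.ravel(), dv])

def kernel_target_alignment(K, y):
    """Frobenius alignment between a kernel Gram matrix and the label kernel y y^T."""
    Ky = np.outer(y, y)
    return np.sum(K * Ky) / (np.linalg.norm(K) * np.linalg.norm(Ky))

rng = np.random.default_rng(0)
n, d, m = 20, 4, 16
X, y = rng.standard_normal((n, d)), rng.choice([-1.0, 1.0], size=n)
W, v = rng.standard_normal((m, d)) / np.sqrt(d), rng.standard_normal(m) / np.sqrt(m)
Phi = np.stack([tangent_features(x, W, v) for x in X])
K = Phi @ Phi.T                         # tangent (NTK-style) kernel at this parameter setting
print("alignment:", kernel_target_alignment(K, y))
```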
arXiv Detail & Related papers (2023-08-29T17:57:20Z)
- Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
Because samples have complex non-linear characteristics, the objective of these activation functions is to project samples from their original feature space into a linearly separable feature space.
This motivates us to explore whether all features need to be transformed by all non-linear functions in current, typical NNs.
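A hypothetical layer illustrating that question: only a slice of the pre-activations is passed through ReLU, while the remaining features are kept linear. The split fraction and shapes are made up for illustration and are not the paper's construction.
```python
import numpy as np

def partially_nonlinear_layer(x, W, frac_linear=0.5):
    """Apply ReLU to only part of the layer's outputs; pass the rest through linearly."""
    z = W @ x
    n_lin = int(frac_linear * z.shape[0])
    out = z.copy()
    out[n_lin:] = np.maximum(z[n_lin:], 0.0)   # nonlinear path
    return out                                  # first n_lin entries stay linear

rng = np.random.default_rng(0)
x, W = rng.standard_normal(8), rng.standard_normal((6, 8))
print(partially_nonlinear_layer(x, W))
```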
arXiv Detail & Related papers (2022-03-22T13:09:17Z)
- Convolutional Filtering and Neural Networks with Non Commutative
Algebras [153.20329791008095]
We study the generalization of non commutative convolutional neural networks.
We show that non commutative convolutional architectures can be stable to deformations on the space of operators.
arXiv Detail & Related papers (2021-08-23T04:22:58Z)
- What Kinds of Functions do Deep Neural Networks Learn? Insights from
Variational Spline Theory [19.216784367141972]
We develop a variational framework to understand the properties of functions learned by deep neural networks with ReLU activation functions fit to data.
We derive a representer theorem showing that deep ReLU networks are solutions to regularized data fitting problems in this function space.
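For intuition, a toy regularized data-fitting problem: a one-hidden-layer ReLU network trained by plain gradient descent with weight decay, the usual parametric surrogate for the function-space regularizers appearing in representer theorems of this kind. The paper itself treats deep ReLU networks and a specific variational-spline seminorm; the data, width, and hyperparameters below are assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 40)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(40)   # toy 1-D regression data

m, lr, lam = 30, 0.05, 1e-3                          # width, step size, weight decay
w, b, v = rng.standard_normal(m), rng.standard_normal(m), rng.standard_normal(m) / m

for _ in range(5000):
    pre = np.outer(x, w) + b                         # (n, m) pre-activations
    h = np.maximum(pre, 0.0)
    err = h @ v - y
    g_v = h.T @ err / len(x) + lam * v               # gradients of MSE/2 + weight decay
    g_pre = np.outer(err, v) * (pre > 0) / len(x)
    g_w = x @ g_pre + lam * w
    g_b = g_pre.sum(axis=0) + lam * b
    v, w, b = v - lr * g_v, w - lr * g_w, b - lr * g_b

pred = np.maximum(np.outer(x, w) + b, 0.0) @ v       # learned piecewise-linear fit
print("train MSE:", float(np.mean((pred - y) ** 2)))
```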
arXiv Detail & Related papers (2021-05-07T16:18:22Z)
- Bilinear Classes: A Structural Framework for Provable Generalization in
RL [119.42509700822484]
Bilinear Classes is a new structural framework which permits generalization in reinforcement learning.
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable.
Our main result provides an RL algorithm with polynomial sample complexity for Bilinear Classes.
arXiv Detail & Related papers (2021-03-19T16:34:20Z)
- A Differential Geometry Perspective on Orthogonal Recurrent Models [56.09491978954866]
We employ tools and insights from differential geometry to offer a novel perspective on orthogonal RNNs.
We show that orthogonal RNNs may be viewed as optimizing in the space of divergence-free vector fields.
Motivated by this observation, we study a new recurrent model, which spans the entire space of vector fields.
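A minimal sketch of an orthogonal RNN cell, with the recurrent matrix kept on the orthogonal group via a Cayley parameterization (one common construction, not necessarily the recurrent model introduced in the paper); shapes and names are illustrative.
```python
import numpy as np

def cayley_orthogonal(S):
    """Cayley map: skew-symmetric A = S - S.T  ->  orthogonal Q = (I - A)(I + A)^{-1}."""
    A = S - S.T
    I = np.eye(S.shape[0])
    return (I - A) @ np.linalg.inv(I + A)

def orthogonal_rnn(xs, Wx, S, b):
    """Run h_{t+1} = tanh(Q h_t + Wx x_t + b) with an orthogonal recurrent matrix Q."""
    Q = cayley_orthogonal(S)
    h = np.zeros(S.shape[0])
    for x in xs:
        h = np.tanh(Q @ h + Wx @ x + b)
    return h

rng = np.random.default_rng(0)
d_in, d_h, T = 3, 5, 10
xs = rng.standard_normal((T, d_in))
Wx, S, b = rng.standard_normal((d_h, d_in)), rng.standard_normal((d_h, d_h)), np.zeros(d_h)
print(orthogonal_rnn(xs, Wx, S, b))
```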
arXiv Detail & Related papers (2021-02-18T19:39:22Z)
- On the Number of Linear Functions Composing Deep Neural Network: Towards
a Refined Definition of Neural Networks Complexity [6.252236971703546]
We introduce an equivalence relation among the linear functions composing a piecewise linear function and then count those linear functions relative to that equivalence relation.
Our new complexity measure can clearly distinguish between the two models, is consistent with the classical measure, and increases exponentially with depth.
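A rough numerical illustration of the idea: probe a small ReLU network on a grid, count the distinct activation patterns (the naive piece count), and then identify pieces that realize the same affine map, echoing the equivalence-relation-based count. The paper's definition is more careful; the grid, network, and rounding tolerance are assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 2, 6
W, b, v = rng.standard_normal((m, d)), rng.standard_normal(m), rng.standard_normal(m)

# Probe the input plane and record the ReLU activation pattern at each point;
# every pattern determines one affine piece of f(x) = v @ relu(W @ x + b).
grid = np.linspace(-3, 3, 200)
xs = np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, d)
patterns = xs @ W.T + b > 0

# Naive count: number of distinct activation patterns seen on the grid.
naive = len(np.unique(patterns, axis=0))

# Rough analogue of the refined count: merge pieces that realize the same
# affine map (same gradient and intercept).
grads = (patterns * v) @ W
intercepts = (patterns * v) @ b
pieces = np.unique(np.round(np.column_stack([grads, intercepts]), 10), axis=0)
print(naive, len(pieces))
```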
arXiv Detail & Related papers (2020-10-23T01:46:12Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
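A toy gradient descent-ascent sketch of the min-max idea, with linear models standing in for the two neural-network players and an ordinary regression in place of a structural equation model; the objective, step size, and variable names are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.standard_normal((n, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.standard_normal(n)

# Saddle-point objective: min_theta max_w  mean((y - X theta) * (X w)) - 0.5 * mean((X w)^2)
theta, w, lr = np.zeros(d), np.zeros(d), 0.1
for _ in range(2000):
    r, u = y - X @ theta, X @ w
    g_theta = -X.T @ u / n            # descend with the estimator player
    g_w = X.T @ (r - u) / n           # ascend with the adversarial player
    theta -= lr * g_theta
    w += lr * g_w

print(theta)                          # approaches theta_true
```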
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
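A crude multi-level message-passing sketch: local messages on a fine graph plus messages routed through a pooled, coarse graph to carry longer-range interaction. Dense matrices are used for brevity (so this toy is not linear-complexity), and the graph, pooling operator, and weights are assumptions rather than the paper's multipole construction.
```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 4
X = rng.standard_normal((n, d))                      # node features
A_fine = (rng.random((n, n)) < 0.2).astype(float)    # short-range (fine) edges
A_fine = np.maximum(A_fine, A_fine.T)

# Pool pairs of consecutive nodes into a coarse graph that carries long-range messages.
P = np.kron(np.eye(n // 2), np.ones((2, 1))) / 2     # (n, n/2) pooling operator
A_coarse = (P.T @ A_fine @ P > 0).astype(float)

W_f, W_c = rng.standard_normal((d, d)) / d, rng.standard_normal((d, d)) / d
fine_msg = np.tanh(A_fine @ X @ W_f)                 # local message passing
coarse_msg = P @ np.tanh(A_coarse @ (P.T @ X) @ W_c) # coarse-level message passing, unpooled
X_out = X + fine_msg + coarse_msg                    # combine levels with a residual connection
print(X_out.shape)
```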
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.