Function-Space Optimality of Neural Architectures With Multivariate
Nonlinearities
- URL: http://arxiv.org/abs/2310.03696v2
- Date: Wed, 6 Dec 2023 12:30:45 GMT
- Title: Function-Space Optimality of Neural Architectures With Multivariate
Nonlinearities
- Authors: Rahul Parhi and Michael Unser
- Abstract summary: We prove a representer theorem stating that the solution sets of learning problems posed over a family of Banach spaces are completely characterized by neural architectures with multivariate nonlinearities.
Our results shed light on the regularity of functions learned by neural networks trained on data, and provide new theoretical motivation for several architectural choices found in practice.
- Score: 30.762063524541638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the function-space optimality (specifically, the Banach-space
optimality) of a large class of shallow neural architectures with multivariate
nonlinearities/activation functions. To that end, we construct a new family of
Banach spaces defined via a regularization operator, the $k$-plane transform,
and a sparsity-promoting norm. We prove a representer theorem that states that
the solution sets to learning problems posed over these Banach spaces are
completely characterized by neural architectures with multivariate
nonlinearities. These optimal architectures have skip connections and are
tightly connected to orthogonal weight normalization and multi-index models,
both of which have received recent interest in the neural network community.
Our framework is compatible with a number of classical nonlinearities including
the rectified linear unit (ReLU) activation function, the norm activation
function, and the radial basis functions found in the theory of
thin-plate/polyharmonic splines. We also show that the underlying spaces are
special instances of reproducing kernel Banach spaces and variation spaces. Our
results shed light on the regularity of functions learned by neural networks
trained on data, particularly with multivariate nonlinearities, and provide new
theoretical motivation for several architectural choices found in practice.
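To make the statement concrete, here is a minimal numerical sketch (assumed shapes and random weights, not the paper's exact parametrization) of the architecture family the theorem points to: units of the form rho(A_i x - b_i) with row-orthonormal projection matrices A_i (the multi-index / orthogonal-weight-normalization structure), a multivariate nonlinearity such as the Euclidean norm, and an affine skip connection.
```python
import numpy as np

def orthonormal_rows(rng, k, d):
    """Random k x d matrix with orthonormal rows (k <= d), built via a QR factorization."""
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q.T

def shallow_multi_index_net(x, A, b, c, skip_w, skip_b, rho=np.linalg.norm):
    """Evaluate f(x) = sum_i c_i * rho(A_i @ x - b_i) + skip_w @ x + skip_b.

    A: (N, k, d) projections with orthonormal rows, b: (N, k), c: (N,),
    rho: a multivariate nonlinearity mapping R^k -> R (here the Euclidean norm).
    """
    units = np.array([rho(Ai @ x - bi) for Ai, bi in zip(A, b)])
    return float(c @ units + skip_w @ x + skip_b)

rng = np.random.default_rng(0)
d, k, N = 5, 2, 8                                   # input dim, projection dim, width
A = np.stack([orthonormal_rows(rng, k, d) for _ in range(N)])
b, c = rng.standard_normal((N, k)), rng.standard_normal(N)
skip_w, skip_b = rng.standard_normal(d), 0.0        # affine skip connection
print(shallow_multi_index_net(rng.standard_normal(d), A, b, c, skip_w, skip_b))
```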
Related papers
- An Adaptive Tangent Feature Perspective of Neural Networks [4.900298402690262]
We consider linear transformations of features, resulting in a joint optimization over parameters and transformations with a bilinear constraint.
Specializing to neural network structure, we gain insights into how the features and thus the kernel function change.
We verify our theoretical observations in the kernel alignment of real neural networks.
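A toy sketch of the tangent-feature (NTK-style) kernel and a kernel-target alignment score for a small ReLU model; the bilinearly constrained joint optimization studied in the paper is not reproduced here, and the model, shapes, and alignment formula are illustrative assumptions.
```python
import numpy as np

def tangent_features(x, W, v):
    """Gradient of f(x) = v @ relu(W @ x) with respect to (W, v), flattened into one vector."""
    pre = W @ x
    act = np.maximum(pre, 0.0)
    dW = np.outer(v * (pre > 0), x)     # d f / d W
    dv = act                            # d f / d v
    return np.concatenate([dW.ravel(), dv])

def kernel_target_alignment(K, y):
    """Frobenius alignment between a kernel Gram matrix and the label kernel y y^T."""
    Ky = np.outer(y, y)
    return np.sum(K * Ky) / (np.linalg.norm(K) * np.linalg.norm(Ky))

rng = np.random.default_rng(0)
n, d, m = 20, 4, 16
X, y = rng.standard_normal((n, d)), rng.choice([-1.0, 1.0], size=n)
W, v = rng.standard_normal((m, d)) / np.sqrt(d), rng.standard_normal(m) / np.sqrt(m)
Phi = np.stack([tangent_features(x, W, v) for x in X])
K = Phi @ Phi.T                         # tangent (NTK-style) kernel at this parameter setting
print("alignment:", kernel_target_alignment(K, y))
```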
arXiv Detail & Related papers (2023-08-29T17:57:20Z)
- Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
Because samples have complex non-linear characteristics, the objective of these activation functions is to project samples from their original feature space into a linearly separable feature space.
This motivates us to explore whether all features need to be transformed by all non-linear functions in current, typical NNs.
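A hypothetical layer illustrating that question: only a slice of the pre-activations is passed through ReLU, while the remaining features are kept linear. The split fraction and shapes are made up for illustration and are not the paper's construction.
```python
import numpy as np

def partially_nonlinear_layer(x, W, frac_linear=0.5):
    """Apply ReLU to only part of the layer's outputs; pass the rest through linearly."""
    z = W @ x
    n_lin = int(frac_linear * z.shape[0])
    out = z.copy()
    out[n_lin:] = np.maximum(z[n_lin:], 0.0)   # nonlinear path
    return out                                  # first n_lin entries stay linear

rng = np.random.default_rng(0)
x, W = rng.standard_normal(8), rng.standard_normal((6, 8))
print(partially_nonlinear_layer(x, W))
```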
arXiv Detail & Related papers (2022-03-22T13:09:17Z)
- Convolutional Filtering and Neural Networks with Non Commutative
Algebras [153.20329791008095]
We study the generalization of non commutative convolutional neural networks.
We show that non commutative convolutional architectures can be stable to deformations on the space of operators.
arXiv Detail & Related papers (2021-08-23T04:22:58Z)
- What Kinds of Functions do Deep Neural Networks Learn? Insights from
Variational Spline Theory [19.216784367141972]
We develop a variational framework to understand the properties of functions learned by deep neural networks with ReLU activation functions fit to data.
We derive a representer theorem showing that deep ReLU networks are solutions to regularized data fitting problems in this function space.
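For intuition, a toy regularized data-fitting problem: a one-hidden-layer ReLU network trained by plain gradient descent with weight decay, the usual parametric surrogate for the function-space regularizers appearing in representer theorems of this kind. The paper itself treats deep ReLU networks and a specific variational-spline seminorm; the data, width, and hyperparameters below are assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 40)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(40)   # toy 1-D regression data

m, lr, lam = 30, 0.05, 1e-3                          # width, step size, weight decay
w, b, v = rng.standard_normal(m), rng.standard_normal(m), rng.standard_normal(m) / m

for _ in range(5000):
    pre = np.outer(x, w) + b                         # (n, m) pre-activations
    h = np.maximum(pre, 0.0)
    err = h @ v - y
    g_v = h.T @ err / len(x) + lam * v               # gradients of MSE/2 + weight decay
    g_pre = np.outer(err, v) * (pre > 0) / len(x)
    g_w = x @ g_pre + lam * w
    g_b = g_pre.sum(axis=0) + lam * b
    v, w, b = v - lr * g_v, w - lr * g_w, b - lr * g_b

pred = np.maximum(np.outer(x, w) + b, 0.0) @ v       # learned piecewise-linear fit
print("train MSE:", float(np.mean((pred - y) ** 2)))
```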
arXiv Detail & Related papers (2021-05-07T16:18:22Z)
- Bilinear Classes: A Structural Framework for Provable Generalization in
RL [119.42509700822484]
Bilinear Classes is a new structural framework which permits generalization in reinforcement learning.
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable.
Our main result provides an RL algorithm with polynomial sample complexity for Bilinear Classes.
arXiv Detail & Related papers (2021-03-19T16:34:20Z)
- A Differential Geometry Perspective on Orthogonal Recurrent Models [56.09491978954866]
We employ tools and insights from differential geometry to offer a novel perspective on orthogonal RNNs.
We show that orthogonal RNNs may be viewed as optimizing in the space of divergence-free vector fields.
Motivated by this observation, we study a new recurrent model, which spans the entire space of vector fields.
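A minimal sketch of an orthogonal RNN cell, with the recurrent matrix kept on the orthogonal group via a Cayley parameterization (one common construction, not necessarily the recurrent model introduced in the paper); shapes and names are illustrative.
```python
import numpy as np

def cayley_orthogonal(S):
    """Cayley map: skew-symmetric A = S - S.T  ->  orthogonal Q = (I - A)(I + A)^{-1}."""
    A = S - S.T
    I = np.eye(S.shape[0])
    return (I - A) @ np.linalg.inv(I + A)

def orthogonal_rnn(xs, Wx, S, b):
    """Run h_{t+1} = tanh(Q h_t + Wx x_t + b) with an orthogonal recurrent matrix Q."""
    Q = cayley_orthogonal(S)
    h = np.zeros(S.shape[0])
    for x in xs:
        h = np.tanh(Q @ h + Wx @ x + b)
    return h

rng = np.random.default_rng(0)
d_in, d_h, T = 3, 5, 10
xs = rng.standard_normal((T, d_in))
Wx, S, b = rng.standard_normal((d_h, d_in)), rng.standard_normal((d_h, d_h)), np.zeros(d_h)
print(orthogonal_rnn(xs, Wx, S, b))
```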
arXiv Detail & Related papers (2021-02-18T19:39:22Z)
- On the Number of Linear Functions Composing Deep Neural Network: Towards
a Refined Definition of Neural Networks Complexity [6.252236971703546]
We introduce an equivalence relation among the linear functions composing a piecewise linear function and then count those linear functions relative to that equivalence relation.
Our new complexity measure can clearly distinguish between the two models, is consistent with the classical measure, and increases exponentially with depth.
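A rough numerical illustration of the idea: probe a small ReLU network on a grid, count the distinct activation patterns (the naive piece count), and then identify pieces that realize the same affine map, echoing the equivalence-relation-based count. The paper's definition is more careful; the grid, network, and rounding tolerance are assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 2, 6
W, b, v = rng.standard_normal((m, d)), rng.standard_normal(m), rng.standard_normal(m)

# Probe the input plane and record the ReLU activation pattern at each point;
# every pattern determines one affine piece of f(x) = v @ relu(W @ x + b).
grid = np.linspace(-3, 3, 200)
xs = np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, d)
patterns = xs @ W.T + b > 0

# Naive count: number of distinct activation patterns seen on the grid.
naive = len(np.unique(patterns, axis=0))

# Rough analogue of the refined count: merge pieces that realize the same
# affine map (same gradient and intercept).
grads = (patterns * v) @ W
intercepts = (patterns * v) @ b
pieces = np.unique(np.round(np.column_stack([grads, intercepts]), 10), axis=0)
print(naive, len(pieces))
```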
arXiv Detail & Related papers (2020-10-23T01:46:12Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
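A toy gradient descent-ascent sketch of the min-max idea, with linear models standing in for the two neural-network players and an ordinary regression in place of a structural equation model; the objective, step size, and variable names are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.standard_normal((n, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.standard_normal(n)

# Saddle-point objective: min_theta max_w  mean((y - X theta) * (X w)) - 0.5 * mean((X w)^2)
theta, w, lr = np.zeros(d), np.zeros(d), 0.1
for _ in range(2000):
    r, u = y - X @ theta, X @ w
    g_theta = -X.T @ u / n            # descend with the estimator player
    g_w = X.T @ (r - u) / n           # ascend with the adversarial player
    theta -= lr * g_theta
    w += lr * g_w

print(theta)                          # approaches theta_true
```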
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
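A crude multi-level message-passing sketch: local messages on a fine graph plus messages routed through a pooled, coarse graph to carry longer-range interaction. Dense matrices are used for brevity (so this toy is not linear-complexity), and the graph, pooling operator, and weights are assumptions rather than the paper's multipole construction.
```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 4
X = rng.standard_normal((n, d))                      # node features
A_fine = (rng.random((n, n)) < 0.2).astype(float)    # short-range (fine) edges
A_fine = np.maximum(A_fine, A_fine.T)

# Pool pairs of consecutive nodes into a coarse graph that carries long-range messages.
P = np.kron(np.eye(n // 2), np.ones((2, 1))) / 2     # (n, n/2) pooling operator
A_coarse = (P.T @ A_fine @ P > 0).astype(float)

W_f, W_c = rng.standard_normal((d, d)) / d, rng.standard_normal((d, d)) / d
fine_msg = np.tanh(A_fine @ X @ W_f)                 # local message passing
coarse_msg = P @ np.tanh(A_coarse @ (P.T @ X) @ W_c) # coarse-level message passing, unpooled
X_out = X + fine_msg + coarse_msg                    # combine levels with a residual connection
print(X_out.shape)
```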
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.