From Kernel Methods to Neural Networks: A Unifying Variational
Formulation
- URL: http://arxiv.org/abs/2206.14625v1
- Date: Wed, 29 Jun 2022 13:13:53 GMT
- Title: From Kernel Methods to Neural Networks: A Unifying Variational
Formulation
- Authors: Michael Unser
- Abstract summary: We present a unifying regularization functional that depends on an operator and on a generic Radon-domain norm.
Our framework offers guarantees of universal approximation for a broad family of regularization operators or, equivalently, for a wide variety of shallow neural networks.
- Score: 25.6264886382888
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The minimization of a data-fidelity term and an additive regularization
functional gives rise to a powerful framework for supervised learning. In this
paper, we present a unifying regularization functional that depends on an
operator and on a generic Radon-domain norm. We establish the existence of a
minimizer and give the parametric form of the solution(s) under very mild
assumptions. When the norm is Hilbertian, the proposed formulation yields a
solution that involves radial-basis functions and is compatible with the
classical methods of machine learning. By contrast, for the total-variation
norm, the solution takes the form of a two-layer neural network with an
activation function that is determined by the regularization operator. In
particular, we retrieve the popular ReLU networks by letting the operator be
the Laplacian. We also characterize the solution for the intermediate
regularization norms $\|\cdot\|=\|\cdot\|_{L_p}$ with $p\in(1,2]$. Our
framework offers guarantees of universal approximation for a broad family of
regularization operators or, equivalently, for a wide variety of shallow neural
networks, including the cases (such as ReLU) where the activation function is
increasing polynomially. It also explains the favorable role of bias and skip
connections in neural architectures.
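  A schematic rendering of this setup, in our own notation rather than the paper's, may help: the framework minimizes a data-fidelity term plus a regularizer of the form $\psi(\|\mathrm{L}\{f\}\|)$, where $\mathrm{L}$ is the regularization operator and the norm is evaluated in the Radon domain. In the total-variation case, the minimizers are two-layer networks of the form $f(x)=\sum_{k=1}^{K} v_k\,\sigma(w_k^{\top}x-b_k)+c_0+c^{\top}x$, with $\sigma=\mathrm{ReLU}$ when $\mathrm{L}$ is the Laplacian; the affine term $c_0+c^{\top}x$ is the kind of component that corresponds to the bias and skip connections mentioned above. In the Hilbertian case, the expansion is over radial-basis functions instead.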
Related papers
- Generalization Bounds and Model Complexity for Kolmogorov-Arnold Networks [1.5850926890180461]
Kolmogorov-Arnold Network (KAN) is a network structure recently proposed by Liu et al.
This work provides a rigorous theoretical analysis of KAN by establishing generalization bounds for KANs equipped with various classes of activation functions.
arXiv Detail & Related papers (2024-10-10T15:23:21Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer [2.1485350418225244]
We give an exact macroscopic characterization of the generalization behavior of randomized, shallow NNs with ReLU activation.
We show that these randomized shallow networks (RSNs) correspond to a generalized additive model (GAM)-type regression in which infinitely many directions are considered.
arXiv Detail & Related papers (2023-03-20T21:05:47Z)
- A Lifted Bregman Formulation for the Inversion of Deep Neural Networks [28.03724379169264]
We propose a novel framework for the regularised inversion of deep neural networks.
The framework lifts the parameter space into a higher dimensional space by introducing auxiliary variables.
We present theoretical results and support their practical application with numerical examples.
arXiv Detail & Related papers (2023-03-01T20:30:22Z)
- A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z)
- NeuralEF: Deconstructing Kernels by Deep Neural Networks [47.54733625351363]
Traditional nonparametric solutions based on the Nyström formula suffer from scalability issues (a minimal sketch of the Nyström construction is given after this list).
Recent work has resorted to a parametric approach, i.e., training neural networks to approximate the eigenfunctions.
We show that these problems can be fixed by using a new series of objective functions that generalize to both supervised and unsupervised learning problems.
arXiv Detail & Related papers (2022-04-30T05:31:07Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with a curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
- Solving high-dimensional eigenvalue problems using deep neural networks: A diffusion Monte Carlo like approach [14.558626910178127]
The eigenvalue problem is reformulated as a fixed point problem of the semigroup flow induced by the operator.
The method shares a similar spirit with diffusion Monte Carlo but augments a direct approximation to the eigenfunction through a neural-network ansatz.
Our approach is able to provide accurate eigenvalue and eigenfunction approximations in several numerical examples.
arXiv Detail & Related papers (2020-02-07T03:08:31Z)
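For context on the Nyström formula referenced in the NeuralEF entry above, the following is a minimal NumPy sketch of the classical nonparametric construction that the parametric (neural-network) approach is meant to replace. It is our own illustration, not code from any of the listed papers; the Gaussian kernel, the `gamma` bandwidth, and the function names are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def nystrom_eigenfunctions(X_landmark, gamma=1.0, top_k=3):
    """Approximate the top kernel eigenpairs from n landmark points and return
    a callable that extends the eigenfunctions to new points via the Nystrom
    formula: psi_i(x) ~ (sqrt(n) / lambda_i) * sum_j k(x, x_j) U_{ji}."""
    n = X_landmark.shape[0]
    K = rbf_kernel(X_landmark, X_landmark, gamma)
    eigvals, eigvecs = np.linalg.eigh(K)              # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:top_k]           # keep the largest ones
    lam, U = eigvals[idx], eigvecs[:, idx]

    def eigenfunctions(X_new):
        K_new = rbf_kernel(X_new, X_landmark, gamma)  # (m, n) cross-kernel
        return np.sqrt(n) * (K_new @ U) / lam         # (m, top_k) values
    return lam / n, eigenfunctions                    # operator eigenvalue estimates

# Usage sketch: eigenvalue estimates and eigenfunction values at new points.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                         # landmark sample
lam_hat, psi = nystrom_eigenfunctions(X, gamma=0.5, top_k=3)
print(lam_hat)                                        # approximate eigenvalues
print(psi(rng.normal(size=(5, 2))).shape)             # -> (5, 3)
```

The eigendecomposition of the n-by-n landmark kernel matrix costs O(n^3), which is the scalability bottleneck that motivates training neural networks to represent the eigenfunctions directly.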
This list is automatically generated from the titles and abstracts of the papers on this site.