From Kernel Methods to Neural Networks: A Unifying Variational
Formulation
- URL: http://arxiv.org/abs/2206.14625v1
- Date: Wed, 29 Jun 2022 13:13:53 GMT
- Title: From Kernel Methods to Neural Networks: A Unifying Variational
Formulation
- Authors: Michael Unser
- Abstract summary: We present a unifying regularization functional that depends on an operator and on a generic Radon-domain norm.
Our framework offers guarantees of universal approximation for a broad family of regularization operators or, equivalently, for a wide variety of shallow neural networks.
- Score: 25.6264886382888
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The minimization of a data-fidelity term and an additive regularization
functional gives rise to a powerful framework for supervised learning. In this
paper, we present a unifying regularization functional that depends on an
operator and on a generic Radon-domain norm. We establish the existence of a
minimizer and give the parametric form of the solution(s) under very mild
assumptions. When the norm is Hilbertian, the proposed formulation yields a
solution that involves radial-basis functions and is compatible with the
classical methods of machine learning. By contrast, for the total-variation
norm, the solution takes the form of a two-layer neural network with an
activation function that is determined by the regularization operator. In
particular, we retrieve the popular ReLU networks by letting the operator be
the Laplacian. We also characterize the solution for the intermediate
regularization norms $\|\cdot\|=\|\cdot\|_{L_p}$ with $p\in(1,2]$. Our
framework offers guarantees of universal approximation for a broad family of
regularization operators or, equivalently, for a wide variety of shallow neural
networks, including the cases (such as ReLU) where the activation function is
increasing polynomially. It also explains the favorable role of bias and skip
connections in neural architectures.
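  A schematic rendering of this setup, in our own notation rather than the paper's, may help: the framework minimizes a data-fidelity term plus a regularizer of the form $\psi(\|\mathrm{L}\{f\}\|)$, where $\mathrm{L}$ is the regularization operator and the norm is evaluated in the Radon domain. In the total-variation case, the minimizers are two-layer networks of the form $f(x)=\sum_{k=1}^{K} v_k\,\sigma(w_k^{\top}x-b_k)+c_0+c^{\top}x$, with $\sigma=\mathrm{ReLU}$ when $\mathrm{L}$ is the Laplacian; the affine term $c_0+c^{\top}x$ is the kind of component that corresponds to the bias and skip connections mentioned above. In the Hilbertian case, the expansion is over radial-basis functions instead.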
Related papers
- Generalization Bounds and Model Complexity for Kolmogorov-Arnold Networks [1.5850926890180461]
Kolmogorov-Arnold Network (KAN) is a network structure recently proposed by Liu et al.
This work provides a rigorous theoretical analysis of KAN by establishing generalization bounds for KANs equipped with various classes of activation functions.
arXiv Detail & Related papers (2024-10-10T15:23:21Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer [2.1485350418225244]
We give an exact macroscopic characterization of the generalization behavior of randomized, shallow NNs with ReLU activation.
We show that these randomized shallow networks (RSNs) correspond to a generalized additive model (GAM)-type regression in which infinitely many directions are considered.
arXiv Detail & Related papers (2023-03-20T21:05:47Z)
- A Lifted Bregman Formulation for the Inversion of Deep Neural Networks [28.03724379169264]
We propose a novel framework for the regularised inversion of deep neural networks.
The framework lifts the parameter space into a higher dimensional space by introducing auxiliary variables.
We present theoretical results and support their practical application with numerical examples.
arXiv Detail & Related papers (2023-03-01T20:30:22Z)
- A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z)
- NeuralEF: Deconstructing Kernels by Deep Neural Networks [47.54733625351363]
Traditional nonparametric solutions based on the Nyström formula suffer from scalability issues (a minimal sketch of the Nyström construction is given after this list).
Recent work has resorted to a parametric approach, i.e., training neural networks to approximate the eigenfunctions.
We show that these problems can be fixed by using a new series of objective functions that generalize to both supervised and unsupervised learning problems.
arXiv Detail & Related papers (2022-04-30T05:31:07Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with a curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
- Solving high-dimensional eigenvalue problems using deep neural networks: A diffusion Monte Carlo like approach [14.558626910178127]
The eigenvalue problem is reformulated as a fixed point problem of the semigroup flow induced by the operator.
The method shares a similar spirit with diffusion Monte Carlo but augments a direct approximation to the eigenfunction through a neural-network ansatz.
Our approach is able to provide accurate eigenvalue and eigenfunction approximations in several numerical examples.
arXiv Detail & Related papers (2020-02-07T03:08:31Z)
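For context on the Nyström formula referenced in the NeuralEF entry above, the following is a minimal NumPy sketch of the classical nonparametric construction that the parametric (neural-network) approach is meant to replace. It is our own illustration, not code from any of the listed papers; the Gaussian kernel, the `gamma` bandwidth, and the function names are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def nystrom_eigenfunctions(X_landmark, gamma=1.0, top_k=3):
    """Approximate the top kernel eigenpairs from n landmark points and return
    a callable that extends the eigenfunctions to new points via the Nystrom
    formula: psi_i(x) ~ (sqrt(n) / lambda_i) * sum_j k(x, x_j) U_{ji}."""
    n = X_landmark.shape[0]
    K = rbf_kernel(X_landmark, X_landmark, gamma)
    eigvals, eigvecs = np.linalg.eigh(K)              # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:top_k]           # keep the largest ones
    lam, U = eigvals[idx], eigvecs[:, idx]

    def eigenfunctions(X_new):
        K_new = rbf_kernel(X_new, X_landmark, gamma)  # (m, n) cross-kernel
        return np.sqrt(n) * (K_new @ U) / lam         # (m, top_k) values
    return lam / n, eigenfunctions                    # operator eigenvalue estimates

# Usage sketch: eigenvalue estimates and eigenfunction values at new points.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                         # landmark sample
lam_hat, psi = nystrom_eigenfunctions(X, gamma=0.5, top_k=3)
print(lam_hat)                                        # approximate eigenvalues
print(psi(rng.normal(size=(5, 2))).shape)             # -> (5, 3)
```

The eigendecomposition of the n-by-n landmark kernel matrix costs O(n^3), which is the scalability bottleneck that motivates training neural networks to represent the eigenfunctions directly.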
This list is automatically generated from the titles and abstracts of the papers on this site.