On the Equivalence between Neural Network and Support Vector Machine
- URL: http://arxiv.org/abs/2111.06063v1
- Date: Thu, 11 Nov 2021 06:05:00 GMT
- Title: On the Equivalence between Neural Network and Support Vector Machine
- Authors: Yilan Chen, Wei Huang, Lam M. Nguyen, Tsui-Wei Weng
- Abstract summary: The dynamics of an infinitely wide neural network (NN) trained by gradient descent can be characterized by the Neural Tangent Kernel (NTK).
We establish the equivalence between NN and support vector machine (SVM).
Our main theoretical results include establishing the equivalence between NN and a broad family of $\ell_2$ regularized KMs with finite-width bounds.
- Score: 23.174679357972984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research shows that the dynamics of an infinitely wide neural network
(NN) trained by gradient descent can be characterized by Neural Tangent Kernel
(NTK) \citep{jacot2018neural}. Under the squared loss, the infinite-width NN
trained by gradient descent with an infinitely small learning rate is
equivalent to kernel regression with NTK \citep{arora2019exact}. However, the
equivalence is currently only known for ridge regression
\citep{arora2019harnessing}, while the equivalence between NN and other kernel
machines (KMs), e.g., the support vector machine (SVM), remains unknown. Therefore,
in this work, we establish the equivalence between NN and SVM; specifically,
between the infinitely wide NN trained by soft margin loss and the standard soft
margin SVM with NTK trained by subgradient descent. Our main
theoretical results include establishing the equivalence between NN and a broad
family of $\ell_2$ regularized KMs with finite-width bounds, which cannot be
handled by prior work, and showing that every finite-width NN trained by such
regularized loss functions is approximately a KM. Furthermore, we demonstrate
our theory can enable three practical applications, including (i)
\textit{non-vacuous} generalization bound of NN via the corresponding KM; (ii)
\textit{non-trivial} robustness certificate for the infinite-width NN (while
existing robustness verification methods would provide vacuous bounds); (iii)
intrinsically more robust infinite-width NNs than those from previous kernel
regression. Our code for the experiments is available at
\url{https://github.com/leslie-CH/equiv-nn-svm}.
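For readers skimming this listing: the central object is the NTK of a network $f(x;\theta)$,
$$\Theta(x,x') = \big\langle \nabla_\theta f(x;\theta),\, \nabla_\theta f(x';\theta) \big\rangle,$$
and under the squared loss the trained infinite-width NN predicts as kernel regression, $f(x) = \Theta(x,X)\,\Theta(X,X)^{-1} y$ (ignoring the network output at initialization). The result above replaces this regression picture with a soft margin SVM trained by subgradient descent. The sketch below is a minimal illustration of that pairing using an empirical (finite-width) NTK; it is not the authors' implementation (see the repository linked above), and every name in it is an assumption made for the example.

import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data with labels in {-1, +1}.
n, d = 40, 5
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))

# Two-layer ReLU network f(x) = (1/sqrt(m)) * a^T relu(W x); both layers are trainable.
m = 2048
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

def ntk_features(x):
    # Gradient of f at (W, a), flattened: the empirical-NTK feature map.
    pre = W @ x
    grad_a = np.maximum(pre, 0.0) / np.sqrt(m)                      # df/da
    grad_W = ((a * (pre > 0)) / np.sqrt(m))[:, None] * x[None, :]   # df/dW
    return np.concatenate([grad_W.ravel(), grad_a])

Phi = np.stack([ntk_features(x) for x in X])
K = Phi @ Phi.T   # empirical NTK Gram matrix

# Soft margin SVM in the RKHS of K: f(x_i) = sum_j beta_j K(x_i, x_j), with objective
#   lam/2 * beta^T K beta + (1/n) * sum_i max(0, 1 - y_i f(x_i)),
# minimized by subgradient descent with a decaying step size.
lam = 1e-3
beta = np.zeros(n)
for t in range(2000):
    margins = y * (K @ beta)
    active = (margins < 1.0).astype(float)           # subgradient of the hinge term
    grad = lam * (K @ beta) - K @ (active * y) / n
    beta -= 0.1 / np.sqrt(t + 1.0) * grad

print("SVM-with-NTK training accuracy:", (np.sign(K @ beta) == y).mean())

Per the abstract, a wide NN trained with the corresponding regularized soft margin loss would track this SVM predictor, with the finite-width bounds quantifying the gap.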
Related papers
- Neural Network Verification with Branch-and-Bound for General Nonlinearities [63.39918329535165]
Branch-and-bound (BaB) is among the most effective techniques for neural network (NN) verification.
We develop a general framework, named GenBaB, to conduct BaB on general nonlinearities to verify NNs with general architectures.
We demonstrate the effectiveness of our GenBaB on verifying a wide range of NNs, including NNs with activation functions such as Sigmoid, Tanh, Sine and GeLU.
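As a generic, miniature illustration of why branching tightens bounds over a nonlinearity such as Sine (this is not GenBaB, and all names below are assumptions for the example): a decoupled interval bound on g(x) = sin(x) + 0.5*x is sound but loose, and recursively splitting the domain shrinks it toward the true maximum.

import math

def upper_bound_sin(lo, hi):
    # Exact upper bound of sin on [lo, hi]: 1 if a peak pi/2 + 2*k*pi lies inside,
    # otherwise the larger endpoint value.
    k = math.ceil((lo - math.pi / 2) / (2 * math.pi))
    if math.pi / 2 + 2 * math.pi * k <= hi:
        return 1.0
    return max(math.sin(lo), math.sin(hi))

def upper_bound_g(lo, hi):
    # Decoupled bound ub(sin) + ub(0.5*x): sound but loose, since the two terms
    # need not peak at the same x.
    return upper_bound_sin(lo, hi) + 0.5 * hi

def branch_and_bound(lo, hi, depth):
    # Tighten the bound by splitting [lo, hi] and taking the max over the halves.
    if depth == 0:
        return upper_bound_g(lo, hi)
    mid = 0.5 * (lo + hi)
    return max(branch_and_bound(lo, mid, depth - 1),
               branch_and_bound(mid, hi, depth - 1))

lo, hi = 0.0, 6.0
print("no branching:", upper_bound_g(lo, hi))         # 4.0 (loose)
print("depth-8 BaB :", branch_and_bound(lo, hi, 8))   # close to the true max, about 2.72

In a full verifier, the same idea is applied to the pre-activation intervals of each nonlinearity inside the network, pruning branches whose bounds already certify the property.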
arXiv Detail & Related papers (2024-05-31T17:51:07Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Neural tangent kernel analysis of shallow $\alpha$-Stable ReLU neural
networks [8.000374471991247]
We consider problems for $\alpha$-Stable NNs, which generalize Gaussian NNs.
For shallow $\alpha$-Stable NNs with a ReLU function, we show that if the NN's width goes to infinity then a rescaled NN converges weakly to an $\alpha$-Stable process.
Our main contribution is the NTK analysis of shallow $\alpha$-Stable ReLU-NNs, which leads to an equivalence between training a rescaled NN and performing a kernel regression with an $(\alpha/
arXiv Detail & Related papers (2022-06-16T10:28:03Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive? [16.105097124039602]
We study the theory of neural network (NN) from the lens of classical nonparametric regression problems.
Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.
arXiv Detail & Related papers (2022-04-20T17:55:16Z) - Deep Stable neural networks: large-width asymptotics and convergence
rates [3.0108936184913295]
We show that as the width goes to infinity jointly over the NN's layers, a suitably rescaled deep Stable NN converges weakly to a Stable SP.
Because of the non-triangular structure of the NN, this is a non-standard problem, to which we propose a novel and self-contained inductive approach.
arXiv Detail & Related papers (2021-08-02T12:18:00Z) - Neural Optimization Kernel: Towards Robust Deep Learning [13.147925376013129]
Recent studies show a connection between neural networks (NN) and kernel methods.
This paper proposes a novel kernel family named Neural Optimization Kernel (NOK).
We show that an over-parameterized deep NN (NOK) can increase expressive power to reduce empirical risk and reduce the generalization bound at the same time.
arXiv Detail & Related papers (2021-06-11T00:34:55Z) - Weighted Neural Tangent Kernel: A Generalized and Improved
Network-Induced Kernel [20.84988773171639]
The Neural Tangent Kernel (NTK) has recently attracted intense study, as it describes the evolution of an over-parameterized Neural Network (NN) trained by gradient descent.
We introduce the Weighted Neural Tangent Kernel (WNTK), a generalized and improved tool, which can capture an over-parameterized NN's training dynamics under different gradients.
With the proposed weight update algorithm, both empirical and analytical WNTKs outperform the corresponding NTKs in numerical experiments.
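One natural reading of "weighted" here (an assumption for illustration; the paper's exact definition may differ) is that coefficients $a_l$ rescale the contribution of each parameter group to the kernel,
$$\Theta_{\mathrm{W}}(x,x') = \sum_{l} a_l \,\big\langle \nabla_{\theta_l} f(x;\theta),\, \nabla_{\theta_l} f(x';\theta) \big\rangle,$$
which recovers the standard NTK when all $a_l = 1$; the weight update algorithm mentioned above would then adapt such coefficients.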
arXiv Detail & Related papers (2021-03-22T03:16:20Z) - A Generalized Neural Tangent Kernel Analysis for Two-layer Neural
Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z) - On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets.
Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity.
In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
arXiv Detail & Related papers (2020-01-28T16:47:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.