Neural Optimization Kernel: Towards Robust Deep Learning
- URL: http://arxiv.org/abs/2106.06097v1
- Date: Fri, 11 Jun 2021 00:34:55 GMT
- Title: Neural Optimization Kernel: Towards Robust Deep Learning
- Authors: Yueming Lyu, Ivor Tsang
- Abstract summary: Recent studies show a connection between neural networks (NN) and kernel methods.
This paper proposes a novel kernel family named Neural Optimization Kernel (NOK).
We show that an overparameterized deep NN (NOK) can increase the expressive power to reduce empirical risk and reduce the generalization bound at the same time.
- Score: 13.147925376013129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies show a close connection between neural networks (NN) and
kernel methods. However, most of these analyses (e.g., NTK) focus on the
influence of (infinite) width instead of the depth of NN models. There remains
a gap between theory and practical network designs that benefit from the depth.
This paper first proposes a novel kernel family named Neural Optimization
Kernel (NOK). Our kernel is defined as the inner product between two $T$-step
updated functionals in RKHS w.r.t. a regularized optimization problem.
Theoretically, we prove the monotonic descent property of our update rule for
both convex and non-convex problems, and an $O(1/T)$ convergence rate of our
updates for convex problems. Moreover, we propose a data-dependent structured
approximation of our NOK, which builds the connection between training deep NNs
and kernel methods associated with NOK. The resultant computational graph is a
ResNet-type finite width NN. Our structured approximation preserves the
monotonic descent property and $O(1/T)$ convergence rate. Namely, a $T$-layer
NN performs $T$-step monotonic descent updates. Notably, we show our
$T$-layered structured NN with ReLU maintains an $O(1/T)$ convergence rate
w.r.t. a convex regularized problem, which explains the success of ReLU in
training deep NNs from an NN architecture optimization perspective. For the
unsupervised learning and the shared parameter case, we show the equivalence of
training structured NN with GD and performing functional gradient descent in
RKHS associated with a fixed (data-dependent) NOK in the infinite-width regime.
For finite NOKs, we prove generalization bounds. Remarkably, we show that
overparameterized deep NN (NOK) can increase the expressive power to reduce
empirical risk and reduce the generalization bound at the same time. Extensive
experiments verify the robustness of our structured NOK blocks.
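As a concrete, deliberately simplified reading of the architecture claim above: the kernel is described as an inner product of two $T$-step updated functionals (written here with assumed notation $\langle \phi_T(x), \phi_T(x') \rangle$), and the structured approximation unrolls those $T$ updates as a ResNet-type network in which ReLU plays the role of a proximal step. The sketch below, in PyTorch, illustrates such an unrolling for an assumed regularized least-squares objective with a nonnegativity constraint; the objective, the weight parameterization $W$, the step size $\eta$, and the depth $T$ are illustrative assumptions, not the paper's exact construction.
```python
# Minimal sketch (not the paper's exact construction): a ResNet-type block
# whose t-th layer performs one proximal-gradient step on the illustrative
# objective  min_{z >= 0}  0.5 * ||x - z W||^2,
# with ReLU acting as the proximal operator of the nonnegativity constraint.
# The weight matrix W, step size eta, and depth T are assumed hyperparameters.
import torch
import torch.nn as nn


class NOKBlockSketch(nn.Module):
    def __init__(self, dim: int, T: int, eta: float = 0.1):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim, dim) / dim ** 0.5)
        self.T = T
        self.eta = eta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x)
        for _ in range(self.T):
            # Gradient of 0.5 * ||x - z W||^2 w.r.t. z, then a ReLU prox step;
            # unrolling T such steps yields a depth-T, ResNet-type graph in
            # which a T-layer network performs T descent updates.
            grad = (z @ self.W - x) @ self.W.t()
            z = torch.relu(z - self.eta * grad)
        return z


# Usage: a deeper block (larger T) runs more descent steps on the same
# objective before the features are read out.
features = NOKBlockSketch(dim=32, T=4)(torch.randn(8, 32))
```
Under this proximal reading, ReLU is not an arbitrary nonlinearity but the proximal operator of a convex constraint, which is one way to make sense of the claim that a $T$-layer ReLU network retains an $O(1/T)$ convergence rate on a convex regularized problem.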
Related papers
- Enhanced Feature Learning via Regularisation: Integrating Neural Networks and Kernel Methods [0.0]
We introduce an efficient method for the estimator, called Brownian Kernel Neural Network (BKerNN)
We show that BKerNN's expected risk converges to the minimal risk with explicit high-probability rates of $O(\min((d/n)^{1/2}, n^{-1/6}))$ (up to logarithmic factors).
arXiv Detail & Related papers (2024-07-24T13:46:50Z) - Neural Networks for Singular Perturbations [0.0]
We prove expressivity rate bounds for solution sets of a model class of singularly perturbed, elliptic two-point boundary value problems.
We establish expression rate bounds in Sobolev norms in terms of the NN size.
arXiv Detail & Related papers (2024-01-12T16:02:18Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Neural tangent kernel analysis of shallow $\alpha$-Stable ReLU neural
networks [8.000374471991247]
We consider problems for $\alpha$-Stable NNs, which generalize Gaussian NNs.
For shallow $\alpha$-Stable NNs with a ReLU function, we show that if the NN's width goes to infinity then a rescaled NN converges weakly to an $\alpha$-Stable process.
Our main contribution is the NTK analysis of shallow $\alpha$-Stable ReLU-NNs, which leads to an equivalence between training a rescaled NN and performing a kernel regression with an $(\alpha/
arXiv Detail & Related papers (2022-06-16T10:28:03Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF)
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive? [16.105097124039602]
We study the theory of neural networks (NNs) through the lens of classical nonparametric regression problems.
Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.
arXiv Detail & Related papers (2022-04-20T17:55:16Z) - Neural Contextual Bandits without Regret [47.73483756447701]
We propose algorithms for contextual bandits harnessing neural networks to approximate the unknown reward function.
We show that our approach converges to the optimal policy at a $\tilde{\mathcal{O}}(T^{-1/(2d)})$ rate, where $d$ is the dimension of the context.
arXiv Detail & Related papers (2021-07-07T11:11:34Z) - Weighted Neural Tangent Kernel: A Generalized and Improved
Network-Induced Kernel [20.84988773171639]
The Neural Tangent Kernel (NTK) has recently attracted intense study, as it describes the evolution of an over-parameterized Neural Network (NN) trained by gradient descent.
We introduce the Weighted Neural Tangent Kernel (WNTK), a generalized and improved tool, which can capture an over-parameterized NN's training dynamics under different gradients.
With the proposed weight update algorithm, both empirical and analytical WNTKs outperform the corresponding NTKs in numerical experiments.
arXiv Detail & Related papers (2021-03-22T03:16:20Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)