Kernel-Based Smoothness Analysis of Residual Networks
- URL: http://arxiv.org/abs/2009.10008v2
- Date: Sun, 23 May 2021 18:44:06 GMT
- Title: Kernel-Based Smoothness Analysis of Residual Networks
- Authors: Tom Tirer, Joan Bruna, Raja Giryes
- Abstract summary: Residual networks (ResNets) stand out among these powerful modern architectures.
In this paper, we show another distinction between the two models, namely, a tendency of ResNets to promote smoother interpolations than MLPs.
- Score: 85.20737467304994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major factor in the success of deep neural networks is the use of
sophisticated architectures rather than the classical multilayer perceptron
(MLP). Residual networks (ResNets) stand out among these powerful modern
architectures. Previous works focused on the optimization advantages of deep
ResNets over deep MLPs. In this paper, we show another distinction between the
two models, namely, a tendency of ResNets to promote smoother interpolations
than MLPs. We analyze this phenomenon via the neural tangent kernel (NTK)
approach. First, we compute the NTK for a considered ResNet model and prove its
stability during gradient descent training. Then, we show by various evaluation
methodologies that for ReLU activations the NTK of ResNet, and its kernel
regression results, are smoother than the ones of MLP. The better smoothness
observed in our analysis may explain the better generalization ability of
ResNets and the practice of moderately attenuating the residual blocks.
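The smoothness comparison described above can be pictured with a small numerical sketch. The snippet below is a minimal, hedged illustration rather than the paper's construction: it forms the empirical NTK of a one-hidden-layer ReLU network from per-parameter gradients, and the empirical NTK of a toy variant with a linear skip connection whose residual branch is scaled by a factor alpha (the toy model, alpha, and all helper names are assumptions introduced here), then runs NTK kernel ridge regression on a few 1-D points so the two interpolants can be compared.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 2048, 2                        # hidden width, input dim (1-D inputs lifted with a bias coordinate)
W = rng.normal(size=(m, d))           # hidden-layer weights at initialization
a = rng.normal(size=m)                # output weights at initialization
alpha = 0.3                           # assumed attenuation of the residual branch (illustrative only)

def feats(x):
    """Per-parameter gradients of f(x) = a^T relu(W x) / sqrt(m) w.r.t. (a, W)."""
    pre = W @ x
    act = np.maximum(pre, 0.0)                                # relu(Wx)
    gate = (pre > 0).astype(float)                            # relu'(Wx)
    grad_a = act / np.sqrt(m)                                 # df/da
    grad_W = (a * gate)[:, None] * x[None, :] / np.sqrt(m)    # df/dW
    return np.concatenate([grad_a, grad_W.ravel()])

def ntk_mlp(x, y):
    # Empirical NTK of the plain one-hidden-layer ReLU model.
    return feats(x) @ feats(y)

def ntk_res(x, y):
    # Toy "residual" model f(x) = c^T x + alpha * a^T relu(W x) / sqrt(m):
    # the linear skip contributes x.y, the attenuated branch contributes alpha^2 * NTK_mlp.
    return x @ y + alpha**2 * ntk_mlp(x, y)

def krr_predict(kernel, X, y, Xq, lam=1e-6):
    """Kernel ridge regression with the given kernel: f(x) = k(x, X) (K + lam I)^{-1} y."""
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    kq = np.array([[kernel(xq, xj) for xj in X] for xq in Xq])
    coef = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return kq @ coef

# A few 1-D training points, lifted to (x, 1) so the kernels see a bias coordinate.
t = np.array([-1.0, -0.3, 0.2, 0.9])
X = np.stack([t, np.ones_like(t)], axis=1)
y = np.sin(3 * t)
tq = np.linspace(-1.2, 1.2, 200)
Xq = np.stack([tq, np.ones_like(tq)], axis=1)

f_mlp = krr_predict(ntk_mlp, X, y, Xq)
f_res = krr_predict(ntk_res, X, y, Xq)

# Crude smoothness proxy: mean squared second difference of each interpolant.
rough = lambda f: np.mean(np.diff(f, 2) ** 2)
print("roughness  MLP-NTK: %.3e   residual-NTK: %.3e" % (rough(f_mlp), rough(f_res)))
```

With a small alpha the residual branch is attenuated, which loosely mirrors the abstract's remark about moderately attenuating the residual blocks; the printed roughness values give one crude way to compare the two interpolants.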
Related papers
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- SymNMF-Net for The Symmetric NMF Problem [62.44067422984995]
We propose a neural network called SymNMF-Net for the Symmetric NMF problem.
We show that the inference of each block corresponds to a single iteration of the underlying optimization; a schematic classical SymNMF update is sketched after this entry.
Empirical results on real-world datasets demonstrate the superiority of our SymNMF-Net.
arXiv Detail & Related papers (2022-05-26T08:17:39Z)
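To make the phrase "each block corresponds to a single iteration of the optimization" concrete, here is a minimal sketch of one classical multiplicative update for symmetric NMF (minimizing ||A - H H^T||_F^2 over H >= 0). This textbook iteration is offered only as an assumed stand-in; it is not claimed to be the exact block used by SymNMF-Net.

```python
import numpy as np

def symnmf_block(A, H, beta=0.5, eps=1e-12):
    """One 'block' = one multiplicative SymNMF iteration for min_{H>=0} ||A - H H^T||_F^2.

    Classical update (assumed here as an illustrative stand-in for a network block):
        H <- H * (1 - beta + beta * (A H) / (H H^T H))
    """
    num = A @ H
    den = H @ (H.T @ H) + eps
    return H * (1.0 - beta + beta * num / den)

# Tiny usage example: factor a small symmetric nonnegative similarity matrix.
rng = np.random.default_rng(0)
X = rng.random((6, 3))
A = X @ X.T                    # symmetric, nonnegative
H = rng.random((6, 2))         # rank-2 factor, initialized at random
for _ in range(200):           # stacking many blocks ~ running many iterations
    H = symnmf_block(A, H)
print("residual ||A - HH^T||_F =", np.linalg.norm(A - H @ H.T))
```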
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs; a minimal sketch of one interval-propagation step follows this entry.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
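For reference, the baseline mentioned in the entry above, interval bound propagation, amounts to pushing an elementwise interval through each layer. The sketch below shows one affine-plus-ReLU step of that standard technique on an assumed toy feedforward network; the entry's interval reachability analysis for implicit layers is not reproduced here.

```python
import numpy as np

def ibp_affine_relu(W, b, lo, hi):
    """Propagate an elementwise interval [lo, hi] through x -> relu(W x + b).

    Standard interval arithmetic: split W into its positive and negative parts
    so that the lower/upper pre-activation bounds pair the right interval endpoints.
    """
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    pre_lo = W_pos @ lo + W_neg @ hi + b
    pre_hi = W_pos @ hi + W_neg @ lo + b
    return np.maximum(pre_lo, 0.0), np.maximum(pre_hi, 0.0)   # relu is monotone

# Usage: bound the outputs of a random two-layer ReLU net on an l_inf input ball.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)
x0, eps = rng.normal(size=4), 0.1
lo, hi = ibp_affine_relu(W1, b1, x0 - eps, x0 + eps)
lo, hi = ibp_affine_relu(W2, b2, lo, hi)
print("certified output box:", np.round(lo, 3), np.round(hi, 3))
```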
- Singular Value Perturbation and Deep Network Optimization [29.204852309828006]
We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network.
In particular, we explain what deep learning practitioners have long observed empirically: the parameters of some deep architectures are easier to optimize than others.
A direct application of our perturbation results explains analytically why a ResNet is easier to optimize than a ConvNet.
arXiv Detail & Related papers (2022-03-07T02:09:39Z)
- m-RevNet: Deep Reversible Neural Networks with Momentum [25.609808975649624]
We propose a reversible neural network, termed m-RevNet, characterized by inserting a momentum update into the residual blocks; a schematic sketch of such an update follows this entry.
For certain learning scenarios, we analytically and empirically reveal that our m-RevNet succeeds while the standard ResNet fails.
arXiv Detail & Related papers (2021-08-12T17:14:32Z)
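The momentum mechanism described in the m-RevNet entry can be sketched as a velocity state v carried alongside the activations x. The block below uses an assumed generic form, v_next = beta * v + F(x) and x_next = x + v_next, together with its exact algebraic inverse; it illustrates why such an update is reversible and is not m-RevNet's precise parameterization.

```python
import numpy as np

def residual_branch(x, W):
    """Toy residual function F(x); any fixed differentiable map would do for this sketch."""
    return np.tanh(W @ x)

def momentum_block_forward(x, v, W, beta=0.9):
    # Momentum-augmented residual update:
    #   v_next = beta * v + F(x)
    #   x_next = x + v_next
    v_next = beta * v + residual_branch(x, W)
    x_next = x + v_next
    return x_next, v_next

def momentum_block_inverse(x_next, v_next, W, beta=0.9):
    # The update is exactly invertible: recover (x, v) from (x_next, v_next)
    # without storing activations, which is what makes the block reversible.
    x = x_next - v_next
    v = (v_next - residual_branch(x, W)) / beta
    return x, v

# Round-trip check on random data.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 5))
x, v = rng.normal(size=5), np.zeros(5)
x2, v2 = momentum_block_forward(x, v, W)
x_rec, v_rec = momentum_block_inverse(x2, v2, W)
print("max reconstruction error:", np.max(np.abs(np.concatenate([x - x_rec, v - v_rec]))))
```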
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Interpolation between Residual and Non-Residual Networks [24.690238357686134]
We present a novel ODE model by adding a damping term.
It can be shown that the proposed model can recover both a ResNet and a CNN by adjusting a coefficient; a schematic discrete form of this interpolation is sketched after this entry.
Experiments on a number of image classification benchmarks show that the proposed model substantially improves the accuracy of ResNet and ResNeXt.
arXiv Detail & Related papers (2020-06-10T09:36:38Z)
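One hedged reading of the damping idea in the interpolation entry: in discrete form, a block of the shape x_next = (1 - lam) * x + F(x) reduces to a residual (ResNet-style) block at lam = 0 and to a plain feedforward (CNN-style) block at lam = 1. The sketch below implements only this assumed discretization; the paper's actual ODE model and damping term may differ.

```python
import numpy as np

def damped_block(x, W, lam):
    """x_next = (1 - lam) * x + F(x): lam=0 gives a residual block, lam=1 a plain block.

    F is a toy branch (tanh of a linear map); the damping coefficient lam interpolates
    between keeping the skip connection fully (lam=0) and dropping it entirely (lam=1).
    """
    F = np.tanh(W @ x)
    return (1.0 - lam) * x + F

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)) / 2.0
x = rng.normal(size=4)

for lam in (0.0, 0.5, 1.0):
    h = x.copy()
    for _ in range(3):            # stack three blocks with the same damping coefficient
        h = damped_block(h, W, lam)
    print(f"lam={lam:.1f} -> output {np.round(h, 3)}")
```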
- Iterative Network for Image Super-Resolution [69.07361550998318]
Single image super-resolution (SISR) has been greatly revitalized by the recent development of convolutional neural networks (CNNs).
This paper provides new insight into conventional SISR algorithms and proposes a substantially different approach relying on iterative optimization.
A novel iterative super-resolution network (ISRN) is proposed on top of this iterative optimization.
arXiv Detail & Related papers (2020-05-20T11:11:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.