Stability of Deep Neural Networks via discrete rough paths
- URL: http://arxiv.org/abs/2201.07566v1
- Date: Wed, 19 Jan 2022 12:40:28 GMT
- Title: Stability of Deep Neural Networks via discrete rough paths
- Authors: Christian Bayer, Peter K. Friz, Nikolas Tapia
- Abstract summary: We provide a priori estimates for the output of Deep Residual Neural Networks in terms of both the input data and the trained network weights.
We interpret residual neural networks as solutions to (rough) difference equations and analyse them based on recent results on discrete-time signatures and rough path theory.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using rough path techniques, we provide a priori estimates for the output of
Deep Residual Neural Networks in terms of both the input data and the (trained)
network weights. As trained network weights are typically very rough when seen
as functions of the layer, we propose to derive stability bounds in terms of
the total $p$-variation of trained weights for any $p\in[1,3]$. Unlike the
$C^1$-theory underlying the neural ODE literature, our estimates remain bounded
even in the limiting case of weights behaving like Brownian motions, as
suggested in [arXiv:2105.12245]. Mathematically, we interpret residual neural
networks as solutions to (rough) difference equations, and analyse them based
on recent results on discrete-time signatures and rough path theory.
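The two central objects of the abstract, the residual update read as a difference equation in the layer index and the total $p$-variation of the trained weights, can be made concrete with a short numerical sketch. The code below is illustrative only and not the authors' code: the $\tanh$ residual update, the Frobenius norm on weight increments, and the Brownian-like toy weights are assumptions; the $p$-variation itself is computed exactly by the standard $O(L^2)$ dynamic programme over partitions.

```python
import numpy as np

def resnet_forward(x, weights, biases, activation=np.tanh):
    # Residual network read as a (rough) difference equation in the layer index k:
    #   x_{k+1} = x_k + activation(W_k @ x_k + b_k),  k = 0, ..., L-1.
    for W, b in zip(weights, biases):
        x = x + activation(W @ x + b)
    return x

def p_variation(path, p):
    # Total p-variation of the layer-indexed weight path (W_0, ..., W_L):
    #   ||W||_{p-var} = ( sup over partitions of sum_k |W_{t_{k+1}} - W_{t_k}|^p )^{1/p},
    # computed exactly by the O(L^2) dynamic programme
    #   V(j) = max_{i<j} V(i) + |W_j - W_i|^p,  V(0) = 0,
    # with increments measured in the Frobenius norm.
    path = [np.asarray(W) for W in path]
    V = np.zeros(len(path))
    for j in range(1, len(path)):
        V[j] = max(V[i] + np.linalg.norm(path[j] - path[i]) ** p for i in range(j))
    return V[-1] ** (1.0 / p)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, L = 8, 100
    # Brownian-like layer dependence of the trained weights (increments of size ~ L^{-1/2}).
    weights = np.cumsum(rng.normal(scale=L ** -0.5, size=(L, d, d)), axis=0)
    biases = np.zeros((L, d))
    out = resnet_forward(rng.normal(size=d), weights, biases)
    for p in (1.0, 2.0, 3.0):
        print(f"p = {p}: total p-variation = {p_variation(weights, p):.3f}")
```

For such Brownian-like weights the 1-variation grows like $\sqrt{L}$ while the $p$-variation for $p > 2$ stays of order one, which is precisely the regime $p \in (2, 3]$ that the rough-path estimates are designed to cover.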
Related papers
- Demystifying Lazy Training of Neural Networks from a Macroscopic Viewpoint [5.9954962391837885]
We study the gradient descent dynamics of neural networks through the lens of macroscopic limits.
Our study reveals that gradient descent can rapidly drive deep neural networks to zero training loss.
Our approach draws inspiration from the Neural Tangent Kernel (NTK) paradigm.
arXiv Detail & Related papers (2024-04-07T08:07:02Z)
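Since this entry and several below lean on the Neural Tangent Kernel (NTK) paradigm, a minimal numerical sketch may help: the empirical NTK Gram matrix $\Theta(x_i, x_j) = \langle \nabla_\theta f(x_i), \nabla_\theta f(x_j) \rangle$ of a two-layer ReLU network. The particular network, its $1/\sqrt{m}$ scaling, and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def empirical_ntk(X, W, a):
    # Empirical NTK Gram matrix of the two-layer ReLU network
    #   f(x) = m^{-1/2} * sum_r a_r * relu(w_r . x),
    # i.e. Theta(x_i, x_j) = <grad_theta f(x_i), grad_theta f(x_j)>.
    m = W.shape[0]
    pre = X @ W.T                       # (n, m) pre-activations w_r . x_i
    act = np.maximum(pre, 0.0)          # relu(w_r . x_i)
    ind = (pre > 0.0).astype(X.dtype)   # relu'(w_r . x_i)
    K_outer = act @ act.T / m                # part from gradients w.r.t. the outer weights a
    G = a[None, :] * ind                     # part from gradients w.r.t. the inner weights w_r ...
    K_inner = (G @ G.T) * (X @ X.T) / m      # ... contracted against the inner products x_i . x_j
    return K_outer + K_inner

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m = 5, 3, 4096
    X = rng.normal(size=(n, d))
    W = rng.normal(size=(m, d))
    a = rng.choice([-1.0, 1.0], size=m)
    # For large width m the empirical kernel is close to its deterministic limit,
    # which is what makes "lazy" / kernel-regime analyses of gradient descent work.
    print(np.round(empirical_ntk(X, W, a), 3))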
- Speed Limits for Deep Learning [67.69149326107103]
Recent advancement in thermodynamics allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
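As a rough illustration of the kind of computation behind (deep graph) neural persistence, the sketch below computes zero-dimensional persistence pairs of a weight-filtered graph with a union-find sweep over edges sorted by decreasing absolute weight. This is a generic building block rather than the authors' exact filtration; the common-birth simplification and the choice of summary norm are assumptions.

```python
import numpy as np

def zero_dim_persistence(edges):
    # Zero-dimensional persistence of a graph under a superlevel-set filtration of
    # absolute edge weights: sweep edges from strongest to weakest with union-find;
    # every edge that merges two components produces a (birth, death) pair.
    # For simplicity all vertices are born at the largest weight, as in a
    # max-normalised filtration.
    edges = sorted(edges, key=lambda e: -abs(e[2]))   # each edge is (u, v, weight)
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]             # path halving
            u = parent[u]
        return u

    birth = abs(edges[0][2]) if edges else 0.0
    pairs = []
    for u, v, w in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            pairs.append((birth, abs(w)))
    return pairs

def persistence_norm(pairs, p=2):
    # p-norm of the persistence diagram: a scalar summary of how long components live.
    return sum((b - d) ** p for b, d in pairs) ** (1.0 / p)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Wl = rng.normal(size=(4, 3))  # one layer's weight matrix, viewed as a bipartite graph
    edges = [(("in", i), ("out", j), Wl[i, j]) for i in range(4) for j in range(3)]
    print(persistence_norm(zero_dim_persistence(edges)))
```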
- Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy [52.40331776572531]
We show that learning depth-$3$ ReLU networks under the Gaussian input distribution is hard even in the smoothed-analysis framework.
Our results are under a well-studied assumption on the existence of local pseudorandom generators.
arXiv Detail & Related papers (2023-02-15T02:00:26Z)
- Pruning Neural Networks via Coresets and Convex Geometry: Towards No Assumptions [10.635248457021499]
Pruning is one of the predominant approaches for compressing deep neural networks (DNNs).
We propose a novel and robust framework for computing such coresets under mild assumptions on the model's weights and inputs.
Our method outperforms existing coreset based neural pruning approaches across a wide range of networks and datasets.
arXiv Detail & Related papers (2022-09-18T12:45:26Z)
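The coreset idea can be sketched generically as importance sampling over the neurons of one hidden layer: score each neuron, sample neurons proportionally to the scores, and reweight the survivors so that the layer output is preserved in expectation. The sensitivity proxy used below (incoming times outgoing weight mass) and all names are illustrative assumptions; the paper's construction and guarantees are more refined.

```python
import numpy as np

def coreset_prune_layer(W_in, W_out, m, rng=None):
    # Importance-sampling coreset over the neurons of one hidden layer.
    # W_in has shape (n_hidden, d_in), W_out has shape (d_out, n_hidden).
    # Keep m sampled neurons and rescale their outgoing weights by 1/(m * p_i)
    # so that the pruned layer is an unbiased estimator of the original output.
    rng = np.random.default_rng() if rng is None else rng
    scores = np.linalg.norm(W_in, axis=1) * np.linalg.norm(W_out, axis=0)
    probs = scores / scores.sum()
    idx = rng.choice(len(probs), size=m, replace=True, p=probs)
    scale = 1.0 / (m * probs[idx])
    return W_in[idx], W_out[:, idx] * scale[None, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W_in, W_out = rng.normal(size=(256, 32)), rng.normal(size=(10, 256))
    x = rng.normal(size=32)
    W_in_s, W_out_s = coreset_prune_layer(W_in, W_out, m=64, rng=rng)
    full = W_out @ np.maximum(W_in @ x, 0.0)       # original two-layer block
    small = W_out_s @ np.maximum(W_in_s @ x, 0.0)  # pruned, reweighted block
    print("relative error:", np.linalg.norm(full - small) / np.linalg.norm(full))
```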
- Approximation results for Gradient Descent trained Shallow Neural Networks in $1d$ [0.0]
Two aspects of neural networks that have been extensively studied are their function approximation properties and their training by gradient descent methods.
In most of the current literature these weights are fully or partially hand-crafted, which demonstrates the capabilities of neural networks but not necessarily their practical performance.
This paper balances these two demands and provides an approximation result for neural networks in $1d$ with non-convex weight optimization by gradient descent.
arXiv Detail & Related papers (2022-09-17T20:26:19Z)
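To make the setting of the previous entry concrete, here is a small, self-contained sketch of training a shallow ReLU network on a one-dimensional target by plain gradient descent with hand-written gradients. The width, step size, and target function are arbitrary illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def train_shallow_1d(x, y, m=64, lr=1e-2, steps=5000, seed=0):
    # Plain gradient descent on the mean-squared error of a shallow ReLU network
    #   f(x) = sum_r a_r * relu(w_r * x + b_r)
    # for 1d inputs, with gradients written out by hand.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=m)
    b = rng.normal(size=m)
    a = rng.normal(size=m) / np.sqrt(m)
    n = len(x)
    for _ in range(steps):
        pre = np.outer(x, w) + b             # (n, m) pre-activations
        act = np.maximum(pre, 0.0)
        residual = act @ a - y               # (n,) prediction errors
        ind = (pre > 0.0).astype(float)      # relu'(pre)
        grad_a = act.T @ residual / n
        grad_w = a * (ind.T @ (residual * x)) / n
        grad_b = a * (ind.T @ residual) / n
        a -= lr * grad_a
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, a

if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 200)
    y = np.sin(np.pi * x)                    # smooth 1d target
    w, b, a = train_shallow_1d(x, y)
    pred = np.maximum(np.outer(x, w) + b, 0.0) @ a
    print("max approximation error:", np.abs(pred - y).max())
```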
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
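A simplified version of the embedded-network idea can be sketched with interval arithmetic: run the implicit layer's fixed-point iteration on a lower and an upper state simultaneously, splitting the weight matrices into positive and negative parts. The ReLU activation, the contraction assumption $\|W\|_\infty < 1$, and the plain iteration below are illustrative simplifications of the paper's non-Euclidean contraction framework, not its exact algorithm.

```python
import numpy as np

def implicit_forward(W, U, b, x, iters=200):
    # Implicit (equilibrium) layer: solve z = relu(W z + U x + b) by fixed-point
    # iteration; this converges when ||W||_inf < 1, since relu is 1-Lipschitz.
    z = np.zeros(W.shape[0])
    for _ in range(iters):
        z = np.maximum(W @ z + U @ x + b, 0.0)
    return z

def box_overapprox(W, U, b, x_lo, x_hi, iters=200):
    # "Embedded network" sketch: interval iteration whose fixed point gives an
    # l_inf box [z_lo, z_hi] over-approximating the equilibria reachable from
    # inputs x in [x_lo, x_hi] (assumes ||W||_inf < 1 and a monotone activation).
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    Up, Un = np.maximum(U, 0.0), np.minimum(U, 0.0)
    z_lo = np.zeros(W.shape[0])
    z_hi = np.zeros(W.shape[0])
    for _ in range(iters):
        new_lo = np.maximum(Wp @ z_lo + Wn @ z_hi + Up @ x_lo + Un @ x_hi + b, 0.0)
        new_hi = np.maximum(Wp @ z_hi + Wn @ z_lo + Up @ x_hi + Un @ x_lo + b, 0.0)
        z_lo, z_hi = new_lo, new_hi
    return z_lo, z_hi

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 6, 3
    W = rng.normal(size=(n, n))
    W *= 0.9 / np.abs(W).sum(axis=1).max()   # enforce ||W||_inf < 1
    U, b = rng.normal(size=(n, d)), rng.normal(size=n)
    x, eps = rng.normal(size=d), 0.1
    z = implicit_forward(W, U, b, x)
    z_lo, z_hi = box_overapprox(W, U, b, x - eps, x + eps)
    print(bool(np.all(z_lo <= z + 1e-9) and np.all(z <= z_hi + 1e-9)))
```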
- Stable Recovery of Entangled Weights: Towards Robust Identification of Deep Neural Networks from Minimal Samples [0.0]
We introduce the so-called entangled weights, which compose weights of successive layers intertwined with suitable diagonal and invertible matrices depending on the activation functions and their shifts.
We prove that entangled weights are completely and stably approximated by an efficient and robust algorithm.
In terms of practical impact, our study shows that we can relate input-output information uniquely and stably to network parameters, providing a form of explainability.
arXiv Detail & Related papers (2021-01-18T16:31:19Z)
- A Revision of Neural Tangent Kernel-based Approaches for Neural Networks [34.75076385561115]
We use the neural tangent kernel to show that networks can fit any finite training sample perfectly.
A simple and analytic kernel function was derived and shown to be equivalent to a fully-trained network.
Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.
arXiv Detail & Related papers (2020-07-02T05:07:55Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
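A standard way to approximate a Hessian norm in practice is power iteration on Hessian-vector products obtained by finite differences of the gradient. The sketch below shows that generic diagnostic; the `grad_fn` interface, step size, and toy quadratic loss are assumptions, not the paper's procedure.

```python
import numpy as np

def hessian_spectral_norm(grad_fn, w, iters=50, eps=1e-4, seed=0):
    # Power iteration on finite-difference Hessian-vector products
    #   H v  ~  (grad(w + eps * v) - grad(w)) / eps,
    # returning an estimate of the spectral norm ||H||_2 of the loss Hessian at w.
    rng = np.random.default_rng(seed)
    v = rng.normal(size=w.shape)
    v /= np.linalg.norm(v)
    g0 = grad_fn(w)
    norm = 0.0
    for _ in range(iters):
        hv = (grad_fn(w + eps * v) - g0) / eps
        norm = np.linalg.norm(hv)
        v = hv / (norm + 1e-12)
    return norm

if __name__ == "__main__":
    # Toy quadratic loss L(w) = 0.5 * w^T A w, whose Hessian is exactly A.
    rng = np.random.default_rng(0)
    M = rng.normal(size=(10, 10))
    A = M @ M.T
    est = hessian_spectral_norm(lambda w: A @ w, rng.normal(size=10))
    print(est, np.linalg.eigvalsh(A).max())  # the two numbers should be close
```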
- A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
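For concreteness, the training rule named in the previous entry, noisy gradient descent with weight decay, amounts to a Langevin-type update. The $\sqrt{2\,\eta\,T}$ noise scale below is the usual Langevin convention and is an illustrative choice rather than the paper's exact parametrisation.

```python
import numpy as np

def noisy_gd_step(w, grad, lr, weight_decay, temperature, rng):
    # One step of noisy gradient descent with weight decay:
    #   w <- w - lr * (grad + weight_decay * w) + sqrt(2 * lr * temperature) * xi,
    # with xi ~ N(0, I).  Weight decay keeps the iterates bounded; the Gaussian
    # noise makes the dynamics Langevin-like.
    xi = rng.normal(size=w.shape)
    return w - lr * (grad + weight_decay * w) + np.sqrt(2.0 * lr * temperature) * xi

if __name__ == "__main__":
    # Toy example: noisy ridge-type regression; grad of 0.5*||X w - y||^2 is X^T (X w - y).
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(50, 5)), rng.normal(size=50)
    w = np.zeros(5)
    for _ in range(2000):
        w = noisy_gd_step(w, X.T @ (X @ w - y), lr=1e-3, weight_decay=0.1,
                          temperature=1e-4, rng=rng)
    print(w)
```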
This list is automatically generated from the titles and abstracts of the papers in this site.