Related papers: On the Stability of the Jacobian Matrix in Deep Neural Networks

URL: http://arxiv.org/abs/2506.08764v1
Date: Tue, 10 Jun 2025 13:04:42 GMT
Title: On the Stability of the Jacobian Matrix in Deep Neural Networks
Authors: Benjamin Dadoun, Soufiane Hayou, Hanan Salam, Mohamed El Amine Seddik, Pierre Youssef
Abstract summary: We establish a general stability theorem for deep neural networks that accommodates sparsity and weakly correlated weights. Our results rely on recent advances in random matrix theory, and provide rigorous guarantees for spectral stability in a much broader class of network models.
Score: 9.617753464544606
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep neural networks are known to suffer from exploding or vanishing gradients as depth increases, a phenomenon closely tied to the spectral behavior of the input-output Jacobian. Prior work has identified critical initialization schemes that ensure Jacobian stability, but these analyses are typically restricted to fully connected networks with i.i.d. weights. In this work, we go significantly beyond these limitations: we establish a general stability theorem for deep neural networks that accommodates sparsity (such as that introduced by pruning) and non-i.i.d., weakly correlated weights (e.g. induced by training). Our results rely on recent advances in random matrix theory, and provide rigorous guarantees for spectral stability in a much broader class of network models. This extends the theoretical foundation for initialization schemes in modern neural networks with structured and dependent randomness.
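As a rough illustration of the link between sparsity and Jacobian scale described in the abstract, the sketch below uses a toy deep linear network, which is a simplifying assumption and not the setting of the paper's theorem. The input-output Jacobian of such a network is the product of its weight matrices, and its squared Frobenius norm divided by the width is the mean squared singular value. The function names, the Bernoulli mask standing in for pruning, and the 1/(width * keep_prob) variance rescaling are all hypothetical choices made for this example.

```python
import math
import random


def sparse_gaussian_layer(width, keep_prob, std, rng):
    """Square weight matrix: each entry is kept with probability keep_prob
    (a crude stand-in for pruning) and drawn from N(0, std^2), else zero."""
    return [
        [rng.gauss(0.0, std) if rng.random() < keep_prob else 0.0
         for _ in range(width)]
        for _ in range(width)
    ]


def matmul(a, b):
    """Plain-Python product of two square matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


def mean_squared_singular_value(width, depth, keep_prob, rescale, seed=0):
    """Mean squared singular value of the Jacobian of a deep linear network.

    The Jacobian is the product of the layer matrices, and
    ||J||_F^2 / width equals the mean of its squared singular values.
    """
    rng = random.Random(seed)
    # Assumption: variance 1/width is the dense baseline; the rescaled
    # variant divides by keep_prob to compensate for the zeroed entries.
    var = 1.0 / (width * keep_prob) if rescale else 1.0 / width
    std = math.sqrt(var)
    jac = [[1.0 if i == j else 0.0 for j in range(width)]
           for i in range(width)]
    for _ in range(depth):
        jac = matmul(sparse_gaussian_layer(width, keep_prob, std, rng), jac)
    return sum(v * v for row in jac for v in row) / width


if __name__ == "__main__":
    rescaled = mean_squared_singular_value(24, 12, 0.5, rescale=True)
    naive = mean_squared_singular_value(24, 12, 0.5, rescale=False)
    print("rescaled:", rescaled)
    print("naive:   ", naive)
```

Because both calls share a seed, they draw the same masks and the same Gaussian samples, so the naive value equals the rescaled one multiplied by keep_prob ** depth. In this toy setting, that factor is what makes the Jacobian vanish with depth when the variance is not adjusted for sparsity.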

Related papers

Subhomogeneous Deep Equilibrium Models [5.1334640820688175]
Implicit-depth neural networks have grown as powerful alternatives to traditional networks in various applications. These models often lack guarantees of existence and uniqueness, raising stability and performance issues. We present a new analysis of the existence and uniqueness of fixed points for implicit-depth neural networks based on the concept of subhomogeneous operators and the nonlinear Perron-Frobenius theory.
arXiv Detail & Related papers (2024-03-01T18:12:46Z)
Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks [27.29463801531576]
We provide convergence analysis for training orthonormal deep linear neural networks. Our results shed light on how increasing the number of hidden layers can impact the convergence speed.
arXiv Detail & Related papers (2023-11-24T18:46:54Z)
Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets [4.020829863982153]
We prove the existence of structured subnetworks that can approximate any sufficiently smaller network. This result provides the first sub-exponential bound around the Strong Lottery Ticket Hypothesis.
arXiv Detail & Related papers (2023-11-16T12:38:45Z)
The Boundaries of Verifiable Accuracy, Robustness, and Generalisation in Deep Learning [71.14237199051276]
We consider the classical distribution-agnostic framework and algorithms minimising empirical risks. We show that there is a large family of tasks for which computing and verifying ideal stable and accurate neural networks is extremely challenging.
arXiv Detail & Related papers (2023-09-13T16:33:27Z)
Stabilizing RNN Gradients through Pre-training [3.335932527835653]
Learning theory proposes preventing the gradient from growing exponentially with depth or time in order to stabilize and improve training. We extend known stability theories to encompass a broader family of deep recurrent networks, requiring minimal assumptions on data and parameter distribution. We propose a new approach to mitigate this issue, which consists of giving a weight of one half to the time and depth contributions to the gradient.
arXiv Detail & Related papers (2023-08-23T11:48:35Z)
Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime. We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK. We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks. We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network. We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
On the Intrinsic Structures of Spiking Neural Networks [66.57589494713515]
Recent years have seen a surge of interest in spiking neural networks (SNNs) owing to their remarkable potential to handle time-dependent and event-driven data. There has been a dearth of comprehensive studies examining the impact of intrinsic structures within spiking computations. This work delves deep into the intrinsic structures of SNNs by elucidating their influence on the expressivity of SNNs.
arXiv Detail & Related papers (2022-06-21T09:42:30Z)
Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications [6.579523168465526]
We introduce partial Jacobians of a network, defined as derivatives of preactivations in layer $l$ with respect to preactivations in layer $l_0 \leq l$. We derive recurrence relations for the norms of partial Jacobians and utilize these relations to analyze criticality of deep fully connected neural networks with LayerNorm and/or residual connections.
arXiv Detail & Related papers (2021-11-23T20:31:42Z)
Formalizing Generalization and Robustness of Neural Networks to Weight Perturbations [58.731070632586594]
We provide the first formal analysis for feed-forward neural networks with non-negative monotone activation functions against weight perturbations. We also design a new theory-driven loss function for training generalizable and robust neural networks against weight perturbations.
arXiv Detail & Related papers (2021-03-03T06:17:03Z)
Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix. Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.