On the Generalization Behavior of Deep Residual Networks From a Dynamical System Perspective
- URL: http://arxiv.org/abs/2602.20921v1
- Date: Tue, 24 Feb 2026 13:59:06 GMT
- Title: On the Generalization Behavior of Deep Residual Networks From a Dynamical System Perspective
- Authors: Jinshu Huang, Mingfei Sun, Chunlin Wu
- Abstract summary: Deep neural networks (DNNs) have significantly advanced machine learning, with model depth playing a central role in their successes. In this work, we establish generalization error bounds for both discrete- and continuous-time residual networks (ResNets) by combining Rademacher complexity, flow maps of dynamical systems, and the convergence behavior of ResNets in the deep-layer limit. These findings provide a unified understanding of generalization across both discrete- and continuous-time ResNets, helping to close the gap in both the order of sample complexity and assumptions between the discrete- and continuous-time settings.
- Score: 1.0388986221727612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have significantly advanced machine learning, with model depth playing a central role in their successes. The dynamical system modeling approach has recently emerged as a powerful framework, offering new mathematical insights into the structure and learning behavior of DNNs. In this work, we establish generalization error bounds for both discrete- and continuous-time residual networks (ResNets) by combining Rademacher complexity, flow maps of dynamical systems, and the convergence behavior of ResNets in the deep-layer limit. The resulting bounds are of order $O(1/\sqrt{S})$ with respect to the number of training samples $S$, and include a structure-dependent negative term, yielding depth-uniform and asymptotic generalization bounds under milder assumptions. These findings provide a unified understanding of generalization across both discrete- and continuous-time ResNets, helping to close the gap in both the order of sample complexity and assumptions between the discrete- and continuous-time settings.
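To make the dynamical-system viewpoint in the abstract concrete, the minimal sketch below treats a residual block x_{l+1} = x_l + (1/L) f(x_l) as one forward-Euler step of an ODE dx/dt = f(x(t)) on [0, 1], and checks numerically that the discrete network approaches the ODE flow map as the depth L grows, which is the deep-layer limit the bounds rely on. The tanh velocity field, the sharing of one parameter set across all layers, and every name in the code are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def velocity_field(x, W, b):
    """A simple Lipschitz velocity field f(x) = tanh(W x + b) (illustrative choice)."""
    return np.tanh(W @ x + b)

def resnet_forward(x0, W, b, depth):
    """Discrete-time ResNet with shared parameters:
    x_{l+1} = x_l + (1/L) f(x_l), i.e. forward Euler with step h = 1/L."""
    x = x0.copy()
    h = 1.0 / depth
    for _ in range(depth):
        x = x + h * velocity_field(x, W, b)
    return x

def ode_flow_map(x0, W, b, n_steps=20_000):
    """Continuous-time limit: integrate dx/dt = f(x) over [0, 1] on a very
    fine grid, standing in for the exact flow map of the ODE."""
    return resnet_forward(x0, W, b, n_steps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    W = rng.standard_normal((d, d)) / np.sqrt(d)
    b = rng.standard_normal(d)
    x0 = rng.standard_normal(d)

    x_ode = ode_flow_map(x0, W, b)
    for L in (4, 16, 64, 256):
        gap = np.linalg.norm(resnet_forward(x0, W, b, L) - x_ode)
        print(f"depth {L:4d}: ||ResNet output - ODE flow map|| = {gap:.6f}")
```

The printed gap shrinks as the depth grows, which is roughly the discrete-to-continuous convergence mechanism the abstract invokes when transferring bounds between discrete and continuous-time ResNets.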
Related papers
- Dynamical Learning in Deep Asymmetric Recurrent Neural Networks [1.3421746809394772]
We show that asymmetric deep recurrent neural networks give rise to an exponentially large, dense accessible manifold of internal representations. We propose a distributed learning scheme in which input-output associations emerge naturally from the recurrent dynamics.
arXiv Detail & Related papers (2025-09-05T12:05:09Z)
- Lattice-Based Pruning in Recurrent Neural Networks via Poset Modeling [0.0]
Recurrent neural networks (RNNs) are central to sequence modeling tasks, yet their high computational complexity poses challenges for scalability and real-time deployment. We introduce a novel framework that models RNNs as partially ordered sets (posets) and constructs corresponding dependency lattices. By identifying meet-irreducible neurons, our lattice-based pruning algorithm selectively retains critical connections while eliminating redundant ones.
arXiv Detail & Related papers (2025-02-23T10:11:38Z)
- Recurrent Stochastic Configuration Networks with Hybrid Regularization for Nonlinear Dynamics Modelling [3.8719670789415925]
Recurrent stochastic configuration networks (RSCNs) have shown great potential in modelling nonlinear dynamic systems with uncertainties. This paper presents an RSCN with hybrid regularization to enhance both the learning capacity and generalization performance of the network.
arXiv Detail & Related papers (2024-11-26T03:06:39Z)
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNet in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow the construction of model-based networks with guaranteed high generalization (a generic unrolled-ISTA sketch of such a model-based network appears after this list).
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Estimating Linear Dynamical Networks of Cyclostationary Processes [0.0]
We present a novel algorithm for guaranteed topology learning in networks excited by cyclostationary processes.
Unlike prior work, the framework applies to linear dynamical systems with complex-valued dependencies.
In the second part of the article, we analyze conditions for consistent topology learning for bidirected radial networks when a subset of the network is unobserved.
arXiv Detail & Related papers (2020-09-26T18:54:50Z)
- Continuous-in-Depth Neural Networks [107.47887213490134]
We first show that ResNets fail to be meaningful dynamical systems in this richer sense.
We then demonstrate that neural network models can learn to represent continuous dynamical systems.
We introduce ContinuousNet as a continuous-in-depth generalization of ResNet architectures.
arXiv Detail & Related papers (2020-08-05T22:54:09Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent (a toy sketch of such a min-max training loop appears after this list).
For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
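The last entry above casts SEM estimation as a min-max game between two networks trained with gradient descent. The sketch below is only a schematic of that kind of alternating training loop under strong simplifying assumptions: both players are random-feature models with trainable linear read-outs rather than full neural networks, the data-generating process and every name are invented for the demo, and the objective is a generic conditional-moment saddle problem rather than the estimator from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy structural model with a hidden confounder U:
# instrument Z, treatment X = Z + U + noise, outcome Y = 2*X + U.
n = 2000
Z = rng.standard_normal((n, 1))
U = rng.standard_normal((n, 1))
X = Z + U + 0.1 * rng.standard_normal((n, 1))
Y = (2.0 * X + U)[:, 0]

# Both players are random-feature models: fixed tanh features with
# trainable linear read-outs a (the model f) and c (the adversary g).
k = 32
phi = np.tanh(X @ rng.standard_normal((1, k)))   # features for f(X)
psi = np.tanh(Z @ rng.standard_normal((1, k)))   # features for g(Z)
a, c = np.zeros(k), np.zeros(k)

# Saddle objective: min_a max_c  E[(Y - f(X)) g(Z)] - 0.5 E[g(Z)^2] - 0.5*reg*||c||^2
lr, reg = 0.05, 1e-3
for step in range(5001):
    resid = Y - phi @ a                                   # Y - f(X)
    g_z = psi @ c                                         # g(Z)
    grad_a = -(phi * g_z[:, None]).mean(axis=0)           # gradient for the model player
    grad_c = (psi * (resid - g_z)[:, None]).mean(axis=0) - reg * c
    a -= lr * grad_a                                      # descent step (minimizer)
    c += lr * grad_c                                      # ascent step (adversary)
    if step % 1000 == 0:
        obj = (resid * g_z).mean() - 0.5 * (g_z ** 2).mean()
        print(f"step {step:5d}  saddle objective {obj:+.5f}")
# A small objective means the fitted residual Y - f(X) is (empirically)
# nearly orthogonal to every test function in the adversary's span.
```

The small l2 penalty on the adversary is a common stabilization for gradient descent-ascent; it keeps the inner maximization strongly concave so the alternating updates do not spiral.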
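The entry "Generalization and Estimation Error Bounds for Model-based Neural Networks" above concerns model-based networks for sparse recovery. As generic background (not the specific architectures analyzed in that paper), the sketch below unrolls the classical ISTA iteration into a fixed number of "layers"; in learned, LISTA-style variants the per-layer matrices and thresholds would become trainable parameters. All sizes and constants here are arbitrary demo choices.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def unrolled_ista(y, A, n_layers=200, lam=0.05):
    """Model-based network for sparse recovery: each 'layer' is one ISTA step
        x <- soft_threshold(x - (1/L) A^T (A x - y), lam / L),
    where L bounds the largest eigenvalue of A^T A."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):
        x = soft_threshold(x - (A.T @ (A @ x - y)) / L, lam / L)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, k = 50, 100, 5                       # measurements, dimension, sparsity
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    x_true = np.zeros(n)
    x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    y = A @ x_true + 0.01 * rng.standard_normal(m)

    x_hat = unrolled_ista(y, A)
    rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
    print(f"relative recovery error: {rel_err:.3f}")
```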