Related papers: Generalization Through Growth: Hidden Dynamics Controls Depth Dependence

Generalization Through Growth: Hidden Dynamics Controls Depth Dependence

URL: http://arxiv.org/abs/2505.15064v1
Date: Wed, 21 May 2025 03:32:30 GMT
Title: Generalization Through Growth: Hidden Dynamics Controls Depth Dependence
Authors: Sho Sonoda, Yuka Hashimoto, Isao Ishikawa, Masahiro Ikeda,
Abstract summary: We present a unified framework for arbitrary bluepseudo-metric spaces in which a depth-(k) network is the composition of continuous hidden maps (f:mathcalXto mathcalX) and an output map (h:mathcalXto mathbbR).
Score: 15.67299102925013
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent theory has reduced the depth dependence of generalization bounds from exponential to polynomial and even depth-independent rates, yet these results remain tied to specific architectures and Euclidean inputs. We present a unified framework for arbitrary \blue{pseudo-metric} spaces in which a depth-$k$ network is the composition of continuous hidden maps $f:\mathcal{X}\to \mathcal{X}$ and an output map $h:\mathcal{X}\to \mathbb{R}$. The resulting bound $O(\sqrt{(\alpha + \log \beta(k))/n})$ isolates the sole depth contribution in $\beta(k)$, the word-ball growth of the semigroup generated by the hidden layers. By Gromov's theorem polynomial (resp. exponential) growth corresponds to virtually nilpotent (resp. expanding) dynamics, revealing a geometric dichotomy behind existing $O(\sqrt{k})$ (sublinear depth) and $\tilde{O}(1)$ (depth-independent) rates. We further provide covering-number estimates showing that expanding dynamics yield an exponential parameter saving via compositional expressivity. Our results decouple specification from implementation, offering architecture-agnostic and dynamical-systems-aware guarantees applicable to modern deep-learning paradigms such as test-time inference and diffusion models.

Related papers

Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws [21.18373933718468]
We study the optimization and sample complexity of gradient-based training of a two-layer neural network with quadratic activation function in the high-dimensional regime.<n>We present a sharp analysis of the dynamics in the feature learning regime, for both the population limit and the finite-sample discretization.
arXiv Detail & Related papers (2025-08-05T17:57:56Z)
The Generative Leap: Sharp Sample Complexity for Efficiently Learning Gaussian Multi-Index Models [71.5283441529015]
In this work we consider generic Gaussian Multi-index models, in which the labels only depend on the (Gaussian) $d$-dimensional inputs through their projection onto a low-dimensional $r = O_d(1)$ subspace.<n>We introduce the generative leap exponent $kstar$, a natural extension of the generative exponent from [Damian et al.'24] to the multi-index setting.
arXiv Detail & Related papers (2025-06-05T18:34:56Z)
Quantum Circuit Encodings of Polynomial Chaos Expansions [5.63729124086755]
We study countably-parametric holomorphic maps $u:Uto mathbbR$, where the parameter domain is $U=[-1,1]mathbbN$.<n>We establish dimension-independent quantum circuit approximation rates via the best $n$-term truncations of generalized chaos (gPC) expansions of these parametric maps.<n>Our results have implications for quantum-enhanced algorithms for a wide range of maps in applications.
arXiv Detail & Related papers (2025-06-02T15:53:36Z)
How well behaved is finite dimensional Diffusion Maps? [0.0]
We derive a series of properties that remain valid after finite-dimensional and almost isometric Diffusion Maps (DM)<n>We quantify the error between the estimated tangent spaces and the true tangent spaces over the submanifolds after the DM embedding.<n>These results offer a solid theoretical foundation for understanding the performance and reliability of DM in practical applications.
arXiv Detail & Related papers (2024-12-05T09:12:25Z)
Constructive Universal Approximation and Finite Sample Memorization by Narrow Deep ReLU Networks [0.0]
We show that any dataset with $N$ distinct points in $mathbbRd$ and $M$ output classes can be exactly classified.<n>We also prove a universal approximation theorem in $Lp(Omega; mathbbRm)$ for any bounded domain.<n>Our results offer a unified and interpretable framework connecting controllability, expressivity, and training dynamics in deep neural networks.
arXiv Detail & Related papers (2024-09-10T14:31:21Z)
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks. In this paper, we study a suitable function space for over- parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
Exponential Separations in Symmetric Neural Networks [48.80300074254758]
We consider symmetric Networkparencitesantoro 2017simple architecture as a natural generalization of DeepSetsparencitezaheerdeep architecture. Under the restriction to analytic activation functions, we construct a symmetric function acting on sets of dimensions $N$ in dimension with $D$.
arXiv Detail & Related papers (2022-06-02T19:45:10Z)
Geometric Graph Representation Learning via Maximizing Rate Reduction [73.6044873825311]
Learning node representations benefits various downstream tasks in graph analysis such as community detection and node classification. We propose Geometric Graph Representation Learning (G2R) to learn node representations in an unsupervised manner. G2R maps nodes in distinct groups into different subspaces, while each subspace is compact and different subspaces are dispersed.
arXiv Detail & Related papers (2022-02-13T07:46:24Z)
Exponential Family Model-Based Reinforcement Learning via Score Matching [97.31477125728844]
We propose an optimistic model-based algorithm, dubbed SMRL, for finitehorizon episodic reinforcement learning (RL) SMRL uses score matching, an unnormalized density estimation technique that enables efficient estimation of the model parameter by ridge regression.
arXiv Detail & Related papers (2021-12-28T15:51:07Z)
Exponential Convergence of Deep Operator Networks for Elliptic Partial Differential Equations [0.0]
We construct deep operator networks (ONets) between infinite-dimensional spaces that emulate with an exponential rate of convergence the coefficient-to-solution map of elliptic second-order PDEs. In particular, we consider problems set in $d$-dimensional periodic domains, $d=1, 2, dots$, and with analytic right-hand sides and coefficients. We prove that the neural networks in the ONet have size $mathcalO(left|log(varepsilon)right|kappa)$ for some $kappa
arXiv Detail & Related papers (2021-12-15T13:56:28Z)
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning. In particular, we prove that the complexity of the function class $mathcalF$ characterizes the complexity of the function. Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
Better Depth-Width Trade-offs for Neural Networks through the lens of Dynamical Systems [24.229336600210015]
Recently, depth separation results for ReLU networks were obtained via a new connection with dynamical systems. We improve the existing width lower bounds along several aspects. A byproduct of our results is that there exists a universal constant characterizing the depth-width trade-offs.
arXiv Detail & Related papers (2020-03-02T11:36:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.