Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks
- URL: http://arxiv.org/abs/2111.13171v1
- Date: Thu, 25 Nov 2021 17:06:15 GMT
- Title: Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks
- Authors: Tolga Birdal, Aaron Lou, Leonidas Guibas, Umut Şimşekli
- Abstract summary: We show that the generalization error can be equivalently bounded in terms of a notion called the 'persistent homology dimension' (PHD).
We develop an efficient algorithm to estimate PHD at the scale of modern deep neural networks.
Our experiments show that the proposed approach can efficiently compute a network's intrinsic dimension in a variety of settings.
- Score: 19.99615698375829
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Disobeying the classical wisdom of statistical learning theory, modern deep
neural networks generalize well even though they typically contain millions of
parameters. Recently, it has been shown that the trajectories of iterative
optimization algorithms can possess fractal structures, and their
generalization error can be formally linked to the complexity of such fractals.
This complexity is measured by the fractal's intrinsic dimension, a quantity
usually much smaller than the number of parameters in the network. Even though
this perspective provides an explanation for why overparametrized networks
would not overfit, computing the intrinsic dimension (e.g., for monitoring
generalization during training) is a notoriously difficult task, where existing
methods typically fail even in moderate ambient dimensions. In this study, we
consider this problem from the lens of topological data analysis (TDA) and
develop a generic computational tool that is built on rigorous mathematical
foundations. By making a novel connection between learning theory and TDA, we
first illustrate that the generalization error can be equivalently bounded in
terms of a notion called the 'persistent homology dimension' (PHD), where,
compared with prior work, our approach does not require any additional
geometrical or statistical assumptions on the training dynamics. Then, by
utilizing recently established theoretical results and TDA tools, we develop an
efficient algorithm to estimate PHD at the scale of modern deep neural networks
and further provide visualization tools to help understand generalization in
deep learning. Our experiments show that the proposed approach can efficiently
compute a network's intrinsic dimension in a variety of settings, which is
predictive of the generalization error.
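The estimator described above rests on 0-dimensional persistent homology, whose total persistence for a finite point cloud coincides with the total edge length of the cloud's Euclidean minimum spanning tree. The snippet below is a minimal sketch of how such a PH0-based dimension estimate could be computed from recorded training iterates; the function names, sample sizes, and the NumPy/SciPy implementation are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch (assumed helpers, not the paper's implementation) of a
# PH0-based intrinsic dimension estimate for a cloud of optimizer iterates.
# Idea: the total 0-dimensional persistence of an n-point sample equals the
# sum of its Euclidean minimum-spanning-tree edge lengths; for a set of
# intrinsic dimension d, the alpha-weighted sum E_alpha scales roughly as
# n^{(d - alpha)/d}, so d can be recovered from the slope of a log-log fit.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree


def ph0_weighted_sum(points: np.ndarray, alpha: float = 1.0) -> float:
    """Sum of MST edge lengths raised to `alpha` (= alpha-weighted PH0 lifetimes)."""
    dists = squareform(pdist(points))      # dense pairwise Euclidean distances
    mst = minimum_spanning_tree(dists)     # sparse matrix holding the MST edges
    return float((mst.data ** alpha).sum())


def estimate_phd(iterates: np.ndarray, alpha: float = 1.0,
                 sample_sizes=(200, 400, 800, 1600), seed: int = 0) -> float:
    """Estimate the PH0 dimension from flattened weight iterates (rows = iterates)."""
    rng = np.random.default_rng(seed)
    log_n, log_e = [], []
    for n in sample_sizes:
        n = min(n, len(iterates))
        idx = rng.choice(len(iterates), size=n, replace=False)
        log_n.append(np.log(n))
        log_e.append(np.log(ph0_weighted_sum(iterates[idx], alpha)))
    slope, _ = np.polyfit(log_n, log_e, 1)  # E_alpha(n) ~ n^slope
    return alpha / (1.0 - slope)            # invert slope = (d - alpha) / d
```

In practice one would average the weighted sums over several random subsamples per size and record iterates only over the final stretch of training, so that the point cloud approximates the trajectory whose fractal structure the bound refers to.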
Related papers
- Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms [15.473123662393169]
Deep neural networks (DNNs) show remarkable generalization properties.
The source of these capabilities remains elusive, defying established statistical learning theory.
Recent studies have revealed that properties of training trajectories can be indicative of generalization.
arXiv Detail & Related papers (2024-07-11T17:56:03Z) - Slicing Mutual Information Generalization Bounds for Neural Networks [14.48773730230054]
We introduce new, tighter information-theoretic generalization bounds tailored for deep learning algorithms.
Our bounds offer significant computational and statistical advantages over standard MI bounds.
We extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces.
arXiv Detail & Related papers (2024-06-06T13:15:37Z) - Generalization Bounds with Data-dependent Fractal Dimensions [5.833272638548154]
We prove fractal geometry-based generalization bounds without requiring any Lipschitz assumption.
Although this new notion introduces significant technical complications, it lets us control the generalization error.
arXiv Detail & Related papers (2023-02-06T13:24:48Z) - Learning Theory Can (Sometimes) Explain Generalisation in Graph Neural
Networks [13.518582483147325]
We provide a rigorous analysis of the performance of neural networks in the context of transductive inference.
We show that transductive Rademacher complexity can explain the generalisation properties of graph convolutional networks for block models.
arXiv Detail & Related papers (2021-12-07T20:06:23Z) - Path Regularization: A Convexity and Sparsity Inducing Regularization
for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which includes standard deep networks and ResNets as special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to addressing the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Generalization bound of globally optimal non-convex neural network
training: Transportation map estimation by infinite dimensional Langevin
dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for neural network optimization analysis, such as mean field theory and neural tangent kernel theory, typically require taking the limit of infinite network width to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z) - Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)