Why neural networks find simple solutions: the many regularizers of
geometric complexity
- URL: http://arxiv.org/abs/2209.13083v1
- Date: Tue, 27 Sep 2022 00:16:38 GMT
- Title: Why neural networks find simple solutions: the many regularizers of
geometric complexity
- Authors: Benoit Dherin, Michael Munn, Mihaela C. Rosca, and David G.T. Barrett
- Abstract summary: We develop the notion of geometric complexity, which is a measure of the variability of the model function, computed using a discrete Dirichlet energy.
We show that many common training heuristics such as parameter norm regularization, spectral norm regularization, flatness regularization, gradient regularization and noise regularization all act to control geometric complexity.
- Score: 13.729491571993163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many contexts, simpler models are preferable to more complex models and
the control of this model complexity is the goal for many methods in machine
learning such as regularization, hyperparameter tuning and architecture design.
In deep learning, it has been difficult to understand the underlying mechanisms
of complexity control, since many traditional measures are not naturally
suitable for deep neural networks. Here we develop the notion of geometric
complexity, which is a measure of the variability of the model function,
computed using a discrete Dirichlet energy. Using a combination of theoretical
arguments and empirical results, we show that many common training heuristics
such as parameter norm regularization, spectral norm regularization, flatness
regularization, implicit gradient regularization, noise regularization and the
choice of parameter initialization all act to control geometric complexity,
providing a unifying framework in which to characterize the behavior of deep
learning models.
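As a concrete illustration of the quantity being controlled, the sketch below computes a geometric-complexity-style penalty as the squared Frobenius norm of the model's input Jacobian averaged over a data batch (one discrete form of the Dirichlet energy described above) and adds it to a task loss. The toy MLP, the penalty weight `lam`, and names such as `geometric_complexity` are illustrative assumptions, not the authors' code.

```python
import jax
import jax.numpy as jnp

def mlp(params, x):
    # Toy two-layer network standing in for any differentiable model f_theta.
    h = jnp.tanh(params["W1"] @ x + params["b1"])
    return params["W2"] @ h + params["b2"]

def geometric_complexity(params, batch):
    # Discrete Dirichlet energy: mean squared Frobenius norm of the
    # model's input Jacobian over a batch of inputs.
    jac = jax.jacobian(lambda x: mlp(params, x))
    sq_norms = jax.vmap(lambda x: jnp.sum(jac(x) ** 2))(batch)
    return jnp.mean(sq_norms)

def regularized_loss(params, batch, targets, lam=1e-3):
    # Task loss plus an explicit penalty on geometric complexity.
    preds = jax.vmap(lambda x: mlp(params, x))(batch)
    return jnp.mean((preds - targets) ** 2) + lam * geometric_complexity(params, batch)

# Usage with random parameters and data.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = {
    "W1": 0.1 * jax.random.normal(k1, (16, 4)), "b1": jnp.zeros(16),
    "W2": 0.1 * jax.random.normal(k2, (3, 16)), "b2": jnp.zeros(3),
}
xs = jax.random.normal(k3, (32, 4))
ys = jnp.zeros((32, 3))
loss, grads = jax.value_and_grad(regularized_loss)(params, xs, ys)
```
Per the abstract, the other heuristics (parameter norm, spectral norm, flatness, noise regularization, initialization choice) implicitly push the same quantity down during training; the explicit penalty above is just the most direct handle on it.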
Related papers
- Tunable Complexity Benchmarks for Evaluating Physics-Informed Neural
Networks on Coupled Ordinary Differential Equations [64.78260098263489]
In this work, we assess the ability of physics-informed neural networks (PINNs) to solve increasingly complex coupled ordinary differential equations (ODEs).
We show that PINNs eventually fail to produce correct solutions to these benchmarks as their complexity increases.
We identify several reasons why this may be the case, including insufficient network capacity, poor conditioning of the ODEs, and high local curvature, as measured by the Laplacian of the PINN loss.
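For readers unfamiliar with the setup, the sketch below shows a generic physics-informed loss for a small coupled ODE system: the network maps time to the state, and the loss is the mean squared ODE residual at collocation points plus an initial-condition penalty. The particular system (a damped oscillator written as two first-order ODEs), the tiny network, and names like `pinn_loss` are illustrative assumptions, not the paper's benchmark problems.

```python
import jax
import jax.numpy as jnp

def net(params, t):
    # Small MLP mapping a scalar time t to the two-dimensional state (u1, u2).
    h = jnp.tanh(params["W1"] * t + params["b1"])
    return params["W2"] @ h + params["b2"]

def residual(params, t):
    # Residual of the assumed toy system du1/dt = u2, du2/dt = -u1 - 0.1*u2.
    u = net(params, t)
    dudt = jax.jacobian(lambda s: net(params, s))(t)
    return jnp.stack([dudt[0] - u[1], dudt[1] + u[0] + 0.1 * u[1]])

def pinn_loss(params, ts, u0):
    # Mean squared residual at collocation times plus an initial-condition term.
    res = jax.vmap(lambda t: residual(params, t))(ts)
    ic = net(params, jnp.array(0.0)) - u0
    return jnp.mean(res ** 2) + jnp.sum(ic ** 2)

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = {
    "W1": jax.random.normal(k1, (32,)), "b1": jnp.zeros(32),
    "W2": 0.1 * jax.random.normal(k2, (2, 32)), "b2": jnp.zeros(2),
}
ts = jnp.linspace(0.0, 10.0, 200)
loss, grads = jax.value_and_grad(pinn_loss)(params, ts, jnp.array([1.0, 0.0]))
```
The failure modes the paper reports correspond to this loss becoming hard to minimize as the underlying system grows more complex, e.g. through poor conditioning or high local curvature of the loss surface.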
arXiv Detail & Related papers (2022-10-14T15:01:32Z)
- Gaussian Process Surrogate Models for Neural Networks [6.8304779077042515]
In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque.
We construct a class of surrogate models for neural networks using Gaussian processes.
We demonstrate that our approach captures existing phenomena related to the spectral bias of neural networks, and then show that our surrogate models can be used to solve practical problems.
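The paper's specific surrogate construction is not reproduced here; as a generic illustration of the idea, the sketch below fits a standard Gaussian-process regressor (RBF kernel, posterior mean) to a trained network's input-output behavior. The kernel choice, the toy `network`, and names like `gp_posterior_mean` are assumptions for illustration.

```python
import jax
import jax.numpy as jnp

def rbf_kernel(xs1, xs2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel matrix between two sets of inputs.
    d2 = jnp.sum((xs1[:, None, :] - xs2[None, :, :]) ** 2, axis=-1)
    return variance * jnp.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-4):
    # Standard GP regression posterior mean, used here as a surrogate
    # for the network's input-output map.
    K = rbf_kernel(x_train, x_train) + noise * jnp.eye(x_train.shape[0])
    K_star = rbf_kernel(x_test, x_train)
    alpha = jnp.linalg.solve(K, y_train)
    return K_star @ alpha

# A toy "trained network" whose behavior we want to emulate (illustrative).
def network(x):
    return jnp.sin(3.0 * x[:, :1]) + 0.1 * x[:, 1:2]

key = jax.random.PRNGKey(0)
x_train = jax.random.uniform(key, (64, 2), minval=-1.0, maxval=1.0)
y_train = network(x_train)            # surrogate is fit to the network's outputs
x_test = jax.random.uniform(jax.random.PRNGKey(1), (16, 2), minval=-1.0, maxval=1.0)
y_surrogate = gp_posterior_mean(x_train, y_train, x_test)
```
In practice the kernel hyperparameters (lengthscale, variance, noise) would be tuned, e.g. by maximizing the GP marginal likelihood.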
arXiv Detail & Related papers (2022-08-11T20:17:02Z)
- Learning Non-Vacuous Generalization Bounds from Optimization [8.294831479902658]
We present a simple yet non-vacuous generalization bound from the optimization perspective.
We achieve this goal by leveraging the fact that the hypothesis set accessed by gradient algorithms is essentially fractal-like.
Numerical studies demonstrate that our approach is able to yield plausible generalization guarantees for modern neural networks.
arXiv Detail & Related papers (2022-06-09T08:59:46Z)
- Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks [19.99615698375829]
We show that generalization error can be equivalently bounded in terms of a notion called the 'persistent homology dimension' (PHD).
We develop an efficient algorithm to estimate PHD at the scale of modern deep neural networks.
Our experiments show that the proposed approach can efficiently compute a network's intrinsic dimension in a variety of settings.
arXiv Detail & Related papers (2021-11-25T17:06:15Z)
- Redefining Neural Architecture Search of Heterogeneous Multi-Network Models by Characterizing Variation Operators and Model Components [71.03032589756434]
We investigate the effect of different variation operators in a complex domain, that of multi-network heterogeneous neural models.
We characterize both the variation operators, according to their effect on the complexity and performance of the model, and the models themselves, relying on diverse metrics that estimate the quality of the different parts composing them.
arXiv Detail & Related papers (2021-06-16T17:12:26Z)
- Learning normal form autoencoders for data-driven discovery of universal, parameter-dependent governing equations [3.769860395223177]
Complex systems manifest a small number of instabilities and bifurcations that are canonical in nature.
Such parametric instabilities are mathematically characterized by their universal unfoldings, or normal form dynamics.
We introduce deep learning autoencoders to discover coordinate transformations that capture the underlying parametric dependence.
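A bare-bones autoencoder of the kind this summary refers to is sketched below: an encoder maps observed states to low-dimensional latent coordinates and a decoder maps them back, trained with a reconstruction loss. The normal-form constraints on the latent dynamics that the paper adds are not shown; the layer sizes and names here are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def encoder(params, x):
    # Map an observed state to low-dimensional latent coordinates.
    h = jnp.tanh(params["We"] @ x + params["be"])
    return params["Wz"] @ h + params["bz"]

def decoder(params, z):
    # Map latent coordinates back to the observed state.
    h = jnp.tanh(params["Wd"] @ z + params["bd"])
    return params["Wx"] @ h + params["bx"]

def reconstruction_loss(params, batch):
    # Mean squared reconstruction error over a batch of states.
    recon = jax.vmap(lambda x: decoder(params, encoder(params, x)))(batch)
    return jnp.mean((recon - batch) ** 2)

key = jax.random.PRNGKey(0)
ks = jax.random.split(key, 5)
D, H, Z = 8, 32, 2   # observed, hidden, and latent dimensions (illustrative)
params = {
    "We": 0.1 * jax.random.normal(ks[0], (H, D)), "be": jnp.zeros(H),
    "Wz": 0.1 * jax.random.normal(ks[1], (Z, H)), "bz": jnp.zeros(Z),
    "Wd": 0.1 * jax.random.normal(ks[2], (H, Z)), "bd": jnp.zeros(H),
    "Wx": 0.1 * jax.random.normal(ks[3], (D, H)), "bx": jnp.zeros(D),
}
batch = jax.random.normal(ks[4], (64, D))
loss, grads = jax.value_and_grad(reconstruction_loss)(params, batch)
```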
arXiv Detail & Related papers (2021-06-09T14:25:18Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded by the 'complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantees.
arXiv Detail & Related papers (2021-03-05T04:42:32Z)
- Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization.
Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
arXiv Detail & Related papers (2020-08-07T02:12:10Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm that our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.