Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms
- URL: http://arxiv.org/abs/2106.04881v1
- Date: Wed, 9 Jun 2021 08:05:36 GMT
- Title: Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms
- Authors: Alexander Camuto, George Deligiannidis, Murat A. Erdogdu, Mert
G\"urb\"uzbalaban, Umut \c{S}im\c{s}ekli, Lingjiong Zhu
- Abstract summary: We prove that the generalization error of a stochastic optimization algorithm can be bounded based on the `complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
- Score: 71.62575565990502
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding generalization in deep learning has been one of the major
challenges in statistical learning theory over the last decade. While recent
work has illustrated that the dataset and the training algorithm must be taken
into account in order to obtain meaningful generalization bounds, it is still
theoretically not clear which properties of the data and the algorithm
determine the generalization performance. In this study, we approach this
problem from a dynamical systems theory perspective and represent stochastic
optimization algorithms as random iterated function systems (IFS). Well studied
in the dynamical systems literature, under mild assumptions, such IFSs can be
shown to be ergodic with an invariant measure that is often supported on sets
with a fractal structure. As our main contribution, we prove that the
generalization error of a stochastic optimization algorithm can be bounded
based on the `complexity' of the fractal structure that underlies its invariant
measure. Leveraging results from dynamical systems theory, we show that the
generalization error can be explicitly linked to the choice of the algorithm
(e.g., stochastic gradient descent -- SGD), algorithm hyperparameters (e.g.,
step-size, batch-size), and the geometry of the problem (e.g., Hessian of the
loss). We further specialize our results to specific problems (e.g.,
linear/logistic regression, one-hidden-layer neural networks) and algorithms
(e.g., SGD and preconditioned variants), and obtain analytical estimates for
our bound. For modern neural networks, we develop an efficient algorithm to
compute the developed bound and support our theory with various experiments on
neural networks.
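The IFS viewpoint is concrete in the least-squares case: with a minibatch B, one SGD step is the affine map w -> (I - eta*A_B) w + eta*b_B, where A_B and b_B are the minibatch second-moment matrix and cross-moment vector, so a training run is a random composition of such maps and its long-run behaviour is captured by the invariant measure of the resulting IFS. The sketch below only illustrates this viewpoint and is not the paper's algorithm: the toy data, the step-size and batch-size values, and the crude box-counting estimate of the dimension of the invariant measure's support are assumptions made here for demonstration.

# Illustrative sketch (assumptions noted above), in Python/NumPy:
# SGD on least squares viewed as a random iterated function system of affine
# maps, followed by a crude box-counting estimate of the dimension of the set
# the iterates settle onto.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 2
X = rng.normal(size=(n, d))                       # toy design matrix
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)

eta, batch = 0.3, 5                               # step-size and batch-size (illustrative values)
w = rng.normal(size=d)
iterates = []
for k in range(20000):
    idx = rng.choice(n, size=batch, replace=False)
    A_B = X[idx].T @ X[idx] / batch               # minibatch Hessian of the quadratic loss
    b_B = X[idx].T @ y[idx] / batch
    w = (np.eye(d) - eta * A_B) @ w + eta * b_B   # one randomly selected affine map
    if k > 5000:                                  # discard burn-in; keep draws near the invariant measure
        iterates.append(w.copy())
iterates = np.array(iterates)

def box_counting_dimension(points, epsilons):
    # Slope of log N(eps) versus log(1/eps), where N(eps) counts occupied boxes.
    counts = []
    for eps in epsilons:
        occupied = np.unique(np.floor(points / eps), axis=0)
        counts.append(len(occupied))
    slope, _ = np.polyfit(np.log(1.0 / epsilons), np.log(counts), 1)
    return slope

eps_grid = np.array([0.2, 0.1, 0.05, 0.025])
print("box-counting dimension estimate:", box_counting_dimension(iterates, eps_grid))

Varying the step-size and the batch size changes the contraction properties of the random affine maps, which is the route through which the paper links its bound to the algorithm's hyperparameters and to the Hessian of the loss; the box-counting estimate above is only a stand-in for the finer fractal-dimension quantities the paper works with.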
Related papers
- Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms [15.473123662393169]
Deep neural networks (DNNs) show remarkable generalization properties.
The source of these capabilities remains elusive, defying established statistical learning theory.
Recent studies have revealed that properties of training trajectories can be indicative of generalization.
arXiv Detail & Related papers (2024-07-11T17:56:03Z) - Unlock the Power of Algorithm Features: A Generalization Analysis for Algorithm Selection [25.29451529910051]
We propose the first provable guarantee for algorithm selection based on algorithm features.
We analyze the benefits and costs associated with algorithm features and investigate how the generalization error is affected by different factors.
arXiv Detail & Related papers (2024-05-18T17:38:25Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z) - Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows that these algorithms match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z) - Intrinsic Dimension, Persistent Homology and Generalization in Neural
Networks [19.99615698375829]
We show that generalization error can be equivalently bounded in terms of a notion called the 'persistent homology dimension' (PHD).
We develop an efficient algorithm to estimate PHD at the scale of modern deep neural networks.
Our experiments show that the proposed approach can efficiently compute a network's intrinsic dimension in a variety of settings.
arXiv Detail & Related papers (2021-11-25T17:06:15Z) - Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantee.
arXiv Detail & Related papers (2021-03-05T04:42:32Z) - A Dynamical View on Optimization Algorithms of Overparameterized Neural
Networks [23.038631072178735]
We consider a broad class of optimization algorithms that are commonly used in practice.
As a consequence, we can characterize the convergence behavior of overparameterized neural networks trained with these algorithms.
We believe our approach can also be extended to other optimization algorithms and network architectures.
arXiv Detail & Related papers (2020-10-25T17:10:22Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of the stochasticity in its success is still unclear.
We show that multiplicative noise, which commonly arises due to variance in the minibatch gradients, leads to heavy-tailed stationary behaviour in the parameters.
A detailed analysis describes how key factors, including the step size and the data, influence this behaviour, with similar results observed on state-of-the-art neural network models (a toy simulation illustrating the effect follows this list).
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
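The multiplicative-noise mechanism in the last entry above is easy to reproduce in a toy setting. The sketch below is an illustration under assumptions made here (per-sample quadratic losses with lognormal curvatures and particular step-size and batch-size values), not the cited paper's analysis: because the minibatch curvature multiplies the current iterate, the gradient noise is multiplicative, and the stationary distribution of the parameter shows tails that are clearly heavier than a Gaussian with matched variance.

# Toy illustration (assumptions noted above), in Python/NumPy:
# SGD on per-sample losses 0.5 * a_i * (w - b_i)^2 with minibatch gradients;
# the minibatch curvature multiplies w, producing multiplicative parameter noise.
import numpy as np

rng = np.random.default_rng(1)
# Illustrative values; in this toy model, smaller batches and larger steps strengthen the effect.
n, eta, batch = 1000, 0.5, 1
a = rng.lognormal(mean=0.0, sigma=0.75, size=n)   # per-sample curvatures
b = rng.normal(size=n)                            # per-sample targets

w, samples = 0.0, []
for k in range(100000):
    idx = rng.choice(n, size=batch, replace=False)
    grad = (a[idx] * (w - b[idx])).mean()         # minibatch gradient: curvature multiplies w
    w -= eta * grad
    if k > 5000:                                  # discard burn-in
        samples.append(w)
x = np.abs(np.array(samples))

# Compare extreme quantiles with |N(0, sigma^2)| at matched variance:
# heavy tails show up as much larger extreme quantiles of the iterates.
gauss = np.abs(rng.normal(scale=x.std(), size=x.size))
for q in (0.99, 0.999, 0.9999):
    print(f"q={q}: |SGD iterate| {np.quantile(x, q):.2f}   |Gaussian| {np.quantile(gauss, q):.2f}")

In this toy model, raising the step size or shrinking the batch amplifies the multiplicative part of the noise and pushes the extreme quantiles further beyond the Gaussian reference, consistent with the summary above.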