Gradient Span Algorithms Make Predictable Progress in High Dimension
- URL: http://arxiv.org/abs/2410.09973v1
- Date: Sun, 13 Oct 2024 19:26:18 GMT
- Title: Gradient Span Algorithms Make Predictable Progress in High Dimension
- Authors: Felix Benning, Leif Döring
- Abstract summary: We prove that all 'gradient span algorithms' behave asymptotically deterministically on scaled Gaussian random functions as the dimension tends to infinity.
The distributional assumption is general enough to model machine learning training but also encompasses spin glasses and random quadratic functions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We prove that all 'gradient span algorithms' have asymptotically deterministic behavior on scaled Gaussian random functions as the dimension tends to infinity. In particular, this result explains the counterintuitive phenomenon that different training runs of many large machine learning models result in approximately equal cost curves despite random initialization on a complicated non-convex landscape. The distributional assumption of (non-stationary) isotropic Gaussian random functions we use is sufficiently general to serve as a realistic model for machine learning training but also encompasses spin glasses and random quadratic functions.
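The concentration effect described in the abstract can be illustrated numerically. The sketch below is a toy experiment, not the paper's construction: it assumes a random quadratic f(x) = 0.5 x^T A x with a symmetric Gaussian matrix A scaled by 1/sqrt(2d), a random start on the unit sphere, and plain gradient descent standing in for a gradient span algorithm. The function names, step size, and dimensions are illustrative choices. It measures how much the cost curves of independent runs differ and compares a low and a high dimension.

```python
import numpy as np


def loss_curve(dim, steps, lr, rng):
    """Run gradient descent on one random quadratic and return its cost curve."""
    # Symmetric Gaussian matrix, scaled so its spectrum stays O(1) as dim grows
    # (an assumed toy model of a scaled Gaussian random function).
    g = rng.standard_normal((dim, dim))
    a = (g + g.T) / np.sqrt(2.0 * dim)
    # Random initialization on the unit sphere.
    x = rng.standard_normal(dim)
    x /= np.linalg.norm(x)
    losses = []
    for _ in range(steps + 1):
        losses.append(0.5 * x @ a @ x)
        x = x - lr * (a @ x)  # the gradient of 0.5 x^T A x is A x
    return np.array(losses)


def spread_across_runs(dim, runs=30, steps=10, lr=0.05, seed=0):
    """Largest per-step standard deviation of the cost over independent runs."""
    rng = np.random.default_rng(seed)
    curves = np.stack([loss_curve(dim, steps, lr, rng) for _ in range(runs)])
    return curves.std(axis=0).max()


spread_low = spread_across_runs(dim=20)
spread_high = spread_across_runs(dim=400)
print(f"spread of cost curves at d=20:  {spread_low:.4f}")
print(f"spread of cost curves at d=400: {spread_high:.4f}")
```

Each run draws a fresh function and a fresh starting point, so the spread captures both sources of randomness. If the curves concentrate as the abstract suggests, the spread at d=400 should be noticeably smaller than at d=20.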
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Universal approximation property of Banach space-valued random feature models including random neural networks [3.3379026542599934]
We introduce a Banach space-valued extension of random feature learning.
By randomly initializing the feature maps, only the linear readout needs to be trained.
We derive approximation rates and an explicit algorithm to learn an element of the given Banach space.
arXiv Detail & Related papers (2023-12-13T11:27:15Z)
- A Heavy-Tailed Algebra for Probabilistic Programming [53.32246823168763]
We propose a systematic approach for analyzing the tails of random variables.
We show how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler.
Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
arXiv Detail & Related papers (2023-06-15T16:37:36Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Simplex Random Features [53.97976744884616]
We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels.
We prove that SimRFs provide the smallest possible mean square error (MSE) on unbiased estimates of these kernels.
We show consistent gains provided by SimRFs in settings including pointwise kernel estimation, nonparametric classification and scalable Transformers.
arXiv Detail & Related papers (2023-01-31T18:53:39Z)
- Model, sample, and epoch-wise descents: exact solution of gradient flow in the random feature model [16.067228939231047]
We analyze the whole temporal behavior of the generalization and training errors under gradient flow.
We show that in the limit of large system size the full time-evolution path of both errors can be calculated analytically.
Our techniques are based on Cauchy complex integral representations of the errors together with recent random matrix methods based on linear pencils.
arXiv Detail & Related papers (2021-10-22T14:25:54Z)
- Shallow Representation is Deep: Learning Uncertainty-aware and Worst-case Random Feature Dynamics [1.1470070927586016]
This paper views uncertain system models as unknown or uncertain smooth functions in universal kernel Hilbert spaces.
By directly approximating the one-step dynamics function using random features with uncertain parameters, we then view the whole dynamical system as a multi-layer neural network.
arXiv Detail & Related papers (2021-06-24T14:48:12Z)
- Function Approximation via Sparse Random Features [23.325877475827337]
This paper introduces the sparse random feature method that learns parsimonious random feature models utilizing techniques from compressive sensing.
We show that the sparse random feature method outperforms shallow networks for well-structured functions and applications to scientific machine learning tasks.
arXiv Detail & Related papers (2021-03-04T17:53:54Z)
- Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the role of stochasticity in its success is still unclear.
We show that multiplicative noise, which commonly arises due to variance, leads to heavy-tailed behavior in the parameters.
A detailed analysis is conducted in which we describe key factors, including step size and data, all of which exhibit similar results on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
- Efficiently Sampling Functions from Gaussian Process Posteriors [76.94808614373609]
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
arXiv Detail & Related papers (2020-02-21T14:03:16Z)
- Randomly Projected Additive Gaussian Processes for Regression [37.367935314532154]
We use additive sums of kernels for GP regression, where each kernel operates on a different random projection of its inputs.
We prove this convergence and its rate, and propose a deterministic approach that converges more quickly than purely random projections.
arXiv Detail & Related papers (2019-12-30T07:26:18Z)