Deterministic equivalent and error universality of deep random features learning
- URL: http://arxiv.org/abs/2302.00401v1
- Date: Wed, 1 Feb 2023 12:37:10 GMT
- Title: Deterministic equivalent and error universality of deep random features learning
- Authors: Dominik Schröder, Hugo Cui, Daniil Dmitriev, Bruno Loureiro
- Abstract summary: This problem can be seen as a natural generalization of the widely studied random features model to deeper architectures.
First, we prove Gaussian universality of the test error in a ridge regression setting where the learner and target networks share the same intermediate layers, and provide a sharp asymptotic formula for it.
Second, we conjecture the universality of the test error in the more general setting of arbitrary convex losses and generic learner/target architectures.
- Score: 4.8461049669050915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This manuscript considers the problem of learning a random Gaussian network
function using a fully connected network with frozen intermediate layers and
trainable readout layer. This problem can be seen as a natural generalization
of the widely studied random features model to deeper architectures. First, we
prove Gaussian universality of the test error in a ridge regression setting
where the learner and target networks share the same intermediate layers, and
provide a sharp asymptotic formula for it. Establishing this result requires
proving a deterministic equivalent for traces of the deep random features
sample covariance matrices which can be of independent interest. Second, we
conjecture the asymptotic Gaussian universality of the test error in the more
general setting of arbitrary convex losses and generic learner/target
architectures. We provide extensive numerical evidence for this conjecture,
which requires the derivation of closed-form expressions for the layer-wise
post-activation population covariances. In light of our results, we investigate
the interplay between architecture design and implicit regularization.
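As a concrete illustration of the setup described in the abstract, the following is a minimal numerical sketch, not the authors' code: it assumes two frozen tanh layers shared by the learner and the target, Gaussian inputs, a random Gaussian readout defining the target function, and a ridge-regression readout for the learner. Widths, sample sizes, and the ridge penalty are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: input dimension, hidden widths, samples, ridge penalty.
d, widths, n_train, n_test, lam = 200, [300, 300], 400, 2000, 1e-2

# Frozen random Gaussian intermediate layers, shared by learner and target.
Ws = [rng.standard_normal((widths[0], d)) / np.sqrt(d)]
for l in range(1, len(widths)):
    Ws.append(rng.standard_normal((widths[l], widths[l - 1])) / np.sqrt(widths[l - 1]))

def features(X):
    """Post-activations of the last frozen layer (the deep random features)."""
    H = X
    for W in Ws:
        H = np.tanh(H @ W.T)
    return H

# A random Gaussian readout on top of the shared layers defines the target function.
theta_star = rng.standard_normal(widths[-1]) / np.sqrt(widths[-1])

X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
Phi_train, Phi_test = features(X_train), features(X_test)
y_train, y_test = Phi_train @ theta_star, Phi_test @ theta_star

# Ridge regression on the trainable readout layer only.
k = widths[-1]
theta_hat = np.linalg.solve(Phi_train.T @ Phi_train + lam * np.eye(k),
                            Phi_train.T @ y_train)
print(f"readout test error: {np.mean((Phi_test @ theta_hat - y_test) ** 2):.4f}")
```

The sketch only sets up the learning problem and measures the empirical test error; the paper's contribution is the sharp asymptotic formula for this quantity and the universality statement behind it.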
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Learning a Gaussian Mixture for Sparsity Regularization in Inverse Problems [2.375943263571389]
In inverse problems, the incorporation of a sparsity prior yields a regularization effect on the solution.
We propose a probabilistic sparsity prior formulated as a mixture of Gaussians, capable of modeling sparsity with respect to a generic basis.
We put forth both a supervised and an unsupervised training strategy to estimate the parameters of this network.
arXiv Detail & Related papers (2024-01-29T22:52:57Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important for forecasting nonstationary processes or data drawn from a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the resulting tessellation and approximate the multiple-hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Learning Linear Causal Representations from Interventions under General Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z)
- Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data [14.379261299138147]
This paper analyzes the convergence and iteration complexity of training a one-hidden-layer neural network when the input features follow a Gaussian mixture model.
For the first time, it characterizes the impact of the input distribution on the sample complexity and the learning rate.
arXiv Detail & Related papers (2022-07-07T23:27:44Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Non-Euclidean Universal Approximation [4.18804572788063]
Modifications to a neural network's input and output layers are often required to accommodate the specificities of most practical learning tasks.
We present general conditions describing feature and readout maps that preserve an architecture's ability to approximate any continuous function uniformly on compact sets.
arXiv Detail & Related papers (2020-06-03T15:38:57Z)
- Generalisation error in learning with random features and the hidden manifold model [23.71637173968353]
We study generalised linear regression and classification for a synthetically generated dataset.
We consider the high-dimensional regime and use the replica method from statistical physics.
We show how to obtain the so-called double descent behaviour for logistic regression, with a peak at the interpolation threshold (a numerical sketch of this peak follows below).
We discuss the role played by correlations in the data generated by the hidden manifold model.
arXiv Detail & Related papers (2020-02-21T14:49:41Z)
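The last entry above mentions a double-descent peak at the interpolation threshold. The sketch below illustrates that peak numerically under simplified assumptions: random ReLU features with a near-ridgeless least-squares readout instead of logistic regression, a noiseless linear teacher, and arbitrary dimensions; none of this is taken from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, n_test, lam = 100, 300, 2000, 1e-6
beta = rng.standard_normal(d) / np.sqrt(d)   # hypothetical noiseless linear teacher

def rf_test_error(p):
    """Test MSE of near-ridgeless regression on p random ReLU features."""
    F = rng.standard_normal((p, d)) / np.sqrt(d)              # random feature map
    X, Xt = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
    y, yt = X @ beta, Xt @ beta
    Z, Zt = np.maximum(X @ F.T, 0.0), np.maximum(Xt @ F.T, 0.0)
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)   # tiny ridge for stability
    return np.mean((Zt @ w - yt) ** 2)

# Expect the test error to spike near p ≈ n (the interpolation threshold)
# and to decrease again in the overparametrized regime p > n.
for p in [60, 150, 240, 300, 360, 600, 1200]:
    print(f"p/n = {p / n:.2f}   test MSE = {rf_test_error(p):.3f}")
```

With these illustrative settings the printed test MSE should rise sharply around p/n ≈ 1 and fall again as p/n grows, mirroring the double-descent curve discussed in the entry.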