Random features and polynomial rules
- URL: http://arxiv.org/abs/2402.10164v1
- Date: Thu, 15 Feb 2024 18:09:41 GMT
- Title: Random features and polynomial rules
- Authors: Fabián Aguirre-López, Silvio Franz, Mauro Pastore
- Abstract summary: We present an analysis of the generalization performance of random features models for generic supervised learning problems with Gaussian data.
We find good agreement even far from the asymptotic limits where $D\to\infty$ and at least one of $P/D^K$, $N/D^L$ remains finite.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random features models play a distinguished role in the theory of deep
learning, describing the behavior of neural networks close to their
infinite-width limit. In this work, we present a thorough analysis of the
generalization performance of random features models for generic supervised
learning problems with Gaussian data. Our approach, built with tools from the
statistical mechanics of disordered systems, maps the random features model to
an equivalent polynomial model, and allows us to plot average generalization
curves as functions of the two main control parameters of the problem: the
number of random features $N$ and the size $P$ of the training set, both
assumed to scale as powers in the input dimension $D$. Our results extend the
case of proportional scaling between $N$, $P$ and $D$. They are in accordance
with rigorous bounds known for certain particular learning tasks and are in
quantitative agreement with numerical experiments performed over many orders
of magnitude of $N$ and $P$. We also find good agreement far from the
asymptotic limits, where $D\to\infty$ and at least one of $P/D^K$, $N/D^L$
remains finite.
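As a concrete illustration of the setting described in the abstract, the following minimal sketch trains a random features ridge regression on Gaussian data and reports its test error for one choice of $(D, N, P)$. The ReLU nonlinearity, the quadratic teacher function, the ridge penalty, and all parameter values are illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch of a random features regression experiment on Gaussian data.
# All modeling choices below (ReLU features, quadratic teacher, ridge penalty,
# parameter values) are illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

D = 50          # input dimension
N = 400         # number of random features
P = 800         # training set size
P_test = 2000   # test set size
ridge = 1e-3    # ridge regularization strength (assumption)

def teacher(X):
    # Simple quadratic target on Gaussian inputs (illustrative choice).
    return (np.sum(X, axis=1) / np.sqrt(D)) ** 2

def features(X, F):
    # Random features: ReLU applied to random Gaussian projections.
    return np.maximum(X @ F.T / np.sqrt(D), 0.0)

# Gaussian data and Gaussian random feature matrix.
F = rng.standard_normal((N, D))
X_train = rng.standard_normal((P, D))
X_test = rng.standard_normal((P_test, D))
y_train, y_test = teacher(X_train), teacher(X_test)

# Ridge regression on the random features.
Phi_train = features(X_train, F)
Phi_test = features(X_test, F)
w = np.linalg.solve(Phi_train.T @ Phi_train + ridge * np.eye(N),
                    Phi_train.T @ y_train)

test_error = np.mean((Phi_test @ w - y_test) ** 2)
print(f"D={D}, N={N}, P={P}: test MSE = {test_error:.4f}")
```

Sweeping $N$ and $P$ over several orders of magnitude at fixed $D$ in a script like this yields empirical generalization curves of the kind the paper computes analytically.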
Related papers
- Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations [40.77319247558742]
We study the computational complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure.
We prove that a large subset of $f_*$ can be efficiently learned by gradient training of a two-layer neural network.
arXiv Detail & Related papers (2024-06-17T17:59:17Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- Agnostically Learning Multi-index Models with Queries [54.290489524576756]
We study the power of query access for the task of agnostic learning under the Gaussian distribution.
We show that query access gives significant runtime improvements over random examples for agnostically learning MIMs.
arXiv Detail & Related papers (2023-12-27T15:50:47Z)
- Sparse random matrices and Gaussian ensembles with varying randomness [0.0]
We study a system of $N$ qubits with a random Hamiltonian obtained by drawing coupling constants from Gaussian distributions in various ways.
We find some evidence that the behaviour changes in an abrupt manner when the number of non-zero independent terms in the Hamiltonian is exponentially large in $N$.
arXiv Detail & Related papers (2023-05-12T14:19:27Z)
- On the Identifiability and Estimation of Causal Location-Scale Noise Models [122.65417012597754]
We study the class of location-scale or heteroscedastic noise models (LSNMs).
We show the causal direction is identifiable up to some pathological cases.
We propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks.
arXiv Detail & Related papers (2022-10-13T17:18:59Z)
- Correlation Functions in Random Fully Connected Neural Networks at Finite Width [17.51364577113718]
This article considers fully connected neural networks with Gaussian random weights and biases and $L$ hidden layers.
For bounded non-linearities we give sharp recursion estimates in powers of $1/n$ for the joint correlation functions of the network output and its derivatives.
We find in both cases that the depth-to-width ratio $L/n$ plays the role of an effective network depth, controlling both the scale of fluctuations at individual neurons and the size of inter-neuron correlations.
arXiv Detail & Related papers (2022-04-03T11:57:18Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev seminorm is lower bounded.
Experiments reveal, for the first time, a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z)
- On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models [42.72822331030195]
We study minimum $\ell_2$-norm overfitting solutions for the neural tangent kernel (NTK) model of a two-layer neural network.
We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics different from the standard "double-descent" phenomenon.
For functions outside of this class, we provide a lower bound on the generalization error that does not diminish to zero even when $n$ and $p$ are both large.
arXiv Detail & Related papers (2021-03-09T06:24:59Z)
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.