Online Learning for the Random Feature Model in the Student-Teacher Framework
- URL: http://arxiv.org/abs/2303.14083v2
- Date: Thu, 6 Apr 2023 21:41:46 GMT
- Title: Online Learning for the Random Feature Model in the Student-Teacher Framework
- Authors: Roman Worschech and Bernd Rosenow
- Abstract summary: We study over-parametrization in the context of a student-teacher framework.
For any finite ratio of hidden layer size to input dimension, the student cannot generalize perfectly.
Only when the student's hidden layer size is exponentially larger than the input dimension does the student approach perfect generalization.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks are widely used prediction algorithms whose performance
often improves as the number of weights increases, leading to
over-parametrization. We consider a two-layered neural network whose first
layer is frozen while the last layer is trainable, known as the random feature
model. We study over-parametrization in the context of a student-teacher
framework by deriving a set of differential equations for the learning
dynamics. For any finite ratio of hidden layer size to input dimension, the
student cannot generalize perfectly, and we compute the non-zero asymptotic
generalization error. Only when the student's hidden layer size is
exponentially larger than the input dimension does the student approach
perfect generalization.
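As a rough illustration of the setup described in the abstract, the following minimal Python sketch trains the readout layer of a random feature student online against a fixed teacher and estimates the generalization error on fresh data. The Gaussian inputs, tanh activation, single-unit teacher, squared loss, and the specific values of N, P, eta, and steps are illustrative assumptions, not the exact choices analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: input dimension N and student hidden-layer width P.
# The paper studies the regime where the ratio P / N stays finite.
N, P = 100, 400
eta = 0.1          # learning rate for online SGD (illustrative)
steps = 50_000     # each step uses one fresh example, never revisited

g = np.tanh        # activation function (assumed for this sketch)

# Teacher: a fixed single-layer network that generates the labels.
w_teacher = rng.standard_normal(N)

def teacher(x):
    return g(w_teacher @ x / np.sqrt(N))

# Student: random feature model -- frozen random first layer F, trainable readout a.
F = rng.standard_normal((P, N)) / np.sqrt(N)
a = np.zeros(P)

def student(x):
    return a @ g(F @ x) / np.sqrt(P)

# Online learning: stochastic gradient descent on the squared loss,
# updating only the second-layer (readout) weights.
for _ in range(steps):
    x = rng.standard_normal(N)          # fresh Gaussian input
    phi = g(F @ x) / np.sqrt(P)         # frozen random features
    err = a @ phi - teacher(x)
    a -= eta * err * phi                # SGD step on the readout only

# Monte-Carlo estimate of the generalization error on unseen inputs.
x_test = rng.standard_normal((5000, N))
mse = np.mean([(student(x) - teacher(x)) ** 2 for x in x_test])
print(f"estimated generalization error: {0.5 * mse:.4f}")
```

Because the first layer is frozen, the learning problem for the readout weights is linear in the features, and the residual error printed at the end remains non-zero for any finite P / N, in line with the asymptotic result stated above.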
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Asymptotics of Learning with Deep Structured (Random) Features [9.366617422860543]
For a large class of feature maps we provide a tight characterisation of the test error associated with learning the readout layer.
In some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.
arXiv Detail & Related papers (2024-02-21T18:35:27Z)
- Multi-Grade Deep Learning [3.0069322256338906]
Current deep learning models are single-grade neural networks.
We propose a multi-grade learning model that enables us to learn deep neural networks much more effectively and efficiently.
arXiv Detail & Related papers (2023-02-01T00:09:56Z)
- Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods [58.44819696433327]
We investigate the risk of two-layer ReLU neural networks in a teacher regression model.
We find that the student network provably outperforms any kernel method estimator.
arXiv Detail & Related papers (2022-05-30T02:51:36Z)
- Contrasting random and learned features in deep Bayesian linear regression [12.234742322758418]
We study how the ability to learn affects the generalization performance of a simple class of models.
By comparing deep random feature models to deep networks in which all layers are trained, we provide a detailed characterization of the interplay between width, depth, data density, and prior mismatch.
arXiv Detail & Related papers (2022-03-01T15:51:29Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by examining the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make two well-separated classes linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- How Powerful are Shallow Neural Networks with Bandlimited Random Weights? [25.102870584507244]
We investigate the expressive power of depth-2 bandlimited random neural networks.
A random net is a neural network whose hidden-layer parameters are frozen at random bandlimited values.
arXiv Detail & Related papers (2020-08-19T13:26:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.