Online Learning for the Random Feature Model in the Student-Teacher Framework
- URL: http://arxiv.org/abs/2303.14083v2
- Date: Thu, 6 Apr 2023 21:41:46 GMT
- Title: Online Learning for the Random Feature Model in the Student-Teacher Framework
- Authors: Roman Worschech and Bernd Rosenow
- Abstract summary: We study over-parametrization in the context of a student-teacher framework.
For any finite ratio of hidden layer size to input dimension, the student cannot generalize perfectly.
Only when the student's hidden layer size is exponentially larger than the input dimension does the student approach perfect generalization.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks are widely used prediction algorithms whose performance
often improves as the number of weights increases, leading to
over-parametrization. We consider a two-layered neural network whose first
layer is frozen while the last layer is trainable, known as the random feature
model. We study over-parametrization in the context of a student-teacher
framework by deriving a set of differential equations for the learning
dynamics. For any finite ratio of hidden layer size to input dimension, the
student cannot generalize perfectly, and we compute the non-zero asymptotic
generalization error. Only when the student's hidden layer size is
exponentially larger than the input dimension does the student approach
perfect generalization.
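As a rough illustration of the setup described in the abstract, the following minimal Python sketch trains the readout layer of a random feature student online against a fixed teacher and estimates the generalization error on fresh data. The Gaussian inputs, tanh activation, single-unit teacher, squared loss, and the specific values of N, P, eta, and steps are illustrative assumptions, not the exact choices analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: input dimension N and student hidden-layer width P.
# The paper studies the regime where the ratio P / N stays finite.
N, P = 100, 400
eta = 0.1          # learning rate for online SGD (illustrative)
steps = 50_000     # each step uses one fresh example, never revisited

g = np.tanh        # activation function (assumed for this sketch)

# Teacher: a fixed single-layer network that generates the labels.
w_teacher = rng.standard_normal(N)

def teacher(x):
    return g(w_teacher @ x / np.sqrt(N))

# Student: random feature model -- frozen random first layer F, trainable readout a.
F = rng.standard_normal((P, N)) / np.sqrt(N)
a = np.zeros(P)

def student(x):
    return a @ g(F @ x) / np.sqrt(P)

# Online learning: stochastic gradient descent on the squared loss,
# updating only the second-layer (readout) weights.
for _ in range(steps):
    x = rng.standard_normal(N)          # fresh Gaussian input
    phi = g(F @ x) / np.sqrt(P)         # frozen random features
    err = a @ phi - teacher(x)
    a -= eta * err * phi                # SGD step on the readout only

# Monte-Carlo estimate of the generalization error on unseen inputs.
x_test = rng.standard_normal((5000, N))
mse = np.mean([(student(x) - teacher(x)) ** 2 for x in x_test])
print(f"estimated generalization error: {0.5 * mse:.4f}")
```

Because the first layer is frozen, the learning problem for the readout weights is linear in the features, and the residual error printed at the end remains non-zero for any finite P / N, in line with the asymptotic result stated above.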
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Asymptotics of Learning with Deep Structured (Random) Features [9.366617422860543]
For a large class of feature maps we provide a tight characterisation of the test error associated with learning the readout layer.
In some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.
arXiv Detail & Related papers (2024-02-21T18:35:27Z)
- Multi-Grade Deep Learning [3.0069322256338906]
Current deep learning models are single-grade neural networks.
We propose a multi-grade learning model that enables us to learn deep neural networks much more effectively and efficiently.
arXiv Detail & Related papers (2023-02-01T00:09:56Z)
- Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods [58.44819696433327]
We investigate the risk of two-layer ReLU neural networks in a teacher regression model.
We find that the student network provably outperforms any kernel method estimator.
arXiv Detail & Related papers (2022-05-30T02:51:36Z)
- Contrasting random and learned features in deep Bayesian linear regression [12.234742322758418]
We study how the ability to learn affects the generalization performance of a simple class of models.
By comparing deep random feature models to deep networks in which all layers are trained, we provide a detailed characterization of the interplay between width, depth, data density, and prior mismatch.
arXiv Detail & Related papers (2022-03-01T15:51:29Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by examining the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make two well-separated classes linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- How Powerful are Shallow Neural Networks with Bandlimited Random Weights? [25.102870584507244]
We investigate the expressive power of depth-2 bandlimited random neural networks.
A random net is a neural network whose hidden-layer parameters are frozen at random bandlimited values.
arXiv Detail & Related papers (2020-08-19T13:26:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.