Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model
- URL: http://arxiv.org/abs/2102.08127v1
- Date: Tue, 16 Feb 2021 12:49:15 GMT
- Title: Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model
- Authors: Bruno Loureiro, Cédric Gerbelot, Hugo Cui, Sebastian Goldt, Florent Krzakala, Marc Mézard, Lenka Zdeborová
- Abstract summary: Teacher-student models provide a powerful framework in which the typical case performance of high-dimensional supervised learning tasks can be studied in closed form.
In this setting, labels are assigned to data - often taken to be Gaussian i.i.d. - by a teacher model, and the goal is to characterise the typical performance of the student model in recovering the parameters that generated the labels.
- Score: 24.679669970832396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Teacher-student models provide a powerful framework in which the typical case
performance of high-dimensional supervised learning tasks can be studied in
closed form. In this setting, labels are assigned to data - often taken to be
Gaussian i.i.d. - by a teacher model, and the goal is to characterise the
typical performance of the student model in recovering the parameters that
generated the labels. In this manuscript we discuss a generalisation of this
setting where the teacher and student can act on different spaces, generated
with fixed, but generic feature maps. This is achieved via the rigorous study
of a high-dimensional Gaussian covariate model. Our contribution is two-fold:
First, we prove a rigorous formula for the asymptotic training loss and
generalisation error achieved by empirical risk minimization for this model.
Second, we present a number of situations where the learning curve of the model
captures that of a \emph{realistic data set} learned with kernel regression
and classification, with out-of-the-box feature maps such as random projections
or scattering transforms, or with pre-learned ones - such as the features
learned by training multi-layer neural networks. We discuss both the power and
the limitations of the Gaussian teacher-student framework as a typical case
analysis capturing learning curves as encountered in practice on real data
sets.
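As a rough numerical companion to the setting above, the following sketch (a toy illustration under assumed dimensions and feature maps, not the authors' code) draws Gaussian i.i.d. inputs, lets a teacher assign noisy labels through one fixed random-projection feature map, and fits a student by ridge regression (empirical risk minimisation with a squared loss) on a different fixed feature map, printing an empirical learning curve. The helpers `teacher_labels` and `ridge_student` and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

d, p_t, p_s = 200, 150, 300   # input dim, teacher / student feature dims (illustrative)
F_t = rng.standard_normal((p_t, d)) / np.sqrt(d)   # fixed teacher feature map (random projection)
F_s = rng.standard_normal((p_s, d)) / np.sqrt(d)   # fixed student feature map (random projection)
theta = rng.standard_normal(p_t)                    # teacher weights

def teacher_labels(X, noise=0.1):
    """Labels assigned by the teacher acting on its own feature space."""
    return X @ F_t.T @ theta / np.sqrt(p_t) + noise * rng.standard_normal(X.shape[0])

def ridge_student(X, y, lam=1e-2):
    """Student: squared-loss empirical risk minimisation with an l2 penalty on its features."""
    Z = X @ F_s.T / np.sqrt(p_s)
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p_s), Z.T @ y)

X_test = rng.standard_normal((5000, d))
y_test = teacher_labels(X_test, noise=0.0)

for n in [50, 100, 200, 400, 800, 1600]:
    X = rng.standard_normal((n, d))              # Gaussian i.i.d. covariates
    w = ridge_student(X, teacher_labels(X))
    y_hat = X_test @ F_s.T @ w / np.sqrt(p_s)
    gen_err = np.mean((y_hat - y_test) ** 2)     # empirical generalisation error
    print(f"n = {n:5d}   generalisation error = {gen_err:.4f}")
```

Averaged over several draws, such curves are the empirical counterpart of the closed-form asymptotics the paper proves, and the random projections could in principle be swapped for scattering transforms or features extracted from a trained network.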
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- A unified framework for learning with nonlinear model classes from arbitrary linear samples [0.7366405857677226]
This work considers the fundamental problem of learning an unknown object from training data using a given model class.
We introduce a unified framework that allows for objects in arbitrary Hilbert spaces, general types of (random) linear measurements as training data and general types of nonlinear model classes.
We present examples such as matrix sketching by random sampling, compressed sensing with isotropic vectors, active learning in regression and compressed sensing with generative models.
arXiv Detail & Related papers (2023-11-25T00:43:22Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data [14.379261299138147]
This paper analyzes the convergence and iterations of a one-hidden-layer neural network when the input features follow the Gaussian mixture model.
For the first time, this paper characterizes the impact of the input distributions on the sample complexity and the learning rate.
arXiv Detail & Related papers (2022-07-07T23:27:44Z)
- Smoothed Embeddings for Certified Few-Shot Learning [63.68667303948808]
We extend randomized smoothing to few-shot learning models that map inputs to normalized embeddings.
Our results are confirmed by experiments on different datasets.
arXiv Detail & Related papers (2022-02-02T18:19:04Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim to improve data efficiency for both classification and regression setups in deep learning.
To get the best of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- The Gaussian equivalence of generative models for learning with shallow neural networks [30.47878306277163]
We study the performance of neural networks trained on data drawn from pre-trained generative models.
We provide three strands of rigorous, analytical and numerical evidence corroborating this equivalence.
These results open a viable path to the theoretical study of machine learning models with realistic data.
arXiv Detail & Related papers (2020-06-25T21:20:09Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Generalisation error in learning with random features and the hidden manifold model [23.71637173968353]
We study generalised linear regression and classification for a synthetically generated dataset.
We consider the high-dimensional regime and use the replica method from statistical physics.
We show how to obtain the so-called double descent behaviour for logistic regression, with a peak at the interpolation threshold.
We discuss the role played by correlations in the data generated by the hidden manifold model.
arXiv Detail & Related papers (2020-02-21T14:49:41Z)
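Since the entry above is a close precursor of the main paper, a small hedged sketch of the double-descent behaviour it mentions may help. The paper analyses logistic regression on random features with the replica method; the toy example below instead uses ridgeless (minimum-norm) least squares on random ReLU features, where the same test-error peak appears as the number of features crosses the number of samples, i.e. at the interpolation threshold. All sizes and the helper `relu_features` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

d, n, n_test = 50, 200, 2000
w_star = rng.standard_normal(d) / np.sqrt(d)         # ground-truth linear target

X_train = rng.standard_normal((n, d))
X_test = rng.standard_normal((n_test, d))
y_train = X_train @ w_star + 0.1 * rng.standard_normal(n)
y_test = X_test @ w_star

def relu_features(X, W):
    """Random-features map: fixed random first layer followed by a ReLU."""
    return np.maximum(X @ W, 0.0)

for p in [20, 50, 100, 150, 190, 200, 210, 250, 400, 800]:
    W = rng.standard_normal((d, p)) / np.sqrt(d)      # fixed random projection
    Z_train, Z_test = relu_features(X_train, W), relu_features(X_test, W)
    # Minimum-norm least-squares fit (ridgeless); the test error peaks near p = n.
    a = np.linalg.pinv(Z_train) @ y_train
    test_err = np.mean((Z_test @ a - y_test) ** 2)
    print(f"p/n = {p / n:4.2f}   test error = {test_err:.3f}")
```

Adding even a small ridge penalty damps the peak, which is why the phenomenon is most visible in the ridgeless limit.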