The Gaussian equivalence of generative models for learning with shallow neural networks
- URL: http://arxiv.org/abs/2006.14709v3
- Date: Fri, 21 May 2021 13:21:00 GMT
- Title: The Gaussian equivalence of generative models for learning with shallow neural networks
- Authors: Sebastian Goldt, Bruno Loureiro, Galen Reeves, Florent Krzakala, Marc Mézard, Lenka Zdeborová
- Abstract summary: We study the performance of neural networks trained on data drawn from pre-trained generative models.
We provide three strands of rigorous, analytical and numerical evidence corroborating this equivalence.
These results open a viable path to the theoretical study of machine learning models with realistic data.
- Score: 30.47878306277163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the impact of data structure on the computational tractability
of learning is a key challenge for the theory of neural networks. Many
theoretical works do not explicitly model training data, or assume that inputs
are drawn component-wise independently from some simple probability
distribution. Here, we go beyond this simple paradigm by studying the
performance of neural networks trained on data drawn from pre-trained
generative models. This is possible due to a Gaussian equivalence stating that
the key metrics of interest, such as the training and test errors, can be fully
captured by an appropriately chosen Gaussian model. We provide three strands of
rigorous, analytical and numerical evidence corroborating this equivalence.
First, we establish rigorous conditions for the Gaussian equivalence to hold in
the case of single-layer generative models, as well as deterministic rates for
convergence in distribution. Second, we leverage this equivalence to derive a
closed set of equations describing the generalisation performance of two widely
studied machine learning problems: two-layer neural networks trained using
one-pass stochastic gradient descent, and full-batch pre-learned features or
kernel methods. Finally, we perform experiments demonstrating how our theory
applies to deep, pre-trained generative models. These results open a viable
path to the theoretical study of machine learning models with realistic data.
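As a rough numerical illustration of the Gaussian equivalence described in the abstract, the sketch below compares ridge regression trained on data from a single-layer generative model with ridge regression trained on a moment-matched Gaussian surrogate of that data. All choices here (a tanh generator, a linear teacher acting on the latent variables, the dimensions, and the ridge penalty) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the paper's code): test error of ridge regression on
# inputs x = tanh(A c / sqrt(k)) from a one-layer generative model versus its
# Gaussian equivalent with matched first and second moments. Labels depend on
# the latent c through a linear teacher. All sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
k, d, n_train, n_test = 100, 300, 2000, 2000    # latent dim, input dim, samples
A = rng.standard_normal((d, k))                 # generator weights (assumption)
beta = rng.standard_normal(k) / np.sqrt(k)      # linear teacher on the latent

# Coefficients of f = tanh under a standard Gaussian, via Monte Carlo:
# mu = E[f(z)], a = E[z f(z)], b^2 = Var[f(z)] - a^2.
z = rng.standard_normal(1_000_000)
mu, a = np.tanh(z).mean(), (z * np.tanh(z)).mean()
b = np.sqrt(np.tanh(z).var() - a ** 2)

def sample(n, gaussian_equivalent):
    c = rng.standard_normal((n, k))             # latent variables
    pre = c @ A.T / np.sqrt(k)                  # pre-activations, approx. N(0, 1)
    if gaussian_equivalent:
        # Gaussian surrogate: same mean, same linear overlap with c, same variance.
        x = mu + a * pre + b * rng.standard_normal((n, d))
    else:
        x = np.tanh(pre)                        # true generative model
    y = c @ beta                                # teacher acts on the latent
    return x, y

def ridge_test_mse(gaussian_equivalent, lam=1e-2):
    Xtr, ytr = sample(n_train, gaussian_equivalent)
    Xte, yte = sample(n_test, gaussian_equivalent)
    w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(d), Xtr.T @ ytr)
    return np.mean((Xte @ w - yte) ** 2)

print("generative model  :", ridge_test_mse(False))
print("Gaussian surrogate:", ridge_test_mse(True))
```

If the equivalence holds in this toy setting, the two printed test errors should approach each other as k, d and the number of samples grow proportionally.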
Related papers
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z) - A Metalearned Neural Circuit for Nonparametric Bayesian Inference [4.767884267554628]
Most applications of machine learning to classification assume a closed set of balanced classes.
This is at odds with the real world, where class occurrence statistics often follow a long-tailed power-law distribution.
We present a method for extracting the inductive bias from a nonparametric Bayesian model and transferring it to an artificial neural network.
arXiv Detail & Related papers (2023-11-24T16:43:17Z) - Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation.
We use a score-based diffusion model to generate labeled data.
Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in a supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z) - Fundamental limits of overparametrized shallow neural networks for supervised learning [11.136777922498355]
We study a two-layer neural network trained from input-output pairs generated by a teacher network with matching architecture.
Our results come in the form of bounds relating i) the mutual information between training data and network weights, or ii) the Bayes-optimal generalization error.
arXiv Detail & Related papers (2023-07-11T08:30:50Z) - An unfolding method based on conditional Invertible Neural Networks (cINN) using iterative training [0.0]
Generative networks like invertible neural networks (INNs) enable a probabilistic unfolding.
We introduce the iterative conditional INN (IcINN) for unfolding, which adjusts for deviations between simulated training samples and data.
arXiv Detail & Related papers (2022-12-16T19:00:05Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data [48.4779912667317]
Self-training algorithms have been very successful for learning with unlabeled data using neural networks.
This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning.
arXiv Detail & Related papers (2020-10-07T19:43:55Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice (a toy illustration of this point is sketched after this list).
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Learning Queuing Networks by Recurrent Neural Networks [0.0]
We propose a machine-learning approach to derive performance models from data.
We exploit a deterministic approximation of their average dynamics in terms of a compact system of ordinary differential equations.
This allows for an interpretable structure of the neural network, which can be trained from system measurements to yield a white-box parameterized model.
arXiv Detail & Related papers (2020-02-25T10:56:47Z)
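The sketch below is the toy experiment referenced in the "Good Classifiers are Abundant in the Interpolating Regime" entry above. It is not that paper's methodology: it simply samples interpolating linear classifiers in an overparameterised Gaussian-data setting (the sampling scale for the null-space component is an arbitrary choice) and records the spread of their test errors, to make the contrast between a typical and a worst-case interpolator concrete.

```python
# Minimal sketch (assumptions throughout): in an overparameterised linear model
# the interpolators of a training set form an affine subspace. We sample random
# interpolators from that subspace and look at the distribution of their test errors.
import numpy as np

rng = np.random.default_rng(1)
n, d, n_test, n_models = 50, 200, 1000, 2000    # n < d: interpolating regime
w_star = rng.standard_normal(d) / np.sqrt(d)    # ground-truth linear rule

Xtr = rng.standard_normal((n, d))
Xte = rng.standard_normal((n_test, d))
ytr, yte = np.sign(Xtr @ w_star), np.sign(Xte @ w_star)

# Every interpolator of (Xtr, ytr) is w_min plus a vector in the null space of Xtr.
w_min = np.linalg.pinv(Xtr) @ ytr               # minimum-norm interpolator
_, _, Vt = np.linalg.svd(Xtr, full_matrices=True)
null_basis = Vt[n:].T                           # (d, d - n) basis of null(Xtr)
scale = 0.5 * np.linalg.norm(w_min) / np.sqrt(d - n)   # arbitrary sampling scale

errors = []
for _ in range(n_models):
    w = w_min + scale * (null_basis @ rng.standard_normal(d - n))
    assert np.allclose(np.sign(Xtr @ w), ytr)   # still fits the training set exactly
    errors.append(np.mean(np.sign(Xte @ w) != yte))
errors = np.array(errors)

print(f"typical test error : {errors.mean():.3f} +/- {errors.std():.3f}")
print(f"worst sampled error: {errors.max():.3f}")
```

The printed numbers only illustrate the qualitative picture: test errors of the sampled interpolators cluster around a typical value, well below the worst error encountered in the sample.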
This list is automatically generated from the titles and abstracts of the papers in this site.