Gaussian Universality in Neural Network Dynamics with Generalized Structured Input Distributions
- URL: http://arxiv.org/abs/2405.00642v2
- Date: Thu, 31 Oct 2024 04:23:15 GMT
- Title: Gaussian Universality in Neural Network Dynamics with Generalized Structured Input Distributions
- Authors: Jaeyong Bae, Hawoong Jeong
- Abstract summary: We analyze the behavior of a deep learning system trained on inputs modeled as Gaussian mixtures to better simulate more general structured inputs.
Under certain standardization schemes, the deep learning model converges toward Gaussian setting behavior, even when the input data follow more complex or real-world distributions.
- Score: 2.3020018305241337
- Abstract: Bridging the gap between the practical performance of deep learning and its theoretical foundations often involves analyzing neural networks through stochastic gradient descent (SGD). Expanding on previous research that focused on modeling structured inputs under a simple Gaussian setting, we analyze the behavior of a deep learning system trained on inputs modeled as Gaussian mixtures to better simulate more general structured inputs. Through empirical analysis and theoretical investigation, we demonstrate that under certain standardization schemes, the deep learning model converges toward Gaussian setting behavior, even when the input data follow more complex or real-world distributions. This finding exhibits a form of universality in which diverse structured distributions yield results consistent with Gaussian assumptions, which can support the theoretical understanding of deep learning models.
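The claimed universality can be probed numerically. Below is a minimal, self-contained sketch (not the authors' code; the architecture, mixture, and hyperparameters are all illustrative): a two-layer ReLU student is trained by online SGD on a fixed teacher task, once with i.i.d. Gaussian inputs and once with inputs from a two-component Gaussian mixture standardized to zero mean and unit variance per coordinate. Under the paper's claim, the two printed late-time losses should be close.
```python
import numpy as np

rng = np.random.default_rng(0)
d, h, lr, steps = 100, 32, 0.05, 5000

# Fixed "teacher" direction defining the regression target.
w_teacher = rng.normal(size=d) / np.sqrt(d)

def sample_gaussian(n):
    return rng.normal(size=(n, d))

# Analytic mean/variance of the mixture below, used for standardization.
MIX_MEAN = 0.5    # 0.5*2.0 + 0.5*(-1.0)
MIX_VAR = 2.875   # 0.5*(1.0+4.0) + 0.5*(0.25+1.0) - MIX_MEAN**2

def sample_mixture(n):
    # Two-component Gaussian mixture, standardized per coordinate.
    comp = rng.integers(0, 2, size=(n, 1))
    mu = np.where(comp == 0, 2.0, -1.0)
    sig = np.where(comp == 0, 1.0, 0.5)
    x = mu + sig * rng.normal(size=(n, d))
    return (x - MIX_MEAN) / np.sqrt(MIX_VAR)

def train(sampler):
    # Two-layer ReLU student trained by online SGD on squared loss.
    W = rng.normal(size=(h, d)) / np.sqrt(d)
    a = rng.normal(size=h) / np.sqrt(h)
    losses = []
    for _ in range(steps):
        x = sampler(1)[0]
        y = np.tanh(x @ w_teacher)        # smooth scalar target
        z = np.maximum(W @ x, 0.0)        # hidden ReLU activations
        err = a @ z - y
        losses.append(0.5 * err ** 2)
        grad_a = err * z
        grad_W = err * np.outer(a * (z > 0), x)
        a -= lr * grad_a
        W -= lr * grad_W
    return np.array(losses)

loss_gauss = train(sample_gaussian)
loss_mix = train(sample_mixture)
print("late-time loss, Gaussian inputs:", loss_gauss[-1000:].mean())
print("late-time loss, mixture inputs: ", loss_mix[-1000:].mean())
```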
Related papers
- Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions.
We demonstrate that this approach lower-bounds the diversity within the ensemble, reducing overfitting and improving generalization (see the sketch below).
arXiv Detail & Related papers (2024-10-06T15:25:39Z)
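As one concrete reading of "randomly dropping base model predictions", here is a minimal sketch; the combiner is a simple learnable softmax weighting rather than the paper's neural ensembler, and the data and hyperparameters are synthetic and illustrative.
```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p_drop, lr = 512, 5, 0.3, 0.1

# Fake base-model predictions: m models, each a noisy view of the target.
y = rng.normal(size=n)
preds = y[:, None] + 0.5 * rng.normal(size=(n, m))

w = np.zeros(m)  # logits of the combination weights (softmax-normalized)

for epoch in range(200):
    # Randomly drop base predictions (the regularizer, paraphrased):
    # dropped models are excluded from the softmax for this pass.
    keep = rng.random(m) > p_drop
    if not keep.any():
        continue
    logits = np.where(keep, w, -np.inf)
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()
    err = preds @ alpha - y
    # Gradient of mean squared error through the softmax weights.
    grad_alpha = preds.T @ err / n
    jac = np.diag(alpha) - np.outer(alpha, alpha)   # softmax Jacobian
    w -= lr * (jac @ grad_alpha)

print("learned combination weights:", np.exp(w) / np.exp(w).sum())
```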
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
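For orientation, this is the classical PAC-Bayes bound (McAllester's bound in Maurer's form) that such analyses refine; the paper's own bound, specialized to the interpolating regime, is not reproduced here.
```latex
% With probability at least 1 - \delta over an i.i.d. sample of size n,
% simultaneously for all posteriors \rho (\pi a data-independent prior):
\[
  \mathbb{E}_{h \sim \rho}\bigl[L(h)\bigr]
  \;\le\;
  \mathbb{E}_{h \sim \rho}\bigl[\hat{L}_n(h)\bigr]
  + \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\bigl(2\sqrt{n}/\delta\bigr)}{2n}}
\]
```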
- Sparsity-aware generalization theory for deep neural networks [12.525959293825318]
We present a new approach to analyzing generalization for deep feed-forward ReLU networks.
We show fundamental trade-offs between sparsity and generalization.
arXiv Detail & Related papers (2023-07-01T20:59:05Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery are superior to those of regular ReLU networks.
We derive practical design rules that allow one to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
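The abstract does not name a specific architecture; a standard example of a model-based network for sparse recovery is LISTA, i.e. the ISTA iteration unrolled into a fixed number of layers. The sketch below keeps the layer weights at their ISTA initialization; in a trained model-based network, W1, W2, and the thresholds would be learned per layer.
```python
import numpy as np

rng = np.random.default_rng(2)
m_meas, n_dim, k_sparse, T = 50, 100, 5, 100

A = rng.normal(size=(m_meas, n_dim)) / np.sqrt(m_meas)  # measurement matrix
x_true = np.zeros(n_dim)
x_true[rng.choice(n_dim, k_sparse, replace=False)] = rng.normal(size=k_sparse)
y = A @ x_true

def soft(v, thresh):
    # Soft-thresholding: the proximal operator of the l1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

# LISTA layer weights, here fixed to their ISTA initialization.
L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
W1 = A.T / L
W2 = np.eye(n_dim) - (A.T @ A) / L
theta = 0.1 / L

x = np.zeros(n_dim)
for _ in range(T):                      # T unrolled "layers"
    x = soft(W1 @ y + W2 @ x, theta)

print("relative recovery error:",
      np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```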
- An Information-Theoretic Framework for Supervised Learning [22.280001450122175]
We propose a novel information-theoretic framework with its own notions of regret and sample complexity.
We study the sample complexity of learning from data generated by deep neural networks with ReLU activation units.
We conclude by corroborating our theoretical results with experimental analysis of random single-hidden-layer neural networks.
arXiv Detail & Related papers (2022-03-01T05:58:28Z)
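The experimental setup mentioned above, data generated by a random single-hidden-layer ReLU network, is easy to reproduce; the sketch below generates such data and fits a trivial baseline learner. The sample-complexity analysis itself is the paper's contribution and is not reproduced here.
```python
import numpy as np

rng = np.random.default_rng(3)
d, width, n = 20, 100, 1000

# Random single-hidden-layer ReLU "teacher" that generates the data.
W = rng.normal(size=(width, d)) / np.sqrt(d)
a = rng.normal(size=width) / np.sqrt(width)

X = rng.normal(size=(n, d))
Y = np.maximum(X @ W.T, 0.0) @ a        # labels from the random teacher

# Any learner can now be evaluated against this distribution, e.g. ridge:
lam = 1e-2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
print("train MSE of a linear learner:", np.mean((X @ w_ridge - Y) ** 2))
```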
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Out-of-Distribution Generalization in Kernel Regression [21.958028127426196]
We study generalization in kernel regression when the training and test distributions are different.
We identify an overlap matrix that quantifies the mismatch between distributions for a given kernel.
We develop procedures for optimizing training and test distributions under a given data budget to identify best- and worst-case generalization under the shift.
arXiv Detail & Related papers (2021-06-04T04:54:25Z)
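The overlap matrix is defined precisely in the paper; the sketch below only illustrates the kind of object involved, via a Nyström-style Monte Carlo construction: estimate the kernel's eigenfunctions under the training distribution, then take their second moments under the test distribution. The kernel, shift, and normalizations here are illustrative assumptions, not the paper's definitions.
```python
import numpy as np

rng = np.random.default_rng(4)
d, n_tr, n_te, top = 5, 400, 400, 10

def rbf(X, Z, gamma=0.5):
    # Gaussian (RBF) kernel matrix between rows of X and Z.
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2 * X @ Z.T
    return np.exp(-gamma * sq)

X_tr = rng.normal(size=(n_tr, d))                 # training distribution p
X_te = 0.5 + 1.5 * rng.normal(size=(n_te, d))     # shifted/scaled test q

# Nystrom estimate of the kernel's top eigenfunctions under p.
K = rbf(X_tr, X_tr)
evals, evecs = np.linalg.eigh(K / n_tr)
idx = np.argsort(evals)[::-1][:top]
evals, evecs = evals[idx], evecs[:, idx]

def eigfuncs(X):
    # Evaluate the estimated eigenfunctions at new points X.
    return rbf(X, X_tr) @ evecs / (np.sqrt(n_tr) * evals)

# Overlap-style matrix: second moments of the p-eigenfunctions under q.
Phi_te = eigfuncs(X_te)
O = Phi_te.T @ Phi_te / n_te
print("diagonal of the overlap-style matrix:", np.round(np.diag(O), 3))
# With no shift (q = p) this matrix approaches the identity in the
# large-sample limit; deviations quantify the train/test mismatch.
```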
- The Gaussian equivalence of generative models for learning with shallow neural networks [30.47878306277163]
We study the performance of neural networks trained on data drawn from pre-trained generative models.
We provide three strands of rigorous, analytical and numerical evidence corroborating this equivalence.
These results open a viable path to the theoretical study of machine learning models with realistic data.
arXiv Detail & Related papers (2020-06-25T21:20:09Z)
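The Gaussian equivalence can be checked empirically by replacing generator outputs with Gaussian vectors matched in mean and covariance and comparing a simple learner's test error. The sketch below uses a fixed random nonlinear map as a stand-in for a pre-trained generator; everything here is illustrative, not the paper's protocol.
```python
import numpy as np

rng = np.random.default_rng(5)
k, d, n = 10, 100, 2000

# Stand-in "generator": a fixed random nonlinear map of Gaussian latents.
G1 = rng.normal(size=(d, k)) / np.sqrt(k)
def generator(n):
    z = rng.normal(size=(n, k))
    return np.tanh(z @ G1.T)

# A teacher on the data space defines binary labels.
w_t = rng.normal(size=d) / np.sqrt(d)
def labels(X):
    return np.sign(X @ w_t)

def ridge_test_error(X, Y, Xte, Yte, lam=1e-1):
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    return np.mean(np.sign(Xte @ w) != Yte)

# Data from the generator ...
Xg, Xg_te = generator(n), generator(n)
# ... and its Gaussian equivalent: matched mean and covariance.
mu, cov = Xg.mean(0), np.cov(Xg, rowvar=False)
Xn = rng.multivariate_normal(mu, cov, size=n)
Xn_te = rng.multivariate_normal(mu, cov, size=n)

print("generator-data error:",
      ridge_test_error(Xg, labels(Xg), Xg_te, labels(Xg_te)))
print("Gaussian-equivalent :",
      ridge_test_error(Xn, labels(Xn), Xn_te, labels(Xn_te)))
```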
- Extrapolatable Relational Reasoning With Comparators in Low-Dimensional Manifolds [7.769102711230249]
We propose a neuroscience-inspired module with a built-in inductive bias that can be readily combined with current neural network architectures.
We show that neural nets with this inductive bias achieve considerably better out-of-distribution (o.o.d.) generalisation performance for a range of relational reasoning tasks.
arXiv Detail & Related papers (2020-06-15T19:09:13Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this kernel-to-rich transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
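A standard minimal example of this transition is the diagonal linear network of Woodworth et al., used here purely as an illustration rather than the paper's construction: gradient descent on f(x) = (u*u - v*v)·x moves from an l2-like (kernel) implicit bias at large initialization scale to an l1-like, sparse (rich) bias at small scale. Hyperparameters are illustrative.
```python
import numpy as np

rng = np.random.default_rng(6)
d, n, steps = 40, 20, 20000

# Underdetermined sparse regression: n < d, true signal 3-sparse.
X = rng.normal(size=(n, d)) / np.sqrt(n)
w_star = np.zeros(d)
w_star[:3] = 1.0
y = X @ w_star

def train_diag_net(alpha):
    # Diagonal linear net f(x) = (u*u - v*v) @ x, initialized at scale alpha.
    u = np.full(d, alpha)
    v = np.full(d, alpha)
    lr = 0.02 / (1.0 + 4.0 * alpha**2)   # keep the dynamics stable
    for _ in range(steps):
        g = X.T @ (X @ (u * u - v * v) - y)
        u, v = u - lr * 2 * u * g, v + lr * 2 * v * g
    return u * u - v * v

for alpha in (0.01, 2.0):
    w = train_diag_net(alpha)
    dense_frac = np.mean(np.abs(w) > 0.05)
    # Small alpha -> close to 3/40 (sparse, "rich");
    # large alpha -> weights spread out ("kernel").
    print(f"alpha={alpha}: fraction of non-negligible weights = {dense_frac:.2f}")
```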
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
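The compressibility-generalization link can be illustrated with a crude matrix-rank proxy for a single layer; the paper's actual properties are tensor-based and data-dependent, which this sketch does not reproduce.
```python
import numpy as np

rng = np.random.default_rng(7)

def effective_rank_fraction(W, energy=0.95):
    # Fraction of singular values needed to retain `energy` of the
    # squared spectrum: a crude, layer-wise compressibility proxy.
    s = np.linalg.svd(W, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return (np.searchsorted(cum, energy) + 1) / len(s)

# A near-low-rank layer compresses well; an unstructured one does not.
low_rank = (rng.normal(size=(256, 8)) @ rng.normal(size=(8, 256))
            + 0.05 * rng.normal(size=(256, 256)))
dense = rng.normal(size=(256, 256))
print("near-low-rank layer:", effective_rank_fraction(low_rank))
print("unstructured layer: ", effective_rank_fraction(dense))
```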
This list is automatically generated from the titles and abstracts of the papers on this site.