Related papers: Approximate Gaussianity Beyond Initialisation in Neural Networks

Approximate Gaussianity Beyond Initialisation in Neural Networks

URL: http://arxiv.org/abs/2510.05218v1
Date: Mon, 06 Oct 2025 18:00:46 GMT
Title: Approximate Gaussianity Beyond Initialisation in Neural Networks
Authors: Edward Hirst, Sanjaye Ramgoolam,
Abstract summary: We study Ensembles of neural network weight matrices through the training process for the MNIST classification problem.<n>Experiments include the effects of varied initialisation regimes, regularisation, layer depth, and layer width.
Score: 0.0954904463032233
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Ensembles of neural network weight matrices are studied through the training process for the MNIST classification problem, testing the efficacy of matrix models for representing their distributions, under assumptions of Gaussianity and permutation-symmetry. The general 13-parameter permutation invariant Gaussian matrix models are found to be effective models for the correlated Gaussianity in the weight matrices, beyond the range of applicability of the simple Gaussian with independent identically distributed matrix variables, and notably well beyond the initialisation step. The representation theoretic model parameters, and the graph-theoretic characterisation of the permutation invariant matrix observables give an interpretable framework for the best-fit model and for small departures from Gaussianity. Additionally, the Wasserstein distance is calculated for this class of models and used to quantify the movement of the distributions over training. Throughout the work, the effects of varied initialisation regimes, regularisation, layer depth, and layer width are tested for this formalism, identifying limits where particular departures from Gaussianity are enhanced and how more general, yet still highly-interpretable, models can be developed.

Related papers

Fundamental Limits of Matrix Sensing: Exact Asymptotics, Universality, and Applications [34.781211988554425]
We provide rigorous equations characterizing the Bayes-optimal learning performance from a number of samples.<n>We mathematically establish predictions obtained via non-rigorous methods from statistical physics.
arXiv Detail & Related papers (2025-03-18T10:36:30Z)
Scaling and renormalization in high-dimensional regression [72.59731158970894]
We present a unifying perspective on recent results on ridge regression.<n>We use the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning.<n>Our results extend and provide a unifying perspective on earlier models of scaling laws.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions. Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation. In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z)
Manifold Gaussian Variational Bayes on the Precision Matrix [70.44024861252554]
We propose an optimization algorithm for Variational Inference (VI) in complex models. We develop an efficient algorithm for Gaussian Variational Inference whose updates satisfy the positive definite constraint on the variational covariance matrix. Due to its black-box nature, MGVBP stands as a ready-to-use solution for VI in complex models.
arXiv Detail & Related papers (2022-10-26T10:12:31Z)
Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints. The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution. We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
Graph Polynomial Convolution Models for Node Classification of Non-Homophilous Graphs [52.52570805621925]
We investigate efficient learning from higher-order graph convolution and learning directly from adjacency matrix for node classification. We show that the resulting model lead to new graphs and residual scaling parameter. We demonstrate that the proposed methods obtain improved accuracy for node-classification of non-homophilous parameters.
arXiv Detail & Related papers (2022-09-12T04:46:55Z)
Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model. We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
Estimation of sparse Gaussian graphical models with hidden clustering structure [8.258451067861932]
We propose a model to estimate the sparse Gaussian graphical models with hidden clustering structure. We develop a symmetric Gauss-Seidel based alternating direction method of the multipliers. Numerical experiments on both synthetic data and real data demonstrate the good performance of our model.
arXiv Detail & Related papers (2020-04-17T08:43:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.