Related papers: Emergence of Structure in Ensembles of Random Neural Networks

Emergence of Structure in Ensembles of Random Neural Networks

URL: http://arxiv.org/abs/2505.10331v1
Date: Thu, 15 May 2025 14:20:02 GMT
Title: Emergence of Structure in Ensembles of Random Neural Networks
Authors: Luca Muscarnera, Luigi Loreti, Giovanni Todeschini, Alessio Fumagalli, Francesco Regazzoni,
Abstract summary: We introduce a theoretical model for studying the emergence of collective behaviors in ensembles of random classifiers.<n>Experiments on the MNIST dataset underline the relevance of this phenomenon in high-quality, noiseless, datasets.
Score: 3.3385430106181184
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Randomness is ubiquitous in many applications across data science and machine learning. Remarkably, systems composed of random components often display emergent global behaviors that appear deterministic, manifesting a transition from microscopic disorder to macroscopic organization. In this work, we introduce a theoretical model for studying the emergence of collective behaviors in ensembles of random classifiers. We argue that, if the ensemble is weighted through the Gibbs measure defined by adopting the classification loss as an energy, then there exists a finite temperature parameter for the distribution such that the classification is optimal, with respect to the loss (or the energy). Interestingly, for the case in which samples are generated by a Gaussian distribution and labels are constructed by employing a teacher perceptron, we analytically prove and numerically confirm that such optimal temperature does not depend neither on the teacher classifier (which is, by construction of the learning problem, unknown), nor on the number of random classifiers, highlighting the universal nature of the observed behavior. Experiments on the MNIST dataset underline the relevance of this phenomenon in high-quality, noiseless, datasets. Finally, a physical analogy allows us to shed light on the self-organizing nature of the studied phenomenon.

Related papers

Free Independence and Unitary Design from Random Matrix Product Unitaries [0.0]
We study the emergence of freeness from a random matrix product unitary ensemble.<n>We prove that, with only bond dimension, these unitaries reproduce Haar values of higher-order OTOCs for local, finite-trace observables.<n>Our results highlight the need to refine previous notions of unitary designs in the context of operator dynamics.
arXiv Detail & Related papers (2025-07-31T18:00:00Z)
Rademacher learning rates for iterated random functions [0.0]
We consider the case where the training dataset is generated by an iterated random function that is not necessarily irreducible or aperiodic.<n>Under the assumption that the governing function is contractive with respect to its first argument, we first establish a uniform convergence result for the corresponding sample error.<n>We then demonstrate the learnability of the approximate empirical risk minimization algorithm and derive its learning rate bound.
arXiv Detail & Related papers (2025-06-16T19:36:13Z)
Identifiable Latent Neural Causal Models [82.14087963690561]
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data. We determine the types of distribution shifts that do contribute to the identifiability of causal representations. We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations.
arXiv Detail & Related papers (2024-03-23T04:13:55Z)
Learning Linear Causal Representations from Interventions under General Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets. This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z)
DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
Characterization and Greedy Learning of Gaussian Structural Causal Models under Unknown Interventions [4.993565079216378]
We use a greedy algorithm called GnIES to recover the equivalence class of the data-generating model without knowledge of the intervention targets.<n>In addition, we develop a novel procedure to generate semi-synthetic data sets with known causal ground truth but distributions closely resembling those of a real data set of choice.
arXiv Detail & Related papers (2022-11-27T17:37:21Z)
Identifiability and Asymptotics in Learning Homogeneous Linear ODE Systems from Discrete Observations [114.17826109037048]
Ordinary Differential Equations (ODEs) have recently gained a lot of attention in machine learning. theoretical aspects, e.g., identifiability and properties of statistical estimation are still obscure. This paper derives a sufficient condition for the identifiability of homogeneous linear ODE systems from a sequence of equally-spaced error-free observations sampled from a single trajectory.
arXiv Detail & Related papers (2022-10-12T06:46:38Z)
Gaussian Universality of Linear Classifiers with Random Labels in High-Dimension [24.503842578208268]
We prove that data coming from a range of generative models in high-dimensions have the same minimum training loss as Gaussian data with corresponding data covariance. In particular, our theorem covers data created by an arbitrary mixture of homogeneous Gaussian clouds, as well as multi-modal generative neural networks.
arXiv Detail & Related papers (2022-05-26T12:25:24Z)
Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features. We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z)
Uniform Convergence, Adversarial Spheres and a Simple Remedy [40.44709296304123]
Previous work has cast doubt on the general framework of uniform convergence and its ability to explain generalization in neural networks. We provide an extensive theoretical investigation of the previously studied data setting through the lens of infinitely-wide models. We prove that the Neural Tangent Kernel (NTK) also suffers from the same phenomenon and we uncover its origin.
arXiv Detail & Related papers (2021-05-07T20:23:01Z)
Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method. A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations. We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z)
Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
Neural Networks and Polynomial Regression. Demystifying the Overparametrization Phenomena [17.205106391379026]
In the context of neural network models, overparametrization refers to the phenomena whereby these models appear to generalize well on the unseen data. A conventional explanation of this phenomena is based on self-regularization properties of algorithms used to train the data. We show that any student network interpolating the data generated by a teacher network generalizes well, provided that the sample size is at least an explicit quantity controlled by data dimension.
arXiv Detail & Related papers (2020-03-23T20:09:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.