Fundamental operating regimes, hyper-parameter fine-tuning and glassiness: towards an interpretable replica-theory for trained restricted Boltzmann machines
- URL: http://arxiv.org/abs/2406.09924v1
- Date: Fri, 14 Jun 2024 11:12:00 GMT
- Title: Fundamental operating regimes, hyper-parameter fine-tuning and glassiness: towards an interpretable replica-theory for trained restricted Boltzmann machines
- Authors: Alberto Fachechi, Elena Agliari, Miriam Aquaro, Anthony Coolen, Menno Mulder
- Abstract summary: We consider restricted Boltzmann machines with a binary visible layer and a Gaussian hidden layer trained by an unlabelled dataset composed of noisy realizations of a single ground pattern.
We develop a statistical mechanics framework to describe the network generative capabilities, by exploiting the replica trick and assuming self-averaging of the underlying order parameters.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider restricted Boltzmann machines with a binary visible layer and a Gaussian hidden layer trained by an unlabelled dataset composed of noisy realizations of a single ground pattern. We develop a statistical mechanics framework to describe the network generative capabilities, by exploiting the replica trick and assuming self-averaging of the underlying order parameters (i.e., replica symmetry). In particular, we outline the effective control parameters (e.g., the relative number of weights to be trained, the regularization parameter), whose tuning can yield qualitatively-different operative regimes. Further, we provide analytical and numerical evidence for the existence of a sub-region in the space of the hyperparameters where replica-symmetry breaking occurs.
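To make the setting concrete, here is a minimal sketch of such a machine: a binary (±1) visible layer, a unit-variance Gaussian hidden layer, and one-step contrastive divergence on noisy copies of a single ground pattern. The sizes, the CD-1 update, and all names (N, P, eta, lam) are illustrative assumptions, not the authors' code; lam stands in for the regularization parameter the abstract mentions.

```python
# Hedged sketch, not the paper's implementation: RBM with binary (+/-1)
# visible units and Gaussian hidden units, trained by CD-1 on noisy
# realizations of a single ground pattern xi.
import numpy as np

rng = np.random.default_rng(0)
N, P = 100, 20                        # visible units, hidden units (alpha = P/N)
M, r = 500, 0.8                       # dataset size, example/pattern overlap
eta, lam, epochs = 0.01, 0.001, 200   # learning rate, L2 regularization, epochs

xi = rng.choice([-1.0, 1.0], size=N)          # single ground pattern
flips = rng.random((M, N)) < (1 - r) / 2      # i.i.d. noise on each example
data = np.where(flips, -xi, xi)               # noisy realizations of xi

W = rng.normal(scale=0.01, size=(N, P))       # weights to be trained

def sample_h(v):
    # Gaussian hidden layer: h | v ~ N(vW, 1)
    mean = v @ W
    return mean + rng.normal(size=mean.shape), mean

def sample_v(h):
    # Binary visible layer: P(v_i = +1 | h) = sigmoid(2 (Wh)_i)
    p = 1.0 / (1.0 + np.exp(-2.0 * h @ W.T))
    return np.where(rng.random(p.shape) < p, 1.0, -1.0)

for _ in range(epochs):
    _, mu0 = sample_h(data)                   # positive-phase statistics
    v1 = sample_v(mu0)                        # one-step reconstruction
    _, mu1 = sample_h(v1)                     # negative-phase statistics
    W += eta * ((data.T @ mu0 - v1.T @ mu1) / M - lam * W)  # CD-1 + weight decay

# generative check: overlap of reconstructions with the ground pattern
m = np.abs(sample_v(sample_h(data)[0]) @ xi).mean() / N
print(f"mean overlap with ground pattern: {m:.3f}")
```

Sweeping P/N and lam in a script like this is one cheap way to probe the qualitatively different operative regimes the paper maps out analytically.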
Related papers
- Nonparametric Control Koopman Operators [3.9393118740111084]
This paper presents a novel Koopman (composition) operator representation framework for control systems in reproducing kernel Hilbert spaces (RKHSs) that is free of explicit dictionary or input parametrizations.
By establishing fundamental equivalences between different model representations, we are able to close the gap of control system operator learning and infinite-dimensional regression.
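As a toy illustration of this regression view (my simplification, not the paper's dictionary-free estimator), the sketch below estimates the action of the Koopman operator on one observable by kernel ridge regression in an RBF RKHS; the kernel, the dynamical system, and the symbols gamma and lam are assumptions.

```python
# Sketch: (Kg)(x) = g(F(x)) estimated nonparametrically by kernel ridge
# regression from snapshot pairs (x_t, x_{t+1}).
import numpy as np

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(1)
R = np.array([[np.cos(0.3), -np.sin(0.3)], [np.sin(0.3), np.cos(0.3)]])
X = rng.uniform(-1, 1, size=(300, 2))   # states x_t
Y = 0.95 * X @ R.T                      # successor states x_{t+1}

g = lambda x: x[:, 0] ** 2              # an observable g(x)
lam = 1e-4
alpha = np.linalg.solve(rbf(X, X) + lam * np.eye(len(X)), g(Y))  # KRR weights

# learned Koopman action on g: (Kg)(x) ~ k(x, X) @ alpha
x_new = rng.uniform(-1, 1, size=(5, 2))
print(rbf(x_new, X) @ alpha)            # predicted g(x_{t+1}) given x_t
print(g(0.95 * x_new @ R.T))            # ground truth, for comparison
```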
arXiv Detail & Related papers (2024-05-12T15:46:52Z)
- Variational Inference of Parameters in Opinion Dynamics Models [9.51311391391997]
This work uses variational inference to estimate the parameters of an opinion dynamics ABM.
We transform the inference process into an optimization problem suitable for automatic differentiation.
Our approach estimates both macroscopic parameters (bounded confidence intervals and backfire thresholds) and microscopic parameters ($200$ categorical, agent-level roles) more accurately than simulation-based and MCMC methods.
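A minimal sketch of the underlying "inference as differentiable optimization" idea, on a toy Gaussian model rather than an opinion-dynamics ABM; the prior, likelihood, and all hyperparameters below are my assumptions.

```python
# Sketch: variational inference for one parameter theta via autograd and
# the reparameterization trick (not the paper's ABM pipeline).
import torch

torch.manual_seed(0)
theta_true = 1.5
data = theta_true + 0.5 * torch.randn(200)       # toy observations

mu = torch.zeros(1, requires_grad=True)           # variational mean
log_sigma = torch.zeros(1, requires_grad=True)    # variational log-std
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for step in range(500):
    opt.zero_grad()
    eps = torch.randn(64)
    theta = mu + log_sigma.exp() * eps            # reparameterized samples
    # average log-likelihood of data under N(theta, 0.5^2), up to constants
    ll = (-0.5 * ((data[None, :] - theta[:, None]) / 0.5) ** 2).sum(1).mean()
    # closed-form KL(q || N(0, 1)) for a Gaussian q
    kl = 0.5 * (mu**2 + (2 * log_sigma).exp() - 2 * log_sigma - 1).sum()
    (kl - ll).backward()                          # negative ELBO
    opt.step()

print(f"posterior mean ~ {mu.item():.2f}, std ~ {log_sigma.exp().item():.2f}")
```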
arXiv Detail & Related papers (2024-03-08T14:45:18Z)
- Engineering Hierarchical Symmetries [0.0]
We present a program for the generation of sequences of symmetries on controllable timescales.
We provide explicit examples including symmetry and temporal topological phenomena, as well as a spin chain realizing the symmetry ladder.
Our results have direct applications in experiments with quantum simulators.
arXiv Detail & Related papers (2024-02-21T04:09:23Z)
- Unsupervised and Supervised learning by Dense Associative Memory under replica symmetry breaking [0.24999074238880487]
Hebbian attractor networks with multi-node interactions have been shown to outperform classical pairwise counterparts in a number of tasks.
We derive the one-step broken-replica-symmetry picture of supervised and unsupervised learning protocols for these Associative Memories.
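For orientation, a bare-bones Dense Associative Memory with p-body interactions, energy E(s) = -Σ_μ (ξ_μ · s)^p, and asynchronous sign updates; this illustrates the model class only, not the paper's replica analysis, and all sizes are assumptions.

```python
# Sketch of a p-body Hopfield (Dense Associative Memory) retrieval step.
import numpy as np

rng = np.random.default_rng(2)
N, K, p = 200, 10, 3
xi = rng.choice([-1, 1], size=(K, N))            # stored patterns

def update(s):
    # asynchronous updates from the energy difference of flipping s_i
    for i in rng.permutation(N):
        rest = xi @ s - xi[:, i] * s[i]          # overlaps excluding site i
        field = ((rest + xi[:, i]) ** p - (rest - xi[:, i]) ** p).sum()
        s[i] = 1 if field > 0 else -1
    return s

s = xi[0] * np.where(rng.random(N) < 0.9, 1, -1)  # noisy cue of pattern 0
s = update(update(s.copy()))                       # two retrieval sweeps
print("overlap with pattern 0:", (s @ xi[0]) / N)
```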
arXiv Detail & Related papers (2023-12-15T09:27:46Z)
- Learning minimal representations of stochastic processes with variational autoencoders [52.99137594502433]
We introduce an unsupervised machine learning approach to determine the minimal set of parameters required to describe a process.
Our approach enables the autonomous discovery of unknown parameters describing such processes.
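A hedged sketch of the mechanism: a VAE whose bottleneck width d_lat is the candidate "minimal set of parameters"; the architecture, sizes, and the random stand-in data are my assumptions.

```python
# Sketch: tiny VAE; shrinking d_lat until reconstruction degrades probes
# how many latent parameters the process actually needs.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, d_in=32, d_lat=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * d_lat))
        self.dec = nn.Sequential(nn.Linear(d_lat, 64), nn.ReLU(),
                                 nn.Linear(64, d_in))

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + (0.5 * log_var).exp() * torch.randn_like(mu)  # reparameterize
        x_hat = self.dec(z)
        kl = 0.5 * (mu**2 + log_var.exp() - log_var - 1).sum(-1).mean()
        mse = ((x - x_hat) ** 2).sum(-1).mean()                # Gaussian recon.
        return mse + kl                                        # ~ negative ELBO

model = TinyVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(128, 32)          # stand-in for trajectory features
loss = model(x)
opt.zero_grad(); loss.backward(); opt.step()
```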
arXiv Detail & Related papers (2023-07-21T14:25:06Z)
- Identifying overparameterization in Quantum Circuit Born Machines [1.7259898169307613]
We study the onset of overparameterization transitions for quantum circuit Born machines, generative models that are trained using non-adversarial gradient methods.
Our results indicate that fully understanding the trainability of these models remains an open question.
arXiv Detail & Related papers (2023-07-06T21:05:22Z)
- Least Squares Regression Can Exhibit Under-Parameterized Double Descent [6.645111950779666]
We study the relationship between the number of training data points, the number of parameters, and the generalization capabilities of models.
We postulate that the location of the peak depends on properties of both the spectrum and the eigenvectors of the sample covariance.
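The basic phenomenon is easy to probe numerically; the snippet below (my toy experiment, not the paper's) sweeps the number of used features p through the interpolation threshold p = n_train for minimum-norm least squares, where a double-descent peak can appear.

```python
# Sketch: test error of minimum-norm least squares vs. model size.
import numpy as np

rng = np.random.default_rng(3)
n_train, n_test, d = 50, 500, 200
beta = rng.normal(size=d) / np.sqrt(d)           # ground-truth coefficients
X = rng.normal(size=(n_train, d))
Xt = rng.normal(size=(n_test, d))
y = X @ beta + 0.1 * rng.normal(size=n_train)
yt = Xt @ beta

for p in (10, 25, 45, 50, 55, 100, 200):         # number of used features
    w = np.linalg.pinv(X[:, :p]) @ y             # min-norm least squares
    err = ((Xt[:, :p] @ w - yt) ** 2).mean()
    print(f"p={p:3d}  test MSE={err:.3f}")       # variance peaks near p=n_train
```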
arXiv Detail & Related papers (2023-05-24T03:52:48Z)
- On the Forward Invariance of Neural ODEs [92.07281135902922]
We propose a new method to ensure neural ordinary differential equations (ODEs) satisfy output specifications.
Our approach uses a class of control barrier functions to transform output specifications into constraints on the parameters and inputs of the learning system.
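A one-dimensional toy of the barrier-function idea (my illustration, not the paper's invariance layer): filter a nominal input u so that h(x) = 1 - x² ≥ 0 stays forward-invariant under the dynamics ẋ = u, by enforcing ḣ ≥ -α h.

```python
# Sketch: a minimal control-barrier-function safety filter.
alpha, dt, x = 5.0, 0.01, 0.0
for t in range(300):
    u_nom = 2.0                      # nominal controller pushes outward
    h = 1 - x**2                     # barrier: safe set is h >= 0
    dh_du = -2 * x                   # dh/dt = (dh/dx) * u
    if dh_du * u_nom < -alpha * h:   # nominal input violates dh/dt >= -alpha*h
        u = -alpha * h / dh_du if dh_du != 0 else 0.0  # project to boundary
    else:
        u = u_nom
    x += dt * u
print("final x:", x, " h(x):", 1 - x**2)   # h stays nonnegative
```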
arXiv Detail & Related papers (2022-10-10T15:18:28Z)
- Analyzing Transformers in Embedding Space [59.434807802802105]
We present a theoretical analysis where all parameters of a trained Transformer are interpreted by projecting them into the embedding space.
We show that parameters of both pretrained and fine-tuned models can be interpreted in embedding space.
Our findings open the door to interpretation methods that, at least in part, abstract away from model specifics and operate in the embedding space only.
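A simplified rendition of the core operation (the paper's exact projection may differ; E, W_vo, and all sizes here are toy assumptions): push a parameter matrix through the token embedding E to read it as vocabulary-to-vocabulary scores.

```python
# Sketch: interpreting a weight matrix in embedding space.
import numpy as np

rng = np.random.default_rng(4)
V, d = 1000, 64                     # vocab size, hidden width
E = rng.normal(size=(V, d))         # token embedding matrix (toy, random)
W_vo = rng.normal(size=(d, d))      # e.g., an attention value-output matrix

# vocab-to-vocab scores: which input tokens promote which output tokens
scores = E @ W_vo @ E.T             # (V, V), readable row- and column-wise
top = np.argsort(scores[0])[-5:]    # tokens most promoted by token 0
print(top)
```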
arXiv Detail & Related papers (2022-09-06T14:36:57Z)
- Arbitrary Marginal Neural Ratio Estimation for Simulation-based Inference [7.888755225607877]
We present a novel method that enables amortized inference over arbitrary subsets of the parameters, without resorting to numerical integration.
We demonstrate the applicability of the method on parameter inference of binary black hole systems from gravitational-wave observations.
arXiv Detail & Related papers (2021-10-01T14:35:46Z)
- Spectral Tensor Train Parameterization of Deep Learning Layers [136.4761580842396]
We study low-rank parameterizations of weight matrices with embedded spectral properties in the Deep Learning context.
We show the effects of neural network compression in the classification setting, and the effects of both compression and improved training stability in the generative adversarial setting.
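A minimal sketch of "low-rank with embedded spectral properties", deliberately without the paper's tensor-train structure: orthonormal factors plus an explicit, clampable spectrum, so the spectral norm is controlled by construction. All names and sizes are assumptions.

```python
# Sketch: low-rank weight matrix with an explicit singular-value vector.
import numpy as np

rng = np.random.default_rng(5)
m, n, r = 256, 128, 16
Q_u, _ = np.linalg.qr(rng.normal(size=(m, r)))   # orthonormal left factor
Q_v, _ = np.linalg.qr(rng.normal(size=(n, r)))   # orthonormal right factor
s = rng.uniform(0.1, 1.0, size=r)                # explicit singular values

W = Q_u @ np.diag(np.clip(s, 0, 1)) @ Q_v.T      # spectral norm <= 1 by design
print(np.linalg.svd(W, compute_uv=False)[:3])    # check: top values = clip(s)
```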
arXiv Detail & Related papers (2021-03-07T00:15:44Z)
- Sampling asymmetric open quantum systems for artificial neural networks [77.34726150561087]
We present a hybrid sampling strategy which takes asymmetric properties explicitly into account, achieving fast convergence times and high scalability for asymmetric open systems.
We highlight the universal applicability of artificial neural networks to open quantum systems.
arXiv Detail & Related papers (2020-12-20T18:25:29Z)
- On the Sparsity of Neural Machine Translation Models [65.49762428553345]
We investigate whether redundant parameters can be reused to achieve better performance.
Experiments and analyses are systematically conducted on different datasets and NMT architectures.
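A generic magnitude-pruning baseline (not the paper's exact rejuvenation procedure) showing how "redundant" low-magnitude weights are identified and freed for reuse; the matrix size and pruning ratio are assumptions.

```python
# Sketch: prune the smallest-magnitude 30% of weights, then fine-tune so
# the freed parameters can be retrained for better performance.
import torch

W = torch.randn(512, 512)                         # stand-in weight matrix
k = int(0.3 * W.numel())                          # prune 30% of weights
thresh = W.abs().flatten().kthvalue(k).values     # k-th smallest magnitude
mask = (W.abs() > thresh).float()
W_pruned = W * mask                               # redundant weights zeroed
print(f"sparsity: {(W_pruned == 0).float().mean().item():.2%}")
```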
arXiv Detail & Related papers (2020-10-06T11:47:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.