On the role of non-linear latent features in bipartite generative neural networks
- URL: http://arxiv.org/abs/2506.10552v1
- Date: Thu, 12 Jun 2025 10:20:20 GMT
- Title: On the role of non-linear latent features in bipartite generative neural networks
- Authors: Tony Bonnaire, Giovanni Catania, Aurélien Decelle, Beatriz Seoane,
- Abstract summary: We investigate Restricted Boltzmann Machines (RBMs) as a function of the prior distribution imposed on their hidden units. Our analysis reveals that standard RBMs with binary hidden nodes and extensive connectivity suffer from reduced critical capacity. To address this, we examine several modifications, such as introducing local biases and adopting richer hidden unit priors.
- Score: 4.499833362998488
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We investigate the phase diagram and memory retrieval capabilities of bipartite energy-based neural networks, namely Restricted Boltzmann Machines (RBMs), as a function of the prior distribution imposed on their hidden units - including binary, multi-state, and ReLU-like activations. Drawing connections to the Hopfield model and employing analytical tools from statistical physics of disordered systems, we explore how the architectural choices and activation functions shape the thermodynamic properties of these models. Our analysis reveals that standard RBMs with binary hidden nodes and extensive connectivity suffer from reduced critical capacity, limiting their effectiveness as associative memories. To address this, we examine several modifications, such as introducing local biases and adopting richer hidden unit priors. These adjustments restore ordered retrieval phases and markedly improve recall performance, even at finite temperatures. Our theoretical findings, supported by finite-size Monte Carlo simulations, highlight the importance of hidden unit design in enhancing the expressive power of RBMs.
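The abstract contrasts binary, multi-state, and ReLU-like hidden priors in a bipartite energy-based model. Below is a minimal numpy sketch, not the authors' code, of block Gibbs sampling in an RBM with ±1 visible spins and either binary or rectified-Gaussian ("ReLU-like") hidden units; the layer sizes, couplings, and the simplified rectified-Gaussian conditional are illustrative assumptions, and assessing retrieval would additionally require comparing the visible configuration to a stored pattern.
```python
import numpy as np

rng = np.random.default_rng(0)

N_v, N_h = 100, 20                                         # visible / hidden layer sizes (arbitrary)
W = rng.normal(0.0, 1.0 / np.sqrt(N_v), size=(N_v, N_h))   # bipartite couplings
a, b = np.zeros(N_v), np.zeros(N_h)                        # local biases on visible / hidden units

def sample_hidden(v, prior="binary"):
    """Sample the hidden layer given visible spins v in {-1, +1}."""
    field = v @ W + b
    if prior == "binary":                                  # h_j in {-1, +1}
        p = 1.0 / (1.0 + np.exp(-2.0 * field))
        return np.where(rng.random(N_h) < p, 1.0, -1.0)
    if prior == "relu":                                    # simplified rectified-Gaussian hidden units
        return np.maximum(0.0, field + rng.normal(size=N_h))
    raise ValueError(prior)

def sample_visible(h):
    """Sample the +-1 visible layer given a hidden configuration h."""
    field = W @ h + a
    p = 1.0 / (1.0 + np.exp(-2.0 * field))
    return np.where(rng.random(N_v) < p, 1.0, -1.0)

# Block Gibbs sampling from a random start; retrieval quality would be
# quantified by the overlap of v with a stored pattern (not shown here).
v = np.where(rng.random(N_v) < 0.5, 1.0, -1.0)
for _ in range(200):
    h = sample_hidden(v, prior="relu")
    v = sample_visible(h)
```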
Related papers
- Models of Heavy-Tailed Mechanistic Universality [62.107333654304014]
We propose a family of random matrix models to explore attributes that give rise to heavy-tailed behavior in trained neural networks. Under this model, spectral densities with power-law tails arise through a combination of three independent factors. Implications of our model for other appearances of heavy tails, including neural scaling laws, trajectories, and the five-plus-one phases of neural network training, are discussed.
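As a generic illustration only (the paper's three-factor random matrix model is not reproduced here), the sketch below contrasts the Gram-matrix spectrum of a Gaussian random matrix with that of a heavy-tailed (Pareto) one; the matrix size and tail exponent are arbitrary choices.
```python
import numpy as np

rng = np.random.default_rng(1)
n = 500                                                    # matrix size (arbitrary)

# Gram-matrix spectra: Gaussian entries give a Marchenko-Pastur-like bulk,
# Pareto entries produce a much heavier tail of large eigenvalues.
W_gauss = rng.normal(size=(n, n))
W_heavy = rng.pareto(2.5, size=(n, n)) * rng.choice([-1.0, 1.0], size=(n, n))

for name, W in [("gaussian", W_gauss), ("pareto", W_heavy)]:
    eig = np.linalg.eigvalsh(W.T @ W / n)
    print(name, "three largest eigenvalues:", np.sort(eig)[-3:])
```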
arXiv Detail & Related papers (2025-06-04T00:55:01Z) - Explosive neural networks via higher-order interactions in curved statistical manifolds [43.496401697112695]
We introduce curved neural networks as a class of prototypical models with a limited number of parameters. We show that these curved neural networks implement a self-regulating process that can accelerate memory retrieval. We analytically explore their memory-retrieval capacity using the replica trick near ferromagnetic and spin-glass phase boundaries.
arXiv Detail & Related papers (2024-08-05T09:10:29Z) - Neuromorphic Circuit Simulation with Memristors: Design and Evaluation Using MemTorch for MNIST and CIFAR [0.4077787659104315]
This study evaluates the feasibility of using memristors for in-memory processing by constructing and training three digital convolutional neural networks.
These networks were converted into memristive systems using MemTorch.
The simulations, conducted under ideal conditions, revealed minimal precision losses of nearly 1% during inference.
arXiv Detail & Related papers (2024-07-18T11:30:33Z) - Solution space and storage capacity of fully connected two-layer neural networks with generic activation functions [0.552480439325792]
The storage capacity of a binary classification model is the maximum number of random input-output pairs per parameter that the model can learn. We analyze the structure of the solution space and the storage capacity of fully connected two-layer neural networks with general activation functions.
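Here, storage capacity means the largest load α = P / (number of parameters) of random input-output pairs the network can still fit. The sketch below is a crude finite-size illustration of that definition, not the paper's replica analysis; the architecture, hinge loss, learning rate, and number of trials are assumptions.
```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 20, 5                                  # input dimension, hidden units (arbitrary)

def fits_random_data(P, steps=2000, lr=0.05):
    """Try to fit P random +-1 labels with a small two-layer tanh network."""
    X = rng.normal(size=(P, N))
    y = rng.choice([-1.0, 1.0], size=P)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, K))
    w2 = rng.normal(0.0, 1.0 / np.sqrt(K), size=K)
    for _ in range(steps):
        h = np.tanh(X @ W1)                   # hidden activations
        out = h @ w2
        grad_out = -y * (y * out < 1)         # hinge-loss gradient w.r.t. the output
        w2 -= lr * (h.T @ grad_out) / P
        dh = np.outer(grad_out, w2) * (1.0 - h ** 2)
        W1 -= lr * (X.T @ dh) / P
    return np.all(y * (np.tanh(X @ W1) @ w2) > 0)

n_params = N * K + K
for alpha in (0.5, 1.0, 2.0, 4.0):            # load = pairs per parameter
    P = int(alpha * n_params)
    hits = sum(fits_random_data(P) for _ in range(5))
    print(f"alpha={alpha:.1f}  datasets fit perfectly: {hits}/5")
```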
arXiv Detail & Related papers (2024-04-20T15:12:47Z) - Spatial Attention-based Distribution Integration Network for Human Pose
Estimation [0.8052382324386398]
We present the Spatial Attention-based Distribution Integration Network (SADI-NET) to improve the accuracy of localization.
Our network consists of three efficient modules: the receptive fortified module (RFM), spatial fusion module (SFM), and distribution learning module (DLM).
Our model achieves a remarkable 92.10% accuracy on the MPII test dataset, demonstrating significant improvements over existing models and establishing state-of-the-art performance.
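The module names above (RFM, SFM, DLM) are specific to the paper; the block below is only a hypothetical, generic spatial-attention gate in PyTorch meant to illustrate the broad idea of reweighting feature maps spatially, and does not reflect the SADI-NET architecture.
```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Generic spatial-attention gate: reweights feature maps with a learned
    per-pixel mask (hypothetical illustration; not the SADI-NET modules)."""
    def __init__(self, channels: int):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.mask(x)               # (B, C, H, W) -> same shape

feats = torch.randn(2, 64, 32, 32)
print(SpatialAttention(64)(feats).shape)      # torch.Size([2, 64, 32, 32])
```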
arXiv Detail & Related papers (2023-11-09T12:43:01Z) - Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust
Closed-Loop Control [63.310780486820796]
We show how a parameterization of recurrent connectivity influences robustness in closed-loop settings.
We find that closed-form continuous-time neural networks (CfCs) with fewer parameters can outperform their full-rank, fully-connected counterparts.
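A minimal sketch, under the assumption that the recurrent weights are parameterized as a low-rank term plus a sparse term; it illustrates the parameter savings of such a parameterization and is not the closed-form continuous-time (CfC) cell from the paper.
```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 128, 4                                 # hidden size, rank of the recurrent coupling (arbitrary)

# Recurrent weights parameterized as low-rank plus sparse.
U, V = rng.normal(size=(n, r)), rng.normal(size=(n, r))
S = rng.normal(size=(n, n)) * (rng.random((n, n)) < 0.05)   # ~5% sparse part
W_rec = U @ V.T / n + S

def step(h, x, W_in, dt=0.1):
    """One explicit-Euler step of a simple continuous-time RNN using W_rec."""
    return h + dt * (-h + np.tanh(W_rec @ h + W_in @ x))

W_in = rng.normal(size=(n, 10))
h = np.zeros(n)
for _ in range(50):
    h = step(h, rng.normal(size=10), W_in)
print("free parameters in W_rec (low-rank + sparse):", 2 * n * r + int((S != 0).sum()))
```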
arXiv Detail & Related papers (2023-10-05T21:44:18Z) - Gibbs-Duhem-Informed Neural Networks for Binary Activity Coefficient
Prediction [45.84205238554709]
We propose Gibbs-Duhem-informed neural networks for the prediction of binary activity coefficients at varying compositions.
We include the Gibbs-Duhem equation explicitly in the loss function for training neural networks.
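For a binary mixture at constant temperature and pressure, the Gibbs-Duhem relation requires $x_1\, d\ln\gamma_1/dx_1 + x_2\, d\ln\gamma_2/dx_1 = 0$. Below is a minimal PyTorch sketch of enforcing this residual through automatic differentiation; the network inputs, sizes, and the absence of molecular descriptors are simplifying assumptions rather than the paper's architecture.
```python
import torch
import torch.nn as nn

class GammaNet(nn.Module):
    """Predicts (ln gamma_1, ln gamma_2) from the mole fraction x1 of component 1.
    A real model would also take molecular descriptors; this sketch keeps only x1."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 2))

    def forward(self, x1):
        return self.net(x1)

def gibbs_duhem_penalty(model, x1):
    """Squared residual of x1*d(ln g1)/dx1 + x2*d(ln g2)/dx1, which vanishes
    for thermodynamically consistent activity coefficients (constant T, P)."""
    x1 = x1.clone().requires_grad_(True)
    ln_g = model(x1)
    d1 = torch.autograd.grad(ln_g[:, 0].sum(), x1, create_graph=True)[0]
    d2 = torch.autograd.grad(ln_g[:, 1].sum(), x1, create_graph=True)[0]
    residual = x1 * d1 + (1.0 - x1) * d2
    return (residual ** 2).mean()

model = GammaNet()
x1 = torch.rand(64, 1)
loss = gibbs_duhem_penalty(model, x1)          # added to the data-fitting loss during training
loss.backward()
print(float(loss))
```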
arXiv Detail & Related papers (2023-05-31T07:36:45Z) - Cross-Frequency Coupling Increases Memory Capacity in Oscillatory Neural
Networks [69.42260428921436]
Cross-frequency coupling (CFC) is associated with information integration across populations of neurons.
We construct a model of CFC which predicts a computational role for observed $\theta$-$\gamma$ oscillatory circuits in the hippocampus and cortex.
We show that the presence of CFC increases the memory capacity of a population of neurons connected by plastic synapses.
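Cross-frequency coupling of the phase-amplitude kind can be illustrated with a synthetic signal in which the envelope of a fast gamma rhythm follows the phase of a slow theta rhythm; the sketch below constructs such a signal and is only a toy illustration of CFC, not the paper's plastic-synapse capacity model.
```python
import numpy as np

# Synthetic theta-gamma phase-amplitude coupling: the envelope of a fast gamma
# rhythm is modulated by the phase of a slow theta rhythm (toy illustration).
t = np.linspace(0.0, 2.0, 4000)                        # 2 s sampled at 2 kHz
f_theta, f_gamma = 6.0, 60.0                           # Hz, typical hippocampal bands
theta_phase = 2.0 * np.pi * f_theta * t
gamma_envelope = 0.5 * (1.0 + np.cos(theta_phase))     # gamma power peaks at the theta crest
signal = np.cos(theta_phase) + gamma_envelope * np.cos(2.0 * np.pi * f_gamma * t)

# Crude coupling index: correlation between the theta waveform and the gamma envelope.
print(np.corrcoef(np.cos(theta_phase), gamma_envelope)[0, 1])   # 1.0 by construction
```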
arXiv Detail & Related papers (2022-04-05T17:13:36Z) - On Energy-Based Models with Overparametrized Shallow Neural Networks [44.74000986284978]
Energy-based models (EBMs) are a powerful framework for generative modeling.
In this work we focus on shallow neural networks.
We show that models trained in the so-called "active" regime provide a statistical advantage over their associated "lazy" or kernel regime.
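The "active" (feature-learning or mean-field) and "lazy" (kernel) regimes of a shallow network are commonly distinguished by the width scaling of the output layer: a $1/m$ prefactor versus a $1/\sqrt{m}$ prefactor in front of the sum over $m$ hidden units. The sketch below shows only this scaling convention, assuming a ReLU shallow network, and not the energy-based training studied in the paper.
```python
import numpy as np

rng = np.random.default_rng(4)
m, d = 512, 10                                # hidden width, input dimension (arbitrary)
a = rng.normal(size=m)
W = rng.normal(size=(m, d))

def shallow_net(x, scaling="active"):
    """Shallow ReLU network f(x) = c(m) * sum_i a_i * relu(w_i . x).
    Mean-field ("active") scaling uses c = 1/m, while the "lazy"/kernel
    regime corresponds to c = 1/sqrt(m); only the prefactor differs here."""
    pre = np.maximum(0.0, W @ x)
    c = 1.0 / m if scaling == "active" else 1.0 / np.sqrt(m)
    return c * (a @ pre)

x = rng.normal(size=d)
print("active:", shallow_net(x, "active"), " lazy:", shallow_net(x, "lazy"))
```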
arXiv Detail & Related papers (2021-04-15T15:34:58Z) - Lipschitz Recurrent Neural Networks [100.72827570987992]
We show that our Lipschitz recurrent unit is more robust with respect to input and parameter perturbations as compared to other continuous-time RNNs.
Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks.
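A rough sketch of a continuous-time recurrent unit of the form $\dot h = A h + \tanh(W h + U x + b)$, with the hidden-to-hidden matrices built from weighted symmetric and skew-symmetric parts plus a negative spectral shift; this is written in the spirit of the Lipschitz recurrent unit, but the coefficients, initialization, and integration scheme are assumptions rather than the paper's exact construction.
```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 64, 8                                  # hidden size, input size (arbitrary)

def constrained_matrix(M, beta=0.8, gamma=0.01):
    """Blend the symmetric and skew-symmetric parts of M and shift the spectrum
    to the left; this kind of constraint keeps the hidden dynamics well behaved
    (written in the spirit of the Lipschitz recurrent unit, not its exact code)."""
    sym = 0.5 * (M + M.T)
    skew = 0.5 * (M - M.T)
    return (1.0 - beta) * sym + beta * skew - gamma * np.eye(len(M))

M_A = rng.normal(size=(n, n)) / np.sqrt(n)
M_W = rng.normal(size=(n, n)) / np.sqrt(n)
A, W = constrained_matrix(M_A), constrained_matrix(M_W)
U, b = rng.normal(size=(n, d)) / np.sqrt(d), np.zeros(n)

# Explicit-Euler rollout of dh/dt = A h + tanh(W h + U x + b).
h = np.zeros(n)
for _ in range(100):
    x = rng.normal(size=d)
    h = h + 0.1 * (A @ h + np.tanh(W @ h + U @ x + b))
print("final hidden-state norm:", np.linalg.norm(h))
```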
arXiv Detail & Related papers (2020-06-22T08:44:52Z) - Hyperbolic Neural Networks++ [66.16106727715061]
We generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely, the Poincaré ball model.
Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, as well as their stability and improved performance over Euclidean counterparts.
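Layers on the Poincaré ball replace Euclidean vector addition with Möbius addition, $x \oplus_c y = \frac{(1 + 2c\langle x,y\rangle + c\|y\|^2)\,x + (1 - c\|x\|^2)\,y}{1 + 2c\langle x,y\rangle + c^2\|x\|^2\|y\|^2}$. The numpy sketch below implements only this basic operation; the paper's full hyperbolic layers additionally rely on exponential and logarithmic maps not shown here.
```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Mobius addition on the Poincare ball of curvature -c: the operation that
    replaces Euclidean vector addition in hyperbolic network layers."""
    xy = np.dot(x, y)
    x2, y2 = np.dot(x, x), np.dot(y, y)
    num = (1.0 + 2.0 * c * xy + c * y2) * x + (1.0 - c * x2) * y
    den = 1.0 + 2.0 * c * xy + c ** 2 * x2 * y2
    return num / den

x = np.array([0.10, -0.20, 0.05])
y = np.array([0.30, 0.10, -0.10])
z = mobius_add(x, y)
print(z, np.linalg.norm(z) < 1.0)             # the result stays inside the unit ball for c=1
```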
arXiv Detail & Related papers (2020-06-15T08:23:20Z) - Energy-Based Processes for Exchangeable Data [109.04978766553612]
We introduce Energy-Based Processes (EBPs) to extend energy-based models to exchangeable data.
A key advantage of EBPs is the ability to express more flexible distributions over sets without restricting their cardinality.
We develop an efficient training procedure for EBPs that demonstrates state-of-the-art performance on a variety of tasks.
arXiv Detail & Related papers (2020-03-17T04:26:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.