Entropy, Free Energy, and Work of Restricted Boltzmann Machines
- URL: http://arxiv.org/abs/2004.04900v1
- Date: Fri, 10 Apr 2020 04:16:33 GMT
- Title: Entropy, Free Energy, and Work of Restricted Boltzmann Machines
- Authors: Sangchul Oh, Abdelkader Baggag, Hyunchul Nha
- Abstract summary: We analyze the training process of the restricted Boltzmann machine in the context of statistical physics.
We demonstrate the growth of the correlation between the visible and hidden layers via the subadditivity of entropies as the training proceeds.
We discuss the Jarzynski equality, which connects the path average of the exponentiated work to the difference in free energies before and after training.
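For reference, a minimal sketch of the two relations mentioned above in their standard textbook forms (the notation, with beta the inverse temperature, is an assumption and is not quoted from the paper):
```latex
% Subadditivity of entropies: the gap is the mutual information between the
% visible and hidden layers, which grows as training correlates the two layers.
S(v) + S(h) \;\ge\; S(v, h)

% Jarzynski equality: the path average of the exponentiated work done by
% switching the parameters equals the exponentiated free-energy difference.
\langle e^{-\beta W} \rangle = e^{-\beta \Delta F},
\qquad \Delta F = F_{\mathrm{after}} - F_{\mathrm{before}}
```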
- Score: 0.08594140167290096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A restricted Boltzmann machine is a generative probabilistic graphical network.
The probability of finding the network in a given configuration is given by the
Boltzmann distribution. Given training data, learning is done by optimizing the
parameters of the network's energy function. In this paper, we analyze the
training process of the restricted Boltzmann machine in the context of
statistical physics. As an illustration, for small Bars-and-Stripes
patterns, we calculate thermodynamic quantities such as entropy, free energy,
and internal energy as a function of training epoch. We demonstrate the growth
of the correlation between the visible and hidden layers via the subadditivity
of entropies as the training proceeds. Using Monte Carlo simulations of
trajectories of the visible and hidden vectors in configuration space, we also
calculate the distribution of the work done on the restricted Boltzmann machine
by switching the parameters of the energy function. We discuss the Jarzynski
equality, which connects the path average of the exponentiated work to the
difference in free energies before and after training.
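To make the thermodynamic quantities concrete, here is a minimal Python sketch (not the authors' code) that evaluates them exactly for a toy binary RBM by brute-force enumeration; the layer sizes, random parameters, and beta = 1 are illustrative assumptions.
```python
# A minimal sketch: exact thermodynamic quantities of a tiny binary RBM by
# brute-force enumeration (beta = 1, random parameters, toy layer sizes).
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_v, n_h = 4, 3                      # visible / hidden units (toy sizes)
W = 0.5 * rng.standard_normal((n_v, n_h))
a = 0.1 * rng.standard_normal(n_v)   # visible biases
b = 0.1 * rng.standard_normal(n_h)   # hidden biases

def energy(v, h):
    # RBM energy function E(v, h) = -a.v - b.h - v^T W h
    return -(a @ v + b @ h + v @ W @ h)

# Enumerate all configurations (feasible only for small n_v + n_h).
states = [(np.array(v), np.array(h))
          for v in itertools.product([0, 1], repeat=n_v)
          for h in itertools.product([0, 1], repeat=n_h)]
E = np.array([energy(v, h) for v, h in states])

# Boltzmann distribution p(v, h) = exp(-E) / Z  (beta = 1)
Z = np.exp(-E).sum()
p = np.exp(-E) / Z

F = -np.log(Z)                       # free energy  F = -ln Z
U = (p * E).sum()                    # internal energy U = <E>
S = -(p * np.log(p)).sum()           # joint entropy S(v, h)
assert np.isclose(S, U - F)          # thermodynamic identity S = U - F at beta = 1

# Marginal entropies and the subadditivity gap (visible-hidden mutual information)
p_vh = p.reshape(2**n_v, 2**n_h)     # rows indexed by v, columns by h
p_v, p_h = p_vh.sum(axis=1), p_vh.sum(axis=0)
S_v = -(p_v * np.log(p_v)).sum()
S_h = -(p_h * np.log(p_h)).sum()
I_vh = S_v + S_h - S                 # >= 0 by subadditivity of entropies
print(f"F={F:.3f}  U={U:.3f}  S={S:.3f}  S_v+S_h-S={I_vh:.3f}")
```
In the paper these quantities are tracked as a function of training epoch; the sketch only shows how they are defined and computed at a fixed parameter setting.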
Related papers
- DimOL: Dimensional Awareness as A New 'Dimension' in Operator Learning [63.5925701087252]
We introduce DimOL (Dimension-aware Operator Learning), drawing insights from dimensional analysis.
To implement DimOL, we propose the ProdLayer, which can be seamlessly integrated into FNO-based and Transformer-based PDE solvers.
Empirically, DimOL models achieve up to a 48% performance gain on the PDE datasets.
arXiv Detail & Related papers (2024-10-08T10:48:50Z) - Neural Entropy [0.0]
We examine the connection between deep learning and information theory through the paradigm of diffusion models.
We characterize the amount of information required to reverse a diffusive process and show that neural networks store this information and operate in a manner reminiscent of Maxwell's demon during the generative stage.
This conceptual picture blends elements of optimal control, thermodynamics, information theory, and optimal transport, and raises the prospect of applying diffusion models as a test bench to understand neural networks.
arXiv Detail & Related papers (2024-09-05T18:00:00Z) - Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural-network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical use of machine-learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Trained quantum neural networks are Gaussian processes [5.439020425819001]
We study quantum neural networks made of parametric one-qubit gates and fixed two-qubit gates in the limit of infinite width.
We analytically characterize the training of the network via gradient descent with square loss on supervised learning problems.
We prove that, as long as the network is not affected by barren plateaus, the trained network can perfectly fit the training set.
arXiv Detail & Related papers (2024-02-13T19:00:08Z) - Universal representation by Boltzmann machines with Regularised Axons [34.337412054122076]
We show that regularised Boltzmann machines preserve the ability to represent arbitrary distributions.
We also show that regularised Boltzmann machines can store exponentially many arbitrarily correlated visible patterns with perfect retrieval.
arXiv Detail & Related papers (2023-10-22T20:05:47Z) - Better Training of GFlowNets with Local Credit and Incomplete Trajectories [81.14310509871935]
We consider the case where the energy function can be applied not just to terminal states but also to intermediate states.
This is achieved, for example, when the energy function is additive, with terms available along the trajectory.
This enables a training objective that can be applied to update parameters even with incomplete trajectories.
arXiv Detail & Related papers (2023-02-03T12:19:42Z) - GFlowNet Foundations [66.69854262276391]
Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context.
We show a number of additional theoretical properties of GFlowNets.
arXiv Detail & Related papers (2021-11-17T17:59:54Z) - Using Restricted Boltzmann Machines to Model Molecular Geometries [0.0]
This paper proposes a new methodology for modeling a set of physical parameters by taking advantage of the Boltzmann machine's fast learning capacity and representational power.
In this paper, we introduce a new RBM based on the Tanh activation function and compare RBMs with different activation functions.
We demonstrate the ability of Gaussian RBMs to model small molecules such as water and ethane.
arXiv Detail & Related papers (2020-12-13T07:02:32Z) - Boltzmann machine learning with a variational quantum algorithm [0.0]
The Boltzmann machine is a powerful tool for modeling the probability distributions that govern training data.
We propose a method to implement the Boltzmann machine learning by using Noisy Intermediate-Scale Quantum (NISQ) devices.
arXiv Detail & Related papers (2020-07-02T04:45:02Z) - Applications of Koopman Mode Analysis to Neural Networks [52.77024349608834]
We consider the training process of a neural network as a dynamical system acting on the high-dimensional weight space.
We show how the Koopman spectrum can be used to determine the number of layers required for the architecture.
We also show how Koopman modes can be used to selectively prune the network and speed up the training procedure.
arXiv Detail & Related papers (2020-06-21T11:00:04Z) - Targeted free energy estimation via learned mappings [66.20146549150475]
Free energy perturbation (FEP) was proposed by Zwanzig more than six decades ago as a method to estimate free energy differences.
FEP suffers from a severe limitation: the requirement of sufficient overlap between distributions.
One strategy to mitigate this problem, called Targeted Free Energy Perturbation, uses a high-dimensional mapping in configuration space to increase overlap.
arXiv Detail & Related papers (2020-02-12T11:10:00Z)
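As background for this last entry, the Zwanzig identity and its targeted variant in their standard textbook forms (these are not equations quoted from the paper; M denotes the invertible mapping and J_M its Jacobian):
```latex
% Zwanzig's free energy perturbation identity between states A and B:
\Delta F = F_B - F_A
         = -\beta^{-1} \ln \big\langle e^{-\beta\,[U_B(x) - U_A(x)]} \big\rangle_A

% Targeted FEP: an invertible map M pushes configurations sampled from A
% towards the B ensemble to increase overlap; its Jacobian enters a
% generalised work Phi whose exponential average again gives Delta F.
\Phi(x) = U_B(M(x)) - U_A(x) - \beta^{-1} \ln \left| \det J_M(x) \right|,
\qquad
\Delta F = -\beta^{-1} \ln \big\langle e^{-\beta\,\Phi(x)} \big\rangle_A
```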
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.