Neural Entropy
- URL: http://arxiv.org/abs/2409.03817v1
- Date: Thu, 5 Sep 2024 18:00:00 GMT
- Title: Neural Entropy
- Authors: Akhil Premkumar
- Abstract summary: We examine the connection between deep learning and information theory through the paradigm of diffusion models.
We characterize the amount of information required to reverse a diffusive process and show that neural networks store this information and operate in a manner reminiscent of Maxwell's demon during the generative stage.
This conceptual picture blends elements of optimal control, thermodynamics, information theory, and optimal transport, and raises the prospect of applying diffusion models as a test bench to understand neural networks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We examine the connection between deep learning and information theory through the paradigm of diffusion models. Using well-established principles from non-equilibrium thermodynamics we can characterize the amount of information required to reverse a diffusive process. Neural networks store this information and operate in a manner reminiscent of Maxwell's demon during the generative stage. We illustrate this cycle using a novel diffusion scheme we call the entropy matching model, wherein the information conveyed to the network during training exactly corresponds to the entropy that must be negated during reversal. We demonstrate that this entropy can be used to analyze the encoding efficiency and storage capacity of the network. This conceptual picture blends elements of stochastic optimal control, thermodynamics, information theory, and optimal transport, and raises the prospect of applying diffusion models as a test bench to understand neural networks.
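As background for the abstract's claim that reversing a diffusion requires stored information, the standard score-based formulation (not specific to this paper's entropy matching scheme) makes the point concrete: the forward noising SDE can be undone by a reverse SDE whose extra drift term is the score of the noised marginals, and that score is exactly what the network must learn.

```latex
% Standard score-based diffusion formulation, shown for context only;
% the paper's entropy matching model is a particular scheme not reproduced here.
\begin{align}
  \mathrm{d}x &= f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t
    && \text{(forward / noising process)} \\
  \mathrm{d}x &= \left[ f(x,t) - g(t)^2\,\nabla_x \log p_t(x) \right]\mathrm{d}t
    + g(t)\,\mathrm{d}\bar{W}_t
    && \text{(reverse / generative process)}
\end{align}
```

In this picture the learned score $s_\theta(x,t) \approx \nabla_x \log p_t(x)$ is the information the network stores during training and expends during generation, which is the Maxwell's-demon analogy drawn in the abstract.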
Related papers
- Resolving Memorization in Empirical Diffusion Model for Manifold Data in High-Dimensional Spaces [5.716752583983991]
When the data distribution consists of $n$ points, empirical diffusion models tend to reproduce existing data points.
This work shows that the memorization issue can be solved simply by applying an inertia update at the end of the empirical diffusion simulation.
We demonstrate that the distribution of samples from this model approximates the true data distribution on a $C^2$ manifold of dimension $d$, within a Wasserstein-1 distance of order $O(n^{-\frac{2}{d+4}})$.
arXiv Detail & Related papers (2025-05-05T09:40:41Z) - Data augmentation using diffusion models to enhance inverse Ising inference [2.654300333196867]
We show that diffusion models can enhance parameter inference by augmenting small datasets.
This study serves as a proof-of-concept for using diffusion models for data augmentation in physics-related problems.
arXiv Detail & Related papers (2025-03-13T08:29:17Z) - Neural Message Passing Induced by Energy-Constrained Diffusion [79.9193447649011]
We propose an energy-constrained diffusion model as a principled, interpretable framework for understanding the mechanism of MPNNs.
We show that the new model can yield promising performance for cases where the data structures are observed (as a graph), partially observed or completely unobserved.
arXiv Detail & Related papers (2024-09-13T17:54:41Z) - Speed-accuracy relations for the diffusion models: Wisdom from nonequilibrium thermodynamics and optimal transport [0.0]
We discuss a connection between a generative model, called the diffusion model, and nonequilibrium thermodynamics.
We numerically illustrate the validity of the speed-accuracy relations for diffusion models with different noise schedules and different data.
We also show that non-conservative forces lead to inaccurate data generation, and that our results apply to data generation from real-world image datasets.
arXiv Detail & Related papers (2024-07-05T13:35:14Z) - Predicting Cascading Failures with a Hyperparametric Diffusion Model [66.89499978864741]
We study cascading failures in power grids through the lens of diffusion models.
Our model integrates viral diffusion principles with physics-based concepts.
We show that this diffusion model can be learned from traces of cascading failures.
arXiv Detail & Related papers (2024-06-12T02:34:24Z) - Heat Death of Generative Models in Closed-Loop Learning [63.83608300361159]
We study the learning dynamics of generative models that are fed back their own produced content in addition to their original training dataset.
We show that, unless a sufficient amount of external data is introduced at each iteration, any non-trivial temperature leads the model to degenerate.
arXiv Detail & Related papers (2024-04-02T21:51:39Z) - Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z) - Lecture Notes in Probabilistic Diffusion Models [0.5361320134021585]
Diffusion models are loosely modelled on non-equilibrium thermodynamics.
The diffusion model learns the data manifold to which the original and thus the reconstructed data samples belong.
Diffusion models have -- unlike variational autoencoders and flow models -- latent variables with the same dimensionality as the original data.
arXiv Detail & Related papers (2023-12-16T09:36:54Z) - Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - Dynamic Tensor Decomposition via Neural Diffusion-Reaction Processes [24.723536390322582]
Tensor decomposition is an important tool for multiway data analysis.
We propose Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE).
We show the advantage of our approach in both simulation study and real-world applications.
arXiv Detail & Related papers (2023-10-30T15:49:45Z) - Information-Theoretic GAN Compression with Variational Energy-based Model [36.77535324130402]
We propose an information-theoretic knowledge distillation approach for the compression of generative adversarial networks.
We show that the proposed algorithm consistently achieves outstanding performance in compressing generative adversarial networks.
arXiv Detail & Related papers (2023-03-28T15:32:21Z) - Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [68.62134204367668]
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
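As a generic illustration of estimating a score function with a neural network (a minimal sketch only; the network, data, and noise level below are placeholders and not the low-dimensional-subspace architecture analyzed in the paper above), denoising score matching can be written as:

```python
# Minimal denoising score matching sketch (generic; placeholder network and data,
# not the subspace-aware architecture studied in the paper above).
import torch
import torch.nn as nn

dim, sigma = 8, 0.5                      # data dimension and noise level (arbitrary)
score_net = nn.Sequential(               # s_theta(x) ~ grad_x log p_sigma(x)
    nn.Linear(dim, 64), nn.SiLU(), nn.Linear(64, dim)
)
opt = torch.optim.Adam(score_net.parameters(), lr=1e-3)

x0 = torch.randn(256, dim)               # stand-in for training data
for _ in range(100):
    eps = torch.randn_like(x0)
    xt = x0 + sigma * eps                # perturb data with Gaussian noise
    target = -eps / sigma                # score of q(x_t | x_0) evaluated at x_t
    loss = ((score_net(xt) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

The regression target $-\epsilon/\sigma$ is the score of the Gaussian perturbation kernel $q(x_t \mid x_0)$, so minimizing this loss drives $s_\theta$ toward the score of the noised data distribution.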
arXiv Detail & Related papers (2023-02-14T17:02:35Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Towards quantifying information flows: relative entropy in deep neural networks and the renormalization group [0.0]
We quantify the flow of information by explicitly computing the relative entropy or Kullback-Leibler divergence.
For neural networks, this behavior may have implications for information-theoretic methods in machine learning.
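For reference, the relative entropy used there is the standard Kullback-Leibler divergence; a minimal numerical sketch with hypothetical discrete distributions (unrelated to the paper's renormalization-group setup) is:

```python
# Generic Kullback-Leibler divergence between two discrete distributions;
# p and q here are hypothetical, not the RG-flow distributions of the paper.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i), in nats."""
    p = np.asarray(p, dtype=float); q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()          # normalize to probability vectors
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

print(kl_divergence([0.7, 0.2, 0.1], [0.5, 0.3, 0.2]))  # small positive value
```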
arXiv Detail & Related papers (2021-07-14T18:00:01Z) - Physics perception in sloshing scenes with guaranteed thermodynamic consistency [0.0]
We propose a strategy to learn the full state of sloshing liquids from measurements of the free surface.
Our approach is based on recurrent neural networks (RNNs) that project the limited available information onto a reduced-order manifold.
arXiv Detail & Related papers (2021-06-24T20:13:56Z) - Influence Estimation and Maximization via Neural Mean-Field Dynamics [60.91291234832546]
We propose a novel learning framework using neural mean-field (NMF) dynamics for inference and estimation problems.
Our framework can simultaneously learn the structure of the diffusion network and the evolution of node infection probabilities.
arXiv Detail & Related papers (2021-06-03T00:02:05Z) - Deep learning of thermodynamics-aware reduced-order models from data [0.08699280339422537]
We present an algorithm to learn the relevant latent variables of a large-scale discretized physical system.
We then predict its time evolution using thermodynamically-consistent deep neural networks.
arXiv Detail & Related papers (2020-07-03T08:49:01Z) - Network Diffusions via Neural Mean-Field Dynamics [52.091487866968286]
We propose a novel learning framework for inference and estimation problems of diffusion on networks.
Our framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities.
Our approach is versatile and robust to variations of the underlying diffusion network models.
arXiv Detail & Related papers (2020-06-16T18:45:20Z) - Focus of Attention Improves Information Transfer in Visual Features [80.22965663534556]
This paper focuses on unsupervised learning for transferring visual information in a truly online setting.
The computation of the entropy terms is carried out by a temporal process which yields their online estimation.
In order to better structure the input probability distribution, we use a human-like focus of attention model.
arXiv Detail & Related papers (2020-06-16T15:07:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.