Maximum Multiscale Entropy and Neural Network Regularization
- URL: http://arxiv.org/abs/2006.14614v1
- Date: Thu, 25 Jun 2020 17:56:11 GMT
- Title: Maximum Multiscale Entropy and Neural Network Regularization
- Authors: Amir R. Asadi, Emmanuel Abbe
- Abstract summary: A well-known result shows that the maximum entropy distribution under a mean constraint has an exponential form called the Gibbs-Boltzmann distribution.
This paper investigates a generalization of these results to a multiscale setting.
- Score: 28.00914218615924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A well-known result across information theory, machine learning, and
statistical physics shows that the maximum entropy distribution under a mean
constraint has an exponential form called the Gibbs-Boltzmann distribution.
This is used for instance in density estimation or to achieve excess risk
bounds derived from single-scale entropy regularizers (Xu-Raginsky '17). This
paper investigates a generalization of these results to a multiscale setting.
We present different ways of generalizing the maximum entropy result by
incorporating the notion of scale. For different entropies and arbitrary scale
transformations, it is shown that the distribution maximizing a multiscale
entropy is characterized by a procedure which has an analogy to the
renormalization group procedure in statistical physics. For the case of
decimation transformation, it is further shown that this distribution is
Gaussian whenever the optimal single-scale distribution is Gaussian. This is
then applied to neural networks, and it is shown that in a teacher-student
scenario, the multiscale Gibbs posterior can achieve a smaller excess risk than
the single-scale Gibbs posterior.
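For context, here is a minimal sketch of the single-scale results the abstract builds on; the notation ($f$, $\mu$, $\lambda$, $\pi$, $L_S$, $\beta$) is illustrative and not taken from the paper. The distribution maximizing entropy under a mean constraint,
$$\max_p \; H(p) \quad \text{subject to} \quad \mathbb{E}_p[f(X)] = \mu,$$
is the Gibbs-Boltzmann distribution $p^*(x) \propto e^{-\lambda f(x)}$, with the multiplier $\lambda$ chosen so that the constraint holds. In the learning setting of Xu-Raginsky '17, the corresponding single-scale Gibbs posterior over hypotheses $w$, given a sample $S$, a prior $\pi$, and an inverse temperature $\beta$, is
$$P_{W \mid S}(w) \;\propto\; \pi(w)\, e^{-\beta L_S(w)},$$
where $L_S(w)$ is the empirical risk. The multiscale Gibbs posterior studied in this paper generalizes this single-scale object.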
Related papers
- Generalization of Geometric Graph Neural Networks [84.01980526069075]
We study the generalization capabilities of geometric graph neural networks (GNNs).
We prove a generalization gap between the optimal empirical risk and the optimal statistical risk of this GNN.
The most important observation is that the generalization capability can be realized with one large graph instead of being limited to the size of the graph as in previous results.
arXiv Detail & Related papers (2024-09-08T18:55:57Z)
- Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV).
We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise.
arXiv Detail & Related papers (2024-08-25T10:28:31Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- On the generic increase of observational entropy in isolated systems [6.874745415692133]
We show how observational entropy of a system undergoing a unitary evolution chosen at random tends to increase with overwhelming probability.
We show that for any observation that is sufficiently coarse with respect to the size of the system, regardless of the initial state of the system, random evolution renders its state practically indistinguishable from the microcanonical distribution.
arXiv Detail & Related papers (2024-04-18T08:27:04Z)
- Universal distributions of overlaps from unitary dynamics in generic quantum many-body systems [0.0]
We study the preparation of a quantum state using a circuit of depth $t$ from a factorized state of $N$ sites.
We argue that in the appropriate scaling limit of large $t$ and $N$, the overlaps between states evolved under generic many-body chaotic dynamics follow universal distributions.
arXiv Detail & Related papers (2024-04-15T18:01:13Z)
- Maximum Weight Entropy [6.821961232645206]
This paper deals with uncertainty quantification and out-of-distribution detection in deep learning using Bayesian and ensemble methods.
Considering neural networks, a practical optimization is derived to build such a distribution, defined as a trade-off between the average empirical risk and the weight distribution entropy.
arXiv Detail & Related papers (2023-09-27T14:46:10Z)
- Bayesian Renormalization [68.8204255655161]
We present a fully information theoretic approach to renormalization inspired by Bayesian statistical inference.
The main insight of Bayesian Renormalization is that the Fisher metric defines a correlation length that plays the role of an emergent RG scale.
We provide insight into how the Bayesian Renormalization scheme relates to existing methods for data compression and data generation.
arXiv Detail & Related papers (2023-05-17T18:00:28Z)
- High-dimensional limit theorems for SGD: Effective dynamics and critical scaling [6.950316788263433]
We prove limit theorems for the trajectories of summary statistics of stochastic gradient descent (SGD).
We show a critical scaling regime for the step-size, below which the effective ballistic dynamics matches gradient flow for the population loss.
Around the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate.
arXiv Detail & Related papers (2022-06-08T17:42:18Z)
- Theoretical Error Analysis of Entropy Approximation for Gaussian Mixture [0.7499722271664147]
In this paper, we analyze the approximation error between the true entropy and the approximate one to reveal when this approximation works effectively.
Our results provide a guarantee that this approximation works well in higher dimension problems.
arXiv Detail & Related papers (2022-02-26T04:49:01Z)
- Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
- GANs with Variational Entropy Regularizers: Applications in Mitigating the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learn a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
arXiv Detail & Related papers (2020-09-24T19:34:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.