Towards quantifying information flows: relative entropy in deep neural
networks and the renormalization group
- URL: http://arxiv.org/abs/2107.06898v1
- Date: Wed, 14 Jul 2021 18:00:01 GMT
- Title: Towards quantifying information flows: relative entropy in deep neural
networks and the renormalization group
- Authors: Johanna Erdmenger, Kevin T. Grosvenor, and Ro Jefferson
- Abstract summary: We quantify the flow of information by explicitly computing the relative entropy or Kullback-Leibler divergence.
For the neural networks, the asymptotic behavior may have implications for various information maximization methods in machine learning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the analogy between the renormalization group (RG) and deep
neural networks, wherein subsequent layers of neurons are analogous to
successive steps along the RG. In particular, we quantify the flow of
information by explicitly computing the relative entropy or Kullback-Leibler
divergence in both the one- and two-dimensional Ising models under decimation
RG, as well as in a feedforward neural network as a function of depth. We
observe qualitatively identical behavior characterized by the monotonic
increase to a parameter-dependent asymptotic value. On the quantum field theory
side, the monotonic increase confirms the connection between the relative
entropy and the c-theorem. For the neural networks, the asymptotic behavior may
have implications for various information maximization methods in machine
learning, as well as for disentangling compactness and generalizability.
Furthermore, while both the two-dimensional Ising model and the random neural
networks we consider exhibit non-trivial critical points, the relative entropy
appears insensitive to the phase structure of either system. In this sense,
more refined probes are required in order to fully elucidate the flow of
information in these models.
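As a concrete, self-contained illustration of the kind of quantity discussed in the abstract (not the paper's exact construction), the following Python sketch computes the Kullback-Leibler divergence between the Boltzmann distribution of a small open 1D Ising chain at its initial coupling and at the coupling obtained after n decimation steps, using the standard recursion tanh(K') = tanh(K)^2. The chain length, initial coupling, and the choice of comparing flowed and initial couplings on a fixed chain are illustrative assumptions.

```python
import itertools
import numpy as np

def boltzmann_1d_ising(n_spins, coupling):
    """Exact Boltzmann distribution of an open 1D Ising chain with no field."""
    configs = np.array(list(itertools.product([-1, 1], repeat=n_spins)))
    bond_sum = (configs[:, :-1] * configs[:, 1:]).sum(axis=1)  # sum_i s_i s_{i+1}
    weights = np.exp(coupling * bond_sum)   # inverse temperature absorbed into K
    return weights / weights.sum()

def kl_divergence(p, q):
    """Relative entropy D(p || q) for discrete distributions with full support."""
    return float(np.sum(p * np.log(p / q)))

# Decimation recursion for the 1D Ising coupling (block size b = 2):
# tanh(K_{n+1}) = tanh(K_n)^2.
couplings = [1.0]
for _ in range(6):
    couplings.append(np.arctanh(np.tanh(couplings[-1]) ** 2))

p0 = boltzmann_1d_ising(10, couplings[0])
for step, K in enumerate(couplings):
    pn = boltzmann_1d_ising(10, K)
    print(f"RG step {step}: K = {K:.4f}, D(p_n || p_0) = {kl_divergence(pn, p0):.4f}")
```

On this toy example the printed divergence increases monotonically toward a finite limiting value, qualitatively mirroring the behavior described in the abstract.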
Related papers
- Stochastic Gradient Descent for Two-layer Neural Networks [2.0349026069285423]
This paper presents a study on the convergence rates of the stochastic gradient descent (SGD) algorithm when applied to overparameterized two-layer neural networks.
Our approach combines the Neural Tangent Kernel (NTK) approximation with convergence analysis in the Reproducing Kernel Hilbert Space (RKHS) generated by the NTK.
Our research framework enables us to explore the intricate interplay between kernel methods and optimization processes, shedding light on the dynamics and convergence properties of neural networks.
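The following is a minimal sketch of the setting (plain SGD on an overparameterized two-layer ReLU network with 1/sqrt(m) output scaling), not of the paper's NTK/RKHS convergence analysis; the synthetic data model, width, and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 5, 2000                    # samples, input dim, hidden width
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)   # simple synthetic target

W = rng.standard_normal((m, d))           # hidden weights
a = rng.choice([-1.0, 1.0], size=m)       # +/-1 output weights, 1/sqrt(m) scaling
lr, epochs = 0.1, 50

def forward(x):
    h = np.maximum(W @ x, 0.0)            # ReLU features
    return a @ h / np.sqrt(m), h

for epoch in range(epochs):
    for i in rng.permutation(n):          # one SGD pass over the data
        x, target = X[i], y[i]
        pred, h = forward(x)
        r = pred - target                 # residual of the squared loss
        grad_a = r * h / np.sqrt(m)
        grad_W = r * np.outer(a * (h > 0), x) / np.sqrt(m)
        a -= lr * grad_a
        W -= lr * grad_W
    if epoch % 10 == 0:
        preds = np.array([forward(x)[0] for x in X])
        print(f"epoch {epoch}: train MSE = {np.mean((preds - y) ** 2):.4f}")
```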
arXiv Detail & Related papers (2024-07-10T13:58:57Z)
- Fluctuation based interpretable analysis scheme for quantum many-body snapshots [0.0]
Microscopically understanding and classifying phases of matter is at the heart of strongly-correlated quantum physics.
Here, we combine confusion learning with correlation convolutional neural networks, which yields fully interpretable phase detection.
Our work opens new directions in interpretable quantum image processing that are sensitive to long-range order.
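The "learning by confusion" scheme referenced above can be sketched with a much simpler classifier standing in for the correlation convolutional network: for each trial critical value the data are relabeled and a classifier is trained, and the resulting accuracy curve peaks at the true transition. Everything below (the synthetic observable, the transition point, the decision-stump classifier) is an illustrative assumption rather than the paper's method.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy "snapshots": a noisy observable that changes across a transition at p = 0.5.
p_values = np.repeat(np.linspace(0.0, 1.0, 51), 40)
observable = np.tanh((p_values - 0.5) / 0.05) + 0.3 * rng.standard_normal(p_values.size)

def stump_accuracy(features, labels):
    """Best threshold classifier on a 1D feature (stand-in for a trained network)."""
    best = max(labels.mean(), 1.0 - labels.mean())   # majority-vote baseline
    for t in np.quantile(features, np.linspace(0.01, 0.99, 99)):
        pred = features > t
        best = max(best, (pred == labels).mean(), (pred != labels).mean())
    return best

# Learning by confusion: relabel the data with a trial critical value and record
# how well a classifier can reproduce that labeling.
for trial_pc in np.linspace(0.1, 0.9, 9):
    labels = p_values < trial_pc
    print(f"trial p_c = {trial_pc:.1f}: accuracy = {stump_accuracy(observable, labels):.3f}")
```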
arXiv Detail & Related papers (2023-04-12T17:59:59Z)
- On the Approximation and Complexity of Deep Neural Networks to Invariant Functions [0.0]
We study the approximation and complexity of deep neural networks to invariant functions.
We show that a broad range of invariant functions can be approximated by various types of neural network models.
We provide a feasible application that connects the parameter estimation and forecasting of high-resolution signals with our theoretical conclusions.
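One standard way to build the kind of invariant approximators discussed here is sum pooling over per-element embeddings, which is permutation invariant by construction. The sketch below is a generic illustration of that idea; the architecture, sizes, and random weights are assumptions, not the paper's constructions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h = 3, 16
W1 = rng.standard_normal((h, d))
W2 = rng.standard_normal((1, h))

def invariant_net(X):
    """X: (set_size, d). Per-element embedding followed by sum pooling."""
    phi = np.tanh(X @ W1.T)          # embed each element independently
    pooled = phi.sum(axis=0)         # permutation-invariant pooling
    return (W2 @ np.tanh(pooled)).item()

X = rng.standard_normal((5, d))
perm = rng.permutation(5)
print(invariant_net(X), invariant_net(X[perm]))   # identical outputs
```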
arXiv Detail & Related papers (2022-10-27T09:19:19Z)
- Interrelation of equivariant Gaussian processes and convolutional neural networks [77.34726150561087]
Currently there exists a rather promising new trend in machine learning (ML) based on the relationship between neural networks (NNs) and Gaussian processes (GPs).
In this work we establish a relationship between the many-channel limit for CNNs equivariant with respect to the two-dimensional Euclidean group with vector-valued neuron activations and the corresponding independently introduced equivariant Gaussian processes (GPs).
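The simpler, non-equivariant version of this NN-GP correspondence is easy to verify numerically: in the many-channel (wide) limit, the feature covariance of a random ReLU layer approaches the closed-form arc-cosine kernel of Cho and Saul. The sketch below checks that fully connected analogue; it is not the equivariant CNN construction of the paper, and the dimensions and sample counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_units = 4, 200000

x1 = rng.standard_normal(d)
x2 = rng.standard_normal(d)

# Monte Carlo estimate of E_w[ReLU(w.x1) ReLU(w.x2)] with w ~ N(0, I):
W = rng.standard_normal((n_units, d))
emp = np.mean(np.maximum(W @ x1, 0) * np.maximum(W @ x2, 0))

# Closed-form arc-cosine kernel that the wide-network limit approaches:
n1, n2 = np.linalg.norm(x1), np.linalg.norm(x2)
theta = np.arccos(np.clip(x1 @ x2 / (n1 * n2), -1.0, 1.0))
analytic = n1 * n2 * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

print(emp, analytic)   # the two values agree up to Monte Carlo error
```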
arXiv Detail & Related papers (2022-09-17T17:02:35Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Cross-Frequency Coupling Increases Memory Capacity in Oscillatory Neural Networks [69.42260428921436]
Cross-frequency coupling (CFC) is associated with information integration across populations of neurons.
We construct a model of CFC which predicts a computational role for observed $\theta$-$\gamma$ oscillatory circuits in the hippocampus and cortex.
We show that the presence of CFC increases the memory capacity of a population of neurons connected by plastic synapses.
arXiv Detail & Related papers (2022-04-05T17:13:36Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Double-descent curves in neural networks: a new perspective using Gaussian processes [9.153116600213641]
Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with increasing parameters, then grows after reaching an optimal number of parameters, and finally descends again in the overparameterised regime.
We use techniques from random matrix theory to characterize the spectral distribution of the empirical feature covariance matrix as a width-dependent regularisation of the spectrum of the neural network Gaussian process kernel.
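As a minimal numerical companion (not the paper's random-matrix analysis), the sketch below computes the spectrum of the empirical covariance of random ReLU features at several widths; as the width grows, the empirical spectrum settles toward its infinite-width (NNGP-kernel) limit. The data dimensions and widths are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 300, 20
X = rng.standard_normal((n, d)) / np.sqrt(d)

# Empirical feature covariance of random ReLU features at several widths;
# the width controls how much the empirical spectrum fluctuates around the
# infinite-width limit.
for width in (50, 500, 5000):
    W = rng.standard_normal((d, width))
    F = np.maximum(X @ W, 0) / np.sqrt(width)     # (n, width) feature matrix
    C = F @ F.T                                   # n x n empirical feature covariance
    eig = np.linalg.eigvalsh(C)
    print(f"width {width}: top eigenvalues {np.sort(eig)[-3:]}")
```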
arXiv Detail & Related papers (2021-02-14T20:31:49Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the gradient flow of the loss function.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Depth-Width Trade-offs for Neural Networks via Topological Entropy [0.0]
We show a new connection between the expressivity of deep neural networks and topological entropy from dynamical systems theory.
We discuss the relationship between topological entropy, the number of oscillations, periods, and the Lipschitz constant.
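A classic way to see the oscillation-depth connection mentioned above (a standard construction in depth-separation arguments, not the paper's topological-entropy machinery) is to compose a triangle map with itself: the map is exactly representable by a tiny ReLU network, and each additional composition doubles the number of monotone pieces.

```python
import numpy as np

def tent(x):
    """Triangle map on [0, 1]; equal to 2*relu(x) - 4*relu(x - 0.5) there."""
    return 1.0 - 2.0 * np.abs(x - 0.5)

xs = np.linspace(0.0, 1.0, 200001)
ys = xs.copy()
for depth in range(1, 7):
    ys = tent(ys)                        # compose the map "depth" times
    # Count monotone pieces via sign changes of the discrete derivative;
    # this grows exponentially with depth (~2^depth pieces).
    signs = np.sign(np.diff(ys))
    signs = signs[signs != 0]            # ignore flat steps from finite sampling
    pieces = 1 + np.count_nonzero(np.diff(signs))
    print(f"depth {depth}: ~{pieces} monotone pieces")
```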
arXiv Detail & Related papers (2020-10-15T08:14:44Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent (see the sketch after this entry).
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
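To make the min-max formulation concrete, here is a heavily simplified sketch in which linear functions stand in for the neural-network players of the paper: an instrumental-variable-style toy model whose structural coefficient is recovered by simultaneous gradient descent-ascent on a regularized moment objective. The data-generating process, the linear function classes, and the specific regularized objective are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Toy instrumental-variable data: u is an unobserved confounder, z an instrument.
z = rng.standard_normal(n)
u = rng.standard_normal(n)
x = z + u
y = 2.0 * x + u + 0.1 * rng.standard_normal(n)   # true structural coefficient = 2

# Min-max formulation of the moment condition E[(y - theta*x) * g(z)] = 0,
# with linear f(x) = theta*x and adversary g(z) = w*z:
#   min_theta max_w  E[(y - theta*x) * w*z] - 0.5 * E[(w*z)^2]
theta, w, lr = 0.0, 0.0, 0.1
for _ in range(3000):
    grad_theta = -w * np.mean(x * z)                            # descent step for theta
    grad_w = np.mean((y - theta * x) * z) - w * np.mean(z**2)   # ascent step for w
    theta -= lr * grad_theta
    w += lr * grad_w

print(f"adversarial estimate: {theta:.3f}, "
      f"biased least squares: {np.mean(x * y) / np.mean(x**2):.3f}")
```

The adversarial estimate converges to the instrumental-variable solution (close to the true coefficient 2), while plain least squares stays biased by the confounder.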
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.