Information Flow in Deep Neural Networks
- URL: http://arxiv.org/abs/2202.06749v1
- Date: Thu, 10 Feb 2022 23:32:26 GMT
- Title: Information Flow in Deep Neural Networks
- Authors: Ravid Shwartz-Ziv
- Abstract summary: There is no comprehensive theoretical understanding of how deep neural networks work or are structured.
Deep networks are often seen as black boxes with unclear interpretations and reliability.
This work aims to apply principles and techniques from information theory to deep learning models to increase our theoretical understanding and design better algorithms.
- Score: 0.6922389632860545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although deep neural networks have been immensely successful, there is no
comprehensive theoretical understanding of how they work or are structured. As
a result, deep networks are often seen as black boxes with unclear
interpretations and reliability. Understanding the performance of deep neural
networks is one of the greatest scientific challenges. This work aims to apply
principles and techniques from information theory to deep learning models to
increase our theoretical understanding and design better algorithms. We first
describe our information-theoretic approach to deep learning. Then, we propose
using the Information Bottleneck (IB) theory to explain deep learning systems.
The novel paradigm for analyzing networks sheds light on their layered
structure, generalization abilities, and learning dynamics. We later discuss
one of the most challenging problems of applying the IB to deep neural networks
- estimating mutual information. Recent theoretical developments, such as the
neural tangent kernel (NTK) framework, are used to investigate generalization
signals. In our study, we obtained tractable computations of many
information-theoretic quantities and their bounds for infinite ensembles of
infinitely wide neural networks. With these derivations, we can determine how
compression, generalization, and sample size pertain to the network and how
they are related. At the end, we present the dual Information Bottleneck
(dualIB). This new information-theoretic framework resolves some of the IB's
shortcomings by merely switching terms in the distortion function. The dualIB
can account for known data features and use them to make better predictions
over unseen examples. The underlying structure and the optimal representations
are uncovered through an analytical framework, and a variational framework
using deep neural networks has been used for optimization.
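The abstract singles out mutual information estimation as one of the hardest steps in applying the IB to deep networks. As a minimal sketch of why, here is the naive plug-in (histogram) estimator of I(X;Y) in pure Python; the function name and equal-width binning scheme are illustrative choices, not from the paper, and the estimator's upward bias on finite samples is exactly the kind of difficulty the IB literature has to work around.

```python
import math
import random
from collections import Counter

def binned_mutual_information(xs, ys, bins=8):
    """Estimate I(X;Y) in bits from paired samples via histogram binning.

    This is the naive plug-in estimator: discretize each variable into
    equal-width bins, estimate the joint and marginal distributions by
    counting, and sum p(x,y) * log2(p(x,y) / (p(x) p(y))).
    It is biased upward for small samples and degrades badly in high
    dimensions, which motivates the bounds and kernel-based estimates
    discussed in the abstract.
    """
    def to_bins(vals):
        lo, hi = min(vals), max(vals)
        width = (hi - lo) or 1.0
        return [min(int((v - lo) / width * bins), bins - 1) for v in vals]

    bx, by = to_bins(xs), to_bins(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        p_xy = c / n
        # p_xy / (p_x * p_y) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(p_xy * n * n / (px[i] * py[j]))
    return mi

random.seed(0)
xs = [random.random() for _ in range(20000)]
ys = [random.random() for _ in range(20000)]
print(binned_mutual_information(xs, ys))  # independent: close to 0 bits
print(binned_mutual_information(xs, xs))  # identical: close to log2(8) = 3 bits
```

Even here the independent case returns a small positive value rather than exactly zero; that residual bias grows with the number of bins and shrinks with sample size, which is why estimating information quantities for high-dimensional network representations is so delicate.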
Related papers
- Statistical Physics of Deep Neural Networks: Initialization toward
Optimal Channels [6.144858413112823]
In deep learning, neural networks serve as noisy channels between input data and its representation.
We study a frequently overlooked possibility that neural networks can be intrinsically driven toward optimal channels.
arXiv Detail & Related papers (2022-12-04T05:13:01Z)
- An Information-Theoretic Framework for Supervised Learning [22.280001450122175]
We propose a novel information-theoretic framework with its own notions of regret and sample complexity.
We study the sample complexity of learning from data generated by deep neural networks with ReLU activation units.
We conclude by corroborating our theoretical results with experimental analysis of random single-hidden-layer neural networks.
arXiv Detail & Related papers (2022-03-01T05:58:28Z)
- The Principles of Deep Learning Theory [19.33681537640272]
This book develops an effective theory approach to understanding deep neural networks of practical relevance.
We explain how these effectively-deep networks learn nontrivial representations from training.
We show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks.
arXiv Detail & Related papers (2021-06-18T15:00:00Z)
- Credit Assignment in Neural Networks through Deep Feedback Control [59.14935871979047]
Deep Feedback Control (DFC) is a new learning method that uses a feedback controller to drive a deep neural network to match a desired output target and whose control signal can be used for credit assignment.
The resulting learning rule is fully local in space and time and approximates Gauss-Newton optimization for a wide range of connectivity patterns.
To further underline its biological plausibility, we relate DFC to a multi-compartment model of cortical pyramidal neurons with a local voltage-dependent synaptic plasticity rule, consistent with recent theories of dendritic processing.
arXiv Detail & Related papers (2021-06-15T05:30:17Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.