Statistical Physics of Deep Neural Networks: Initialization toward
Optimal Channels
- URL: http://arxiv.org/abs/2212.01744v1
- Date: Sun, 4 Dec 2022 05:13:01 GMT
- Title: Statistical Physics of Deep Neural Networks: Initialization toward
Optimal Channels
- Authors: Kangyu Weng, Aohua Cheng, Ziyang Zhang, Pei Sun, Yang Tian
- Abstract summary: In deep learning, neural networks serve as noisy channels between input data and its representation.
We study a frequently overlooked possibility that neural networks can be initialized toward optimal channels.
- Score: 6.144858413112823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In deep learning, neural networks serve as noisy channels between input data
and its representation. This perspective naturally relates deep learning with
the pursuit of constructing channels with optimal performance in information
transmission and representation. While considerable efforts are concentrated on
realizing optimal channel properties during network optimization, we study a
frequently overlooked possibility that neural networks can be initialized
toward optimal channels. Our theory, consistent with experimental validation,
identifies the primary mechanisms underlying this overlooked possibility and suggests
intrinsic connections between statistical physics and deep learning. Unlike
conventional theories that characterize neural networks using the classic
mean-field approximation, we offer an analytic proof that this extensively applied
simplification scheme is not valid in studying neural networks as information
channels. To fill this gap, we develop a corrected mean-field framework
applicable for characterizing the limiting behaviors of information propagation
in neural networks without strong assumptions on inputs. Based on it, we
propose an analytic theory to prove that mutual information maximization is
realized between inputs and propagated signals when neural networks are
initialized at dynamic isometry, a case where information transmits via
norm-preserving mappings. These theoretical predictions are validated by
experiments on real neural networks, suggesting the robustness of our theory
against finite-size effects. Finally, we analyze our findings with information
bottleneck theory to confirm the precise relations among dynamic isometry,
mutual information maximization, and optimal channel properties in deep
learning.
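The abstract's central object, dynamic isometry, can be illustrated concretely: when every singular value of the input-output Jacobian equals 1, signals propagate through norm-preserving mappings. Below is a minimal sketch (not the paper's construction) assuming the simplest case of a deep linear chain with random orthogonal weights, for which the Jacobian is exactly orthogonal and norms are preserved at every depth:

```python
import numpy as np

rng = np.random.default_rng(0)

def orthogonal_weight(n, rng):
    """Random orthogonal matrix: QR of a Gaussian, sign-corrected toward Haar."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

# A deep linear chain with orthogonal weights sits exactly at dynamic
# isometry: all singular values of the input-output Jacobian are 1, so the
# norm of a propagated signal is preserved through all layers.
width, depth = 64, 50
x = rng.standard_normal(width)

h = x.copy()
for _ in range(depth):
    h = orthogonal_weight(width, rng) @ h

# The two norms agree to machine precision.
print(np.linalg.norm(x), np.linalg.norm(h))
```

With Gaussian i.i.d. weights instead, the singular-value spectrum of the Jacobian spreads out with depth and norms drift, which is the contrast the abstract draws between generic initialization and initialization at dynamic isometry (nonlinearities, which the paper treats, are omitted in this sketch).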
Related papers
- Interpreting Neural Networks through Mahalanobis Distance [0.0]
This paper introduces a theoretical framework that connects neural network linear layers with the Mahalanobis distance.
Although this work is theoretical and does not include empirical data, the proposed distance-based interpretation has the potential to enhance model robustness, improve generalization, and provide more intuitive explanations of neural network decisions.
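For reference, the Mahalanobis distance the blurb refers to is d(x) = sqrt((x - mu)^T Sigma^{-1} (x - mu)). A minimal implementation follows; the paper's specific correspondence between linear layers and this distance is not reproduced here:

```python
import numpy as np

def mahalanobis(x, mu, cov):
    """Mahalanobis distance: sqrt((x - mu)^T Sigma^{-1} (x - mu))."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solve cov @ y = diff instead of forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# With an identity covariance the distance reduces to the Euclidean norm.
print(mahalanobis([3.0, 4.0], [0.0, 0.0], np.eye(2)))  # 5.0
```

One standard way to connect this to a linear layer: a layer whose weight matrix is a whitening transform (e.g. Sigma^{-1/2}) maps inputs to a space where Euclidean distance to the transformed mean equals the Mahalanobis distance.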
arXiv Detail & Related papers (2024-10-25T07:21:44Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom holds that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- The Principles of Deep Learning Theory [19.33681537640272]
This book develops an effective theory approach to understanding deep neural networks of practical relevance.
We explain how these effectively-deep networks learn nontrivial representations from training.
We show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks.
arXiv Detail & Related papers (2021-06-18T15:00:00Z)
- Credit Assignment in Neural Networks through Deep Feedback Control [59.14935871979047]
Deep Feedback Control (DFC) is a new learning method that uses a feedback controller to drive a deep neural network to match a desired output target; the resulting control signal can be used for credit assignment.
The resulting learning rule is fully local in space and time and approximates Gauss-Newton optimization for a wide range of connectivity patterns.
To further underline its biological plausibility, we relate DFC to a multi-compartment model of cortical pyramidal neurons with a local voltage-dependent synaptic plasticity rule, consistent with recent theories of dendritic processing.
arXiv Detail & Related papers (2021-06-15T05:30:17Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Network Diffusions via Neural Mean-Field Dynamics [52.091487866968286]
We propose a novel learning framework for inference and estimation problems of diffusion on networks.
Our framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities.
Our approach is versatile and robust to variations of the underlying diffusion network models.
arXiv Detail & Related papers (2020-06-16T18:45:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.