A Chain Graph Interpretation of Real-World Neural Networks
- URL: http://arxiv.org/abs/2006.16856v2
- Date: Tue, 6 Oct 2020 11:14:19 GMT
- Title: A Chain Graph Interpretation of Real-World Neural Networks
- Authors: Yuesong Shen, Daniel Cremers
- Abstract summary: We propose an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure.
The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models.
We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques.
- Score: 58.78692706974121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The last decade has witnessed a boom of deep learning research and
applications achieving state-of-the-art results in various domains. However,
most advances have been established empirically, and their theoretical analysis
remains lacking. One major issue is that our current interpretation of neural
networks (NNs) as function approximators is too generic to support in-depth
analysis. In this paper, we remedy this by proposing an alternative
interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an
approximate inference procedure. The CG interpretation specifies the nature of
each NN component within the rich theoretical framework of probabilistic
graphical models, while at the same time remaining general enough to cover
real-world NNs with arbitrary depth, multi-branching and varied activations, as
well as common structures including convolutional / recurrent layers, residual
blocks and dropout. We demonstrate with concrete examples that the CG
interpretation can provide novel theoretical support and insights for various
NN techniques, as well as derive new deep learning approaches such as the
concept of partially collapsed feed-forward inference. It is thus a promising
framework that deepens our understanding of neural networks and provides a
coherent theoretical formulation for future deep learning research.
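To make the reading concrete: under a chain-graph view of this kind, a sigmoid feed-forward pass can be read as propagating approximate marginals of binary latent variables layer by layer. Below is a minimal sketch of that reading, assuming binary hidden units with logistic conditionals; the network shapes and weights are illustrative stand-ins, not the paper's construction.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative two-layer net; shapes and weights are made up for the sketch.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)
x = rng.normal(size=4)  # observed input

# (a) Ordinary feed-forward pass with sigmoid activations.
h = sigmoid(x @ W1 + b1)
y_ff = sigmoid(h @ W2 + b2)

# (b) The same pass read as layer-wise approximate inference: treat each
# hidden unit as a binary variable with p(H_j = 1 | x) = sigmoid(...),
# and pass the expectation E[H_j] forward instead of samples.
# Monte Carlo estimate of the exact output marginals, for comparison:
n = 50000
h_samp = (rng.random((n, 3)) < sigmoid(x @ W1 + b1)).astype(float)
y_mc = sigmoid(h_samp @ W2 + b2).mean(axis=0)

print("feed-forward output:", y_ff)  # single deterministic sweep
print("Monte Carlo output: ", y_mc)  # close but not identical
```
The small gap between the two outputs is the point: the feed-forward sweep trades exact marginalization for a single deterministic pass, in the spirit of the paper's reading of feed-forward as approximate inference.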
Related papers
- Understanding Deep Learning via Notions of Rank [5.439020425819001]
This thesis puts forth notions of rank as key to developing a theory of deep learning.
In particular, we establish that gradient-based training can induce an implicit regularization towards low rank for several neural network architectures.
Practical implications of our theory for designing explicit regularization schemes and data preprocessing algorithms are presented.
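As a minimal sketch of such an implicit low-rank bias (a standard toy matrix-factorization setup, not an experiment from the thesis; the sizes, initialization scale, and learning rate are arbitrary choices):
```python
import torch

torch.manual_seed(0)

def numerical_rank(M, tol=1e-2):
    # Count singular values above tol * (largest singular value).
    s = torch.linalg.svdvals(M.detach())
    return int((s > tol * s[0]).sum())

# Over-parameterized factorization: fit a rank-2 target with a full
# d x d product W2 @ W1, starting from small initialization.
d, r = 20, 2
target = torch.randn(d, r) @ torch.randn(r, d)

W1 = (0.05 * torch.randn(d, d)).requires_grad_()
W2 = (0.05 * torch.randn(d, d)).requires_grad_()
opt = torch.optim.SGD([W1, W2], lr=5e-3)

for step in range(2001):
    loss = ((W2 @ W1 - target) ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss.item():9.4f}  "
              f"rank(W2 @ W1) = {numerical_rank(W2 @ W1)}")
# With no rank penalty in the loss, the learned product typically ends
# up with numerical rank close to the target's rank (2).
```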
arXiv Detail & Related papers (2024-08-04T18:47:55Z) - Foundations and Frontiers of Graph Learning Theory [81.39078977407719]
Recent advancements in graph learning have revolutionized the way to understand and analyze data with complex structures.
Graph Neural Networks (GNNs), i.e., neural network architectures designed for learning graph representations, have become a popular paradigm.
This article provides a comprehensive summary of the theoretical foundations and breakthroughs concerning the approximation and learning behaviors intrinsic to prevalent graph learning models.
arXiv Detail & Related papers (2024-07-03T14:07:41Z) - Deep Neural Networks via Complex Network Theory: a Perspective [3.1023851130450684]
Deep Neural Networks (DNNs) can be represented as graphs whose links and vertices iteratively process data and solve tasks sub-optimally. Complex Network Theory (CNT), merging statistical physics with graph theory, provides a method for interpreting neural networks by analysing their weights and neuron structures.
In this work, we extend the existing CNT metrics with measures that sample from the DNNs' training distribution, shifting from a purely topological analysis to one that connects with the interpretability of deep learning.
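For flavor, here is a toy contrast between a purely topological CNT-style metric and a data-dependent variant of the kind described above; the specific metrics (node strength and a link weighting by sampled input magnitudes) are generic illustrations, not the paper's definitions.
```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

# Toy dense layer; random weights stand in for a trained layer.
W = rng.normal(size=(8, 5))                 # 8 inputs -> 5 neurons

# Purely topological CNT-style metric: node strength, i.e. the total
# absolute weight on each neuron's incoming links.
strength = np.abs(W).sum(axis=0)

# Data-dependent variant: weight each link by the magnitude of the
# signal it actually carries, averaged over sampled inputs.
X = rng.normal(size=(1000, 8))              # stand-in for training inputs
link_flow = np.abs(X[:, :, None] * W[None, :, :]).mean(axis=0)
data_strength = link_flow.sum(axis=0)

print("topological node strength:", np.round(strength, 2))
print("data-weighted strength:   ", np.round(data_strength, 2))
print("mean ReLU activation:     ", np.round(relu(X @ W).mean(axis=0), 2))
```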
arXiv Detail & Related papers (2024-04-17T08:42:42Z) - Understanding polysemanticity in neural networks through coding theory [0.8702432681310401]
We propose a novel practical approach to network interpretability, together with theoretical insights into polysemanticity and the density of codes.
We show how random projections can reveal whether a network exhibits a smooth or non-differentiable code and hence how interpretable the code is.
Our approach advances the pursuit of interpretability in neural networks, providing insights into their underlying structure and suggesting new avenues for circuit-level interpretability.
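As an illustrative probe in this spirit (not the paper's exact procedure), one can project a ReLU code onto a few random directions and walk along a line in input space; kinks in the projected responses expose the code's non-differentiability:
```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)

# Stand-in layer producing the 'code'; random weights for the sketch.
W = rng.normal(size=(10, 64))
code = lambda x: relu(x @ W)

# A few random projection directions for the 64-dim code.
P = rng.normal(size=(64, 3)) / np.sqrt(64)

# Walk along a line x0 + t*d in input space and watch the projected code;
# kinks in these curves reflect ReLU non-differentiability.
x0, d = rng.normal(size=10), rng.normal(size=10)
for t in np.linspace(-1.0, 1.0, 9):
    proj = code(x0 + t * d) @ P
    print(f"t={t:+.2f}  projected code: {np.round(proj, 2)}")
```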
arXiv Detail & Related papers (2024-01-31T16:31:54Z) - Connecting NTK and NNGP: A Unified Theoretical Framework for Neural
Network Learning Dynamics in the Kernel Regime [7.136205674624813]
We provide a comprehensive framework for understanding the learning process of deep neural networks in the infinite width limit.
We identify two learning phases characterized by different time scales: gradient-driven and diffusive learning.
arXiv Detail & Related papers (2023-09-08T18:00:01Z) - The semantic landscape paradigm for neural networks [0.0]
We introduce the semantic landscape paradigm, a conceptual and mathematical framework that describes the training dynamics of neural networks.
Specifically, we show that grokking and emergence with scale are associated with percolation phenomena, and neural scaling laws are explainable in terms of the statistics of random walks on graphs.
arXiv Detail & Related papers (2023-07-18T18:48:54Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
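One way to observe the effect empirically (a generic sketch using a random ReLU MLP; the width, depth, batch size, and tolerance are arbitrary choices, not the paper's protocol):
```python
import numpy as np

rng = np.random.default_rng(0)

def numerical_rank(M, tol=1e-3):
    # Count singular values above tol * (largest singular value).
    s = np.linalg.svd(M, compute_uv=False)
    return int((s > tol * s[0]).sum())

# Random deep MLP as a stand-in for a real network.
width, depth = 64, 10
H = rng.normal(size=(256, width))          # batch of inputs, features as rows
for layer in range(depth):
    W = rng.normal(size=(width, width)) * np.sqrt(2.0 / width)
    H = np.maximum(H @ W, 0.0)             # ReLU layer
    print(f"layer {layer + 1:2d}: numerical rank of features =",
          numerical_rank(H))
# The feature rank typically drifts downward with depth, illustrating
# the rank-diminishing effect the paper analyzes.
```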
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
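The linearization in question is the first-order Taylor expansion of the network in its parameters. A minimal sketch of that expansion, assuming PyTorch 2.x for torch.func; the tiny model and displacement scale are arbitrary:
```python
import torch
from torch.func import functional_call, jvp

torch.manual_seed(0)

# Tiny model as a stand-in; this is the generic NTK-style first-order
# Taylor expansion, not the paper's specific construction.
net = torch.nn.Sequential(torch.nn.Linear(5, 16), torch.nn.Tanh(),
                          torch.nn.Linear(16, 1))
x = torch.randn(8, 5)

theta0 = {k: v.detach().clone() for k, v in net.named_parameters()}
f = lambda params: functional_call(net, params, (x,))

# Small random parameter displacement theta - theta0.
delta = {k: 0.01 * torch.randn_like(v) for k, v in theta0.items()}

# Linearized model: f_lin(theta) = f(theta0) + J(theta0) (theta - theta0).
f0, jvp_out = jvp(f, (theta0,), (delta,))
f_lin = f0 + jvp_out

# Exact network at the displaced parameters, for comparison.
theta1 = {k: theta0[k] + delta[k] for k in theta0}
f_true = f(theta1)

print("max |f_true - f_lin| :", (f_true - f_lin).abs().max().item())
print("max |f_true - f0|    :", (f_true - f0).abs().max().item())
# For small displacements the linearization tracks the network closely;
# NTK theory studies regimes where this remains true throughout training.
```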
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining neural networks that differs from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that also extend to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z) - Generalization bound of globally optimal non-convex neural network
training: Transportation map estimation by infinite dimensional Langevin
dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for neural network optimization analysis, such as mean field theory and neural tangent kernel theory, typically require taking the limit of infinite network width to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)