The semantic landscape paradigm for neural networks
- URL: http://arxiv.org/abs/2307.09550v1
- Date: Tue, 18 Jul 2023 18:48:54 GMT
- Title: The semantic landscape paradigm for neural networks
- Authors: Shreyas Gokhale
- Abstract summary: We introduce the semantic landscape paradigm, a conceptual and mathematical framework that describes the training dynamics of neural networks.
Specifically, we show that grokking and emergence with scale are associated with percolation phenomena, and neural scaling laws are explainable in terms of the statistics of random walks on graphs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks exhibit a fascinating spectrum of phenomena ranging from
predictable scaling laws to the unpredictable emergence of new capabilities as
a function of training time, dataset size and network size. Analysis of these
phenomena has revealed the existence of concepts and algorithms encoded within
the learned representations of these networks. While significant strides have
been made in explaining observed phenomena separately, a unified framework for
understanding, dissecting, and predicting the performance of neural networks is
lacking. Here, we introduce the semantic landscape paradigm, a conceptual and
mathematical framework that describes the training dynamics of neural networks
as trajectories on a graph whose nodes correspond to emergent algorithms that
are intrinsic to the learned representations of the networks. This abstraction
enables us to describe a wide range of neural network phenomena in terms of
well-studied problems in statistical physics. Specifically, we show that
grokking and emergence with scale are associated with percolation phenomena,
and neural scaling laws are explainable in terms of the statistics of random
walks on graphs. Finally, we discuss how the semantic landscape paradigm
complements existing theoretical and practical approaches aimed at
understanding and interpreting deep neural networks.
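To make the random-walk picture concrete, the following toy sketch (not taken from the paper: the graph, the per-node losses, and the Boltzmann-like transition rule are all illustrative assumptions) treats training as a biased random walk over a graph whose nodes stand for candidate algorithms, with the best loss reached so far playing the role of a learning curve.
```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "semantic landscape": a sparse random graph whose nodes stand
# for emergent algorithms, each with an associated loss. (Toy construction,
# not the paper's actual model.)
n_nodes = 200
loss = rng.exponential(scale=1.0, size=n_nodes)      # loss of each algorithm
adj = [set() for _ in range(n_nodes)]
for _ in range(3 * n_nodes):                          # random edges
    i, j = rng.integers(n_nodes, size=2)
    if i != j:
        adj[i].add(j)
        adj[j].add(i)

def train_as_random_walk(start, n_steps, beta=2.0):
    """Biased random walk: moves to lower-loss neighbours are favoured.

    Returns the best (lowest) loss reached after each step, which plays the
    role of a training curve in this toy picture."""
    node, best = start, loss[start]
    curve = []
    for _ in range(n_steps):
        nbrs = list(adj[node])
        if nbrs:
            # Boltzmann-like preference for lower-loss neighbours
            w = np.exp(-beta * loss[nbrs])
            node = rng.choice(nbrs, p=w / w.sum())
            best = min(best, loss[node])
        curve.append(best)
    return np.array(curve)

curves = np.stack([train_as_random_walk(rng.integers(n_nodes), 2000)
                   for _ in range(20)])
mean_curve = curves.mean(axis=0)
print(mean_curve[[10, 100, 1000, 1999]])   # best loss vs. "training time"
```
Averaging such curves over many walks gives the kind of random-walk statistics that the paper relates to neural scaling laws.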
Related papers
- Collective variables of neural networks: empirical time evolution and scaling laws [0.535514140374842]
We show that certain measures on the spectrum of the empirical neural tangent kernel, specifically entropy and trace, yield insight into the representations learned by a neural network.
Results are demonstrated first on test cases before being shown on more complex networks, including transformers, auto-encoders, graph neural networks, and reinforcement learning studies.
arXiv Detail & Related papers (2024-10-09T21:37:14Z)
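As a rough illustration of the spectral observables mentioned in the summary above, the sketch below computes the trace and spectral entropy of an empirical neural tangent kernel for a tiny one-hidden-layer tanh network, using explicit per-example gradients (the architecture, sizes, and data are assumptions for illustration, not the paper's setup):
```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer network with scalar output (illustrative sizes).
d_in, d_hidden = 5, 32
W1 = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
b1 = np.zeros(d_hidden)
w2 = rng.normal(size=d_hidden) / np.sqrt(d_hidden)

def param_gradient(x):
    """Gradient of the scalar output w.r.t. all parameters, flattened."""
    h = np.tanh(W1 @ x + b1)
    g_w2 = h
    g_b2 = np.array([1.0])
    g_b1 = w2 * (1.0 - h**2)
    g_W1 = g_b1[:, None] * x[None, :]
    return np.concatenate([g_W1.ravel(), g_b1, g_w2, g_b2])

# Empirical NTK on a small batch: K[a, b] = <grad f(x_a), grad f(x_b)>.
X = rng.normal(size=(64, d_in))
J = np.stack([param_gradient(x) for x in X])      # (batch, n_params)
K = J @ J.T

# Spectral observables: trace and entropy of the normalized eigenvalue
# distribution -- the kind of collective variables discussed above.
eigvals = np.clip(np.linalg.eigvalsh(K), 1e-12, None)
p = eigvals / eigvals.sum()
trace = eigvals.sum()
entropy = -np.sum(p * np.log(p))
print(f"NTK trace = {trace:.3f}, spectral entropy = {entropy:.3f}")
```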
- Understanding Deep Learning via Notions of Rank [5.439020425819001]
This thesis puts forth notions of rank as key for developing a theory of deep learning.
In particular, we establish that gradient-based training can induce an implicit regularization towards low rank for several neural network architectures.
Practical implications of our theory for designing explicit regularization schemes and data preprocessing algorithms are presented.
arXiv Detail & Related papers (2024-08-04T18:47:55Z)
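As a toy illustration of the implicit low-rank bias described above (a sketch under simplified assumptions: a three-layer deep linear network with small initialization, fit by full-batch gradient descent to a rank-2 regression target; not the thesis's actual experiments), one can track the effective rank of the end-to-end map during training:
```python
import numpy as np

rng = np.random.default_rng(0)

def effective_rank(M):
    """exp of the Shannon entropy of the normalized singular-value spectrum."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / s.sum()
    return float(np.exp(-np.sum(p * np.log(p + 1e-12))))

# Rank-2 regression target, normalized to unit spectral norm (toy setup).
d, N, true_rank = 20, 500, 2
A_true = rng.normal(size=(d, true_rank)) @ rng.normal(size=(true_rank, d))
A_true /= np.linalg.norm(A_true, 2)
X = rng.normal(size=(d, N))
Y = A_true @ X

# Three-layer linear network with small initialization.
scale, lr = 0.05, 0.1
W1, W2, W3 = (scale * rng.normal(size=(d, d)) for _ in range(3))

for step in range(2001):
    P = W3 @ W2 @ W1                     # end-to-end linear map
    R = P @ X - Y
    G = (R @ X.T) / N                    # dL/dP for L = ||P X - Y||_F^2 / (2N)
    gW3 = G @ (W2 @ W1).T
    gW2 = W3.T @ G @ W1.T
    gW1 = (W3 @ W2).T @ G
    W1, W2, W3 = W1 - lr * gW1, W2 - lr * gW2, W3 - lr * gW3
    if step % 500 == 0:
        train_loss = 0.5 * np.sum(R**2) / N
        print(f"step {step:4d}  loss {train_loss:.4f}  "
              f"effective rank of P: {effective_rank(P):.2f}")
```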
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Dynamical stability and chaos in artificial neural network trajectories along training [3.379574469735166]
We study the dynamical properties of the training process by analyzing, through the lens of dynamical systems theory, the network trajectories of a shallow neural network.
We find hints of regular and chaotic behavior depending on the learning rate regime.
This work also contributes to the cross-fertilization of ideas between dynamical systems theory, network theory and machine learning.
arXiv Detail & Related papers (2024-04-08T17:33:11Z)
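One crude way to probe this kind of trajectory-level (in)stability is a finite-time Lyapunov estimate from a perturbed companion run, in the spirit of the Benettin method. The sketch below is illustrative only: the tiny tanh network, full-batch gradient descent, and the particular learning rates are assumptions, and the signs of the estimates here say nothing about the paper's actual findings.
```python
import numpy as np

def make_net(rng, d=10, hidden=32):
    # One-hidden-layer tanh network with scalar output (no biases, for brevity).
    return [rng.normal(size=(hidden, d)) / np.sqrt(d),
            rng.normal(size=hidden) / np.sqrt(hidden)]

def grads(params, X, y):
    """Gradients of 0.5 * mean squared error for the tanh network."""
    W1, w2 = params
    H = np.tanh(X @ W1.T)                 # (N, hidden)
    E = (H @ w2 - y) / len(y)             # scaled residuals
    gW1 = ((E[:, None] * (1 - H**2)) * w2[None, :]).T @ X
    gw2 = H.T @ E
    return [gW1, gw2]

def lyapunov_estimate(lr, eps=1e-8, steps=500, seed=0):
    """Finite-time Lyapunov-style estimate from a perturbed companion run."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(128, 10))
    y = rng.normal(size=128)
    a = make_net(rng)
    b = [p + eps * rng.normal(size=p.shape) for p in a]
    log_growth = 0.0
    for _ in range(steps):
        a = [p - lr * g for p, g in zip(a, grads(a, X, y))]
        b = [p - lr * g for p, g in zip(b, grads(b, X, y))]
        dist = np.sqrt(sum(np.sum((pa - pb)**2) for pa, pb in zip(a, b)))
        log_growth += np.log(dist / eps)
        # Rescale the companion run back to distance eps (Benettin-style).
        b = [pa + eps * (pb - pa) / dist for pa, pb in zip(a, b)]
    return log_growth / steps

# Positive values indicate exponential separation of nearby trajectories;
# negative values indicate contraction. Which learning rates land in which
# regime here is purely illustrative.
for lr in (0.01, 0.2):
    print(f"lr={lr}: finite-time Lyapunov estimate {lyapunov_estimate(lr):+.4f}")
```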
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
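A bare-bones version of the "neural network as a graph of parameters" idea (an illustrative encoding, not the paper's actual featurization): one node per neuron carrying its bias, and one directed edge per weight.
```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative MLP weights (layer sizes are arbitrary).
layer_sizes = [4, 8, 3]
weights = [rng.normal(size=(m, n))
           for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(size=m) for m in layer_sizes[1:]]

# One possible graph encoding: a node per neuron (bias as node feature) and a
# directed edge per weight (weight as edge feature).
offsets, offset = [], 0
for size in layer_sizes:
    offsets.append(offset)
    offset += size

nodes, edges = [], []
for layer, size in enumerate(layer_sizes):
    for i in range(size):
        bias = 0.0 if layer == 0 else float(biases[layer - 1][i])
        nodes.append({"id": offsets[layer] + i, "layer": layer, "bias": bias})
for layer, W in enumerate(weights):
    for i in range(W.shape[0]):          # target neuron in layer + 1
        for j in range(W.shape[1]):      # source neuron in layer
            edges.append({"src": offsets[layer] + j,
                          "dst": offsets[layer + 1] + i,
                          "weight": float(W[i, j])})

print(f"{len(nodes)} nodes, {len(edges)} edges")   # 15 nodes, 4*8 + 8*3 = 56 edges
```
A graph network operating on such node and edge features can, in principle, process weights from different architectures with a single model, which is the setting the summary above describes.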
- Neuro-symbolic computing with spiking neural networks [0.6035125735474387]
We extend previous work on spike-based graph algorithms by demonstrating how symbolic and multi-relational information can be encoded using spiking neurons.
The introduced framework is enabled by combining the graph embedding paradigm and the recent progress in training spiking neural networks using error backpropagation.
arXiv Detail & Related papers (2022-08-04T10:49:34Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- Graph Structure of Neural Networks [104.33754950606298]
We show how the graph structure of neural networks affects their predictive performance.
A "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance.
Top-performing neural networks have graph structure surprisingly similar to those of real biological neural networks.
arXiv Detail & Related papers (2020-07-13T17:59:31Z)
- A Chain Graph Interpretation of Real-World Neural Networks [58.78692706974121]
We propose an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure.
The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models.
We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques.
arXiv Detail & Related papers (2020-06-30T14:46:08Z)
- Complexity for deep neural networks and other characteristics of deep feature representations [0.0]
We define a notion of complexity, which quantifies the nonlinearity of the computation of a neural network.
We investigate these observables both for trained networks as well as explore their dynamics during training.
arXiv Detail & Related papers (2020-06-08T17:59:30Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.