Collective variables of neural networks: empirical time evolution and scaling laws
- URL: http://arxiv.org/abs/2410.07451v1
- Date: Wed, 9 Oct 2024 21:37:14 GMT
- Title: Collective variables of neural networks: empirical time evolution and scaling laws
- Authors: Samuel Tovey, Sven Krippendorf, Michael Spannowsky, Konstantin Nikolaou, Christian Holm
- Abstract summary: We show that certain measures on the spectrum of the empirical neural tangent kernel, specifically entropy and trace, yield insight into the representations learned by a neural network.
Results are demonstrated first on test cases before being shown on more complex networks, including transformers, auto-encoders, graph neural networks, and reinforcement learning studies.
- Score: 0.535514140374842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work presents a novel means of understanding learning dynamics and scaling relations in neural networks. We show that certain measures on the spectrum of the empirical neural tangent kernel (NTK), specifically its entropy and trace, yield insight into the representations learned by a neural network and how these can be improved through architecture scaling. These results are demonstrated first on test cases before being shown on more complex networks, including transformers, auto-encoders, graph neural networks, and reinforcement learning studies. In testing a wide range of architectures, we highlight the universal nature of training dynamics and discuss how they can be used to understand the mechanisms behind learning in neural networks. We identify two dominant mechanisms present throughout machine learning training. The first, information compression, is seen through a reduction in the entropy of the NTK spectrum during training and occurs predominantly in small neural networks. The second, coined structure formation, is seen through increasing entropy and thus the creation of structure in the neural network representations beyond the prior established by the network at initialization. Due to the ubiquity of the latter in deep neural network architectures and its flexibility in creating feature-rich representations, we argue that this form of evolution of the network's entropy should be considered the onset of a deep learning regime.
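The entropy and trace referred to above are scalar observables of the empirical NTK's eigenvalue spectrum, tracked over training. The sketch below is a minimal illustration rather than the authors' implementation (the toy MLP, its sizes, and all helper names are assumptions): it builds the empirical NTK Gram matrix K = J J^T from per-sample parameter Jacobians in JAX, then computes the trace and the Shannon entropy of the normalized spectrum.

```python
import jax
import jax.numpy as jnp

def mlp(params, x):
    # Toy two-layer MLP with a scalar output; the architecture is illustrative only.
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return (h @ w2 + b2).sum()  # scalar output keeps the NTK a simple Gram matrix

def empirical_ntk(params, xs):
    # Row i of J is the flattened Jacobian d f(x_i) / d theta; then K = J J^T.
    def flat_jac(x):
        grads = jax.jacobian(lambda p: mlp(p, x))(params)
        return jnp.concatenate([g.ravel() for g in jax.tree_util.tree_leaves(grads)])
    J = jax.vmap(flat_jac)(xs)
    return J @ J.T

def spectrum_observables(K):
    # Trace of K and Shannon entropy of its normalized eigenvalue spectrum.
    lam = jnp.linalg.eigvalsh(K)
    lam = jnp.clip(lam, 1e-12)  # guard against numerically tiny negative eigenvalues
    p = lam / lam.sum()
    return lam.sum(), -(p * jnp.log(p)).sum()

# Random parameters and inputs stand in for a network snapshot during training.
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
params = (0.5 * jax.random.normal(k1, (4, 16)), jnp.zeros(16),
          0.25 * jax.random.normal(k2, (16, 1)), jnp.zeros(1))
xs = jax.random.normal(k3, (32, 4))
trace, entropy = spectrum_observables(empirical_ntk(params, xs))
print(f"trace={trace:.3f}  entropy={entropy:.3f}")
```

Tracking these two numbers over training steps is what separates the two regimes described in the abstract: a falling spectral entropy signals information compression, while a rising one signals structure formation.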
Related papers
- Formation of Representations in Neural Networks [8.79431718760617]
How complex, structured, and transferable representations emerge in modern neural networks has remained a mystery.
We propose the Canonical Representation Hypothesis (CRH), which posits a set of six alignment relations to universally govern the formation of representations.
We show that the breaking of CRH leads to the emergence of reciprocal power-law relations between the representations (R), weights (W), and gradients (G), which we refer to as the Polynomial Alignment Hypothesis (PAH).
arXiv Detail & Related papers (2024-10-03T21:31:01Z)
- Dynamic neurons: A statistical physics approach for analyzing deep neural networks [1.9662978733004601]
We treat neurons as additional degrees of freedom in interactions, simplifying the structure of deep neural networks.
By utilizing translational symmetry and renormalization group transformations, we can analyze critical phenomena.
This approach may open new avenues for studying deep neural networks using statistical physics.
arXiv Detail & Related papers (2024-10-01T04:39:04Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- The semantic landscape paradigm for neural networks [0.0]
We introduce the semantic landscape paradigm, a conceptual and mathematical framework that describes the training dynamics of neural networks.
Specifically, we show that grokking and emergence with scale are associated with percolation phenomena, and neural scaling laws are explainable in terms of the statistics of random walks on graphs.
arXiv Detail & Related papers (2023-07-18T18:48:54Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Explainable artificial intelligence for mechanics: physics-informing neural networks for constitutive models [0.0]
In mechanics, the new and active field of physics-informed neural networks attempts to mitigate the black-box character of deep learning by designing deep neural networks on the basis of mechanical knowledge.
We propose a first step towards a physics-informing approach, which explains neural networks trained on mechanical data a posteriori.
Therein, principal component analysis decorrelates the distributed representations in the cell states of RNNs and allows comparison to known, fundamental functions.
arXiv Detail & Related papers (2021-04-20T18:38:52Z)
- Graph Structure of Neural Networks [104.33754950606298]
We show how the graph structure of neural networks affects their predictive performance.
A "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance.
Top-performing neural networks have graph structure surprisingly similar to those of real biological neural networks.
arXiv Detail & Related papers (2020-07-13T17:59:31Z)