Dataset-learning duality and emergent criticality
- URL: http://arxiv.org/abs/2405.17391v2
- Date: Fri, 16 Aug 2024 15:29:52 GMT
- Title: Dataset-learning duality and emergent criticality
- Authors: Ekaterina Kukleva, Vitaly Vanchurin
- Abstract summary: We show a duality map between a subspace of non-trainable variables and a subspace of trainable variables.
We use the duality to study the emergence of criticality, or the power-law distributions of fluctuations of the trainable variables.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In artificial neural networks, the activation dynamics of non-trainable variables is strongly coupled to the learning dynamics of trainable variables. During the activation pass, the boundary neurons (e.g., input neurons) are mapped to the bulk neurons (e.g., hidden neurons), and during the learning pass, both bulk and boundary neurons are mapped to changes in trainable variables (e.g., weights and biases). For example, in feed-forward neural networks, forward propagation is the activation pass and backward propagation is the learning pass. We show that a composition of the two maps establishes a duality map between a subspace of non-trainable boundary variables (e.g., dataset) and a tangent subspace of trainable variables (i.e., learning). In general, the dataset-learning duality is a complex non-linear map between high-dimensional spaces, but in a learning equilibrium, the problem can be linearized and reduced to many weakly coupled one-dimensional problems. We use the duality to study the emergence of criticality, or the power-law distributions of fluctuations of the trainable variables. In particular, we show that criticality can emerge in the learning system even from the dataset in a non-critical state, and that the power-law distribution can be modified by changing either the activation function or the loss function.
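To make the two maps concrete, here is a minimal numerical sketch (a toy network of our own, not the paper's construction): the activation pass sends an input to hidden activations, the learning pass sends activations to a gradient weight update, and finite differences of the composition approximate the linearized dataset-learning duality map.

```python
# Toy illustration of the dataset-learning duality: compose the activation
# pass (input -> hidden) with the learning pass (neurons -> weight update)
# and linearize the composition around a reference input.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h = 3, 4
W1 = rng.normal(size=(n_h, n_in))
W2 = rng.normal(size=(1, n_h))
y_target, lr = 1.0, 0.1

def weight_update(x):
    """Learning pass for one example: the flattened update to W1."""
    h = np.tanh(W1 @ x)                          # activation pass: boundary -> bulk
    err = (W2 @ h - y_target).item()             # d(MSE loss)/d(output)
    grad_h = (W2.ravel() * err) * (1 - h**2)     # backprop through tanh
    return (-lr * np.outer(grad_h, x)).ravel()   # learning pass: neurons -> weights

# Column j of D approximates d(weight update)/d(x_j): a finite-difference
# linearization of the duality map from dataset space to learning space.
x0, eps = rng.normal(size=n_in), 1e-5
D = np.stack([(weight_update(x0 + eps * e) - weight_update(x0)) / eps
              for e in np.eye(n_in)], axis=1)
print("duality map D:", D.shape)   # (n_h * n_in, n_in)
```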
Related papers
- Don't Cut Corners: Exact Conditions for Modularity in Biologically Inspired Representations [52.48094670415497]
We develop a theory of when biologically inspired representations modularise with respect to source variables (sources).
We derive necessary and sufficient conditions on a sample of sources that determine whether the neurons in an optimal biologically-inspired linear autoencoder modularise.
Our theory applies to any dataset, extending far beyond the case of statistical independence studied in previous work.
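As a loose empirical probe of the claim above (our assumptions: tied nonnegative weights stand in for the biologically inspired constraint, and a simple weight-concentration score stands in for modularity; neither is taken from the paper), one can train a toy linear autoencoder on samples of sources and score each neuron:

```python
# Toy probe: nonnegative tied-weight linear autoencoder on independent
# nonnegative sources; score each neuron by how much of its encoder weight
# mass falls on a single source dimension (1.0 = perfectly modular).
import numpy as np

rng = np.random.default_rng(1)
n_src, n_samp, lr = 3, 5000, 1e-3
S = rng.exponential(size=(n_samp, n_src))          # nonnegative sources
W = 0.1 * np.abs(rng.normal(size=(n_src, n_src)))  # encoder; decoder is W.T

for x in S:
    h = W @ x                                      # linear encoding
    r = W.T @ h - x                                # reconstruction residual
    g = np.outer(h, r) + np.outer(W @ r, x)        # dL/dW for 0.5*||r||^2
    W = np.maximum(W - lr * g, 0.0)                # SGD step + nonnegativity

mod = np.max(np.abs(W), axis=1) / (np.abs(W).sum(axis=1) + 1e-12)
print("per-neuron modularity scores:", np.round(mod, 2))
```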
arXiv Detail & Related papers (2024-10-08T17:41:37Z)
- Dynamical transition in controllable quantum neural networks with large depth [7.22617261255808]
We show that the training dynamics of quantum neural networks with a quadratic loss function can be described by the generalized Lotka-Volterra equations.
We show that a quadratic loss function within the frozen-error dynamics enables a speedup in the training convergence.
The theory findings are verified experimentally on IBM quantum devices.
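For reference, the generalized Lotka-Volterra system has the standard form below; how the paper's quantum trainable variables and couplings enter it is not reproduced here.

```latex
% Generalized Lotka-Volterra equations (textbook form): growth rates r_i
% and an interaction matrix A_{ij} parameterize the coupled dynamics.
\dot{x}_i = x_i \Big( r_i + \sum_{j=1}^{N} A_{ij}\, x_j \Big),
\qquad i = 1, \dots, N .
```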
arXiv Detail & Related papers (2023-11-29T23:14:33Z)
- A Step Towards Uncovering The Structure of Multistable Neural Networks [1.14219428942199]
We study the structure of multistable recurrent neural networks.
The activation function is simplified to a nonsmooth Heaviside step function.
We derive how multistability is encoded within the network architecture.
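An illustrative sketch of the Heaviside setting (a classic mutual-inhibition motif of our own choosing, not the paper's derivation): steady states of x_{t+1} = H(W x_t + b) are binary vectors, so for small networks multistability can be probed by brute-force self-consistency checks.

```python
# With Heaviside activations, fixed points of x_{t+1} = H(W x_t + b) are
# binary vectors; enumerate all of them and keep the self-consistent ones.
import itertools
import numpy as np

W = np.array([[0.0, -2.0],
              [-2.0, 0.0]])          # mutual inhibition: a bistable motif
b = np.array([1.0, 1.0])
H = lambda z: (z > 0).astype(float)  # nonsmooth Heaviside step

fixed_points = [x for x in itertools.product([0.0, 1.0], repeat=2)
                if np.array_equal(H(W @ np.array(x) + b), np.array(x))]
print("fixed points:", fixed_points)  # (1,0) and (0,1): two stable states
```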
arXiv Detail & Related papers (2022-10-06T22:54:17Z)
- Benchmarking Compositionality with Formal Languages [64.09083307778951]
We investigate whether large neural models in NLP can acquire the ability to combine primitive concepts into larger novel combinations while learning from data.
By randomly sampling over many transducers, we explore which of their properties contribute to learnability of a compositional relation by a neural network.
We find that the models either learn the relations completely or not at all. The key factor is transition coverage, which sets a soft learnability limit of about 400 examples per transition.
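A hypothetical helper (the trace format and names are ours, not the paper's) showing how per-transition coverage of a training set could be tallied and compared against such a threshold:

```python
# Count how often each transducer transition is exercised by a training set,
# to compare against a coverage threshold such as ~400 examples/transition.
from collections import Counter

def transition_coverage(traces):
    """traces: iterable of [(state, symbol), ...] paths through a transducer."""
    return Counter(step for trace in traces for step in trace)

traces = [[("q0", "a"), ("q1", "b")], [("q0", "a"), ("q1", "c")]]
cov = transition_coverage(traces)
print(min(cov.values()), "examples on the least-covered transition")
```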
arXiv Detail & Related papers (2022-08-17T10:03:18Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
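A toy contrast between the two kinds of data models (our own construction, kept only at the level of "Gaussian vs. sparse localized structure"; the paper's exact data models are not reproduced here):

```python
# Two toy input ensembles: plain Gaussian inputs vs. inputs built from sparse,
# localized bumps with heavy-tailed amplitudes. Only the second carries the
# non-Gaussian, higher-order local structure; excess kurtosis exposes this.
import numpy as np

rng = np.random.default_rng(2)
d, n = 32, 1000

gaussian = rng.normal(size=(n, d))                  # control ensemble

bump = np.exp(-0.5 * (np.arange(-3, 4) / 1.0)**2)   # localized feature
sparse = np.zeros((n, d))
for row in sparse:
    c = rng.integers(3, d - 3)                      # random bump location
    row[c - 3:c + 4] += rng.laplace() * bump        # heavy-tailed amplitude

kurt = lambda X: np.mean(X**4) / np.mean(X**2)**2 - 3
print("excess kurtosis:", kurt(gaussian), kurt(sparse))  # ~0 vs. clearly > 0
```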
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
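The phenomenon itself is easy to reproduce in a standard toy setup (random ReLU features with minimum-norm least squares; this is not the paper's experiment): test error peaks as the feature count approaches the sample size and falls again beyond it.

```python
# Minimum-norm least squares on random ReLU features: a standard toy setup
# in which test error traces the characteristic "double descent" shape.
import numpy as np

rng = np.random.default_rng(3)
d, n_train, n_test = 10, 40, 500
w_star = rng.normal(size=d)
X, Xt = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y = X @ w_star + 0.3 * rng.normal(size=n_train)
yt = Xt @ w_star

for p in [5, 20, 40, 80, 320]:                  # number of random features
    V = rng.normal(size=(d, p)) / np.sqrt(d)
    F, Ft = np.maximum(X @ V, 0), np.maximum(Xt @ V, 0)
    beta = np.linalg.pinv(F) @ y                # minimum-norm interpolator
    print(f"p={p:4d}  test MSE={np.mean((Ft @ beta - yt)**2):.3f}")
```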
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- Towards a theory of quantum gravity from neural networks [0.0]
We show that the non-equilibrium dynamics of trainable variables can be described by the Madelung equations.
We argue that the Lorentz symmetries and curved space-time can emerge from the interplay between entropy production and entropy destruction due to learning.
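For reference, the Madelung equations in their standard hydrodynamic form (how the paper maps trainable variables onto the density and phase is not reproduced here):

```latex
% Madelung equations: a continuity equation plus a quantum Hamilton-Jacobi
% equation, with density rho, phase S, velocity v = (grad S)/m, and the
% quantum-potential term proportional to hbar^2.
\partial_t \rho + \nabla \cdot (\rho\, v) = 0, \qquad v = \frac{\nabla S}{m},
\qquad
\partial_t S + \frac{(\nabla S)^2}{2m} + V
  - \frac{\hbar^2}{2m}\,\frac{\nabla^2 \sqrt{\rho}}{\sqrt{\rho}} = 0 .
```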
arXiv Detail & Related papers (2021-10-28T12:39:01Z)
- Self-organized criticality in neural networks [0.0]
We show that learning dynamics of neural networks is generically attracted towards a self-organized critical state.
Our results support the claim that the universe might be a neural network.
arXiv Detail & Related papers (2021-07-07T18:00:03Z)
- Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting [135.0863818867184]
Artificial neural variability (ANV) helps artificial neural networks learn some advantages from "natural" neural networks.
ANV acts as an implicit regularizer of the mutual information between the training data and the learned model.
It can effectively relieve overfitting, label noise memorization, and catastrophic forgetting at negligible costs.
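One simple way to realize neural variability (an illustrative assumption, not necessarily the paper's exact ANV mechanism) is to inject zero-mean noise into hidden activations during training only:

```python
# Illustrative sketch: zero-mean Gaussian noise on hidden activations during
# training, limiting how precisely the weights can encode any one example.
import numpy as np

rng = np.random.default_rng(4)

def noisy_forward(x, W1, W2, sigma=0.1, train=True):
    h = np.tanh(W1 @ x)
    if train:                       # variability only while learning
        h = h + sigma * rng.normal(size=h.shape)
    return W2 @ h

W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(1, 8))
print(noisy_forward(rng.normal(size=4), W1, W2))
```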
arXiv Detail & Related papers (2020-11-12T06:06:33Z)
- Unitary Learning for Deep Diffractive Neural Network [0.0]
We present a unitary learning protocol for deep diffractive neural networks.
The temporal-space evolution characteristics of unitary learning are formulated and elucidated.
As a preliminary application, a deep diffractive neural network with unitary learning is tentatively implemented on 2D classification and verification tasks.
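A sketch of one common way to keep a layer exactly unitary (an assumed parameterization, not confirmed to be the paper's): exponentiate a skew-Hermitian generator, so gradient steps on the generator never leave the unitary manifold.

```python
# U = expm(A - A^H) is exactly unitary for any complex matrix A, so training
# the generator A keeps the layer lossless, as in diffractive propagation.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U = expm(A - A.conj().T)            # skew-Hermitian generator -> unitary map

x = rng.normal(size=n) + 1j * rng.normal(size=n)
y = U @ x                           # lossless "diffractive" propagation
print(np.allclose(U.conj().T @ U, np.eye(n)))   # True: energy preserved
```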
arXiv Detail & Related papers (2020-08-17T07:16:09Z)
- Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory [110.99247009159726]
Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks.
In particular, temporal-difference learning converges when the function approximator is linear in a feature representation, which is fixed throughout learning, and possibly diverges otherwise.
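A minimal sketch of that convergent setting (a toy two-state Markov chain of our own; the feature map stays fixed throughout learning):

```python
# Semi-gradient TD(0) with a linear function approximator over fixed features,
# the setting in which convergence is guaranteed; here a 2-state Markov chain.
import numpy as np

rng = np.random.default_rng(6)
phi = np.eye(2)                            # fixed feature map, one row per state
P = np.array([[0.9, 0.1], [0.2, 0.8]])     # transition probabilities
r = np.array([0.0, 1.0])                   # reward on leaving each state
gamma, alpha, w, s = 0.95, 0.05, np.zeros(2), 0

for _ in range(20000):
    s_next = rng.choice(2, p=P[s])
    td_err = r[s] + gamma * phi[s_next] @ w - phi[s] @ w
    w += alpha * td_err * phi[s]           # semi-gradient TD(0) update
    s = s_next
print("learned values:", phi @ w)          # approximates the true V(s)
```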
arXiv Detail & Related papers (2020-06-08T17:25:22Z)