On the dynamics of three-layer neural networks: initial condensation
- URL: http://arxiv.org/abs/2402.15958v2
- Date: Tue, 27 Feb 2024 05:54:10 GMT
- Title: On the dynamics of three-layer neural networks: initial condensation
- Authors: Zheng-An Chen, Tao Luo
- Abstract summary: Condensation occurs when gradient descent methods spontaneously reduce the complexity of neural networks during training.
We establish the blow-up property of effective dynamics and present a sufficient condition for the occurrence of condensation.
We also explore the association between condensation and the low-rank bias observed in deep matrix factorization.
- Score: 2.022855152231054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Empirical and theoretical works show that the input weights of two-layer
neural networks, when initialized with small values, converge towards isolated
orientations. This phenomenon, referred to as condensation, indicates that the
gradient descent methods tend to spontaneously reduce the complexity of neural
networks during the training process. In this work, we elucidate the mechanism
behind the condensation phenomenon that occurs in the training of three-layer
neural networks and distinguish it from that of two-layer neural
networks. Through rigorous theoretical analysis, we establish the blow-up
property of effective dynamics and present a sufficient condition for the
occurrence of condensation, findings that are substantiated by experimental
results. Additionally, we explore the association between condensation and the
low-rank bias observed in deep matrix factorization.
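The condensation effect described above can be reproduced in small experiments. Below is a minimal, self-contained sketch (not the authors' code) that trains a three-layer tanh network from a small initialization and tracks the average pairwise |cosine similarity| between rows of the input weight matrix; values approaching 1 indicate that the input weights have aligned onto a few directions. The architecture, synthetic data, initialization scale, and learning rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d, m, n = 5, 50, 200                         # input dim, hidden width, sample count (assumed)
X = torch.randn(n, d)
y = torch.sin(X.sum(dim=1, keepdim=True))    # simple synthetic regression target

# Three-layer tanh network (two hidden layers), shrunk to a small initialization.
model = nn.Sequential(
    nn.Linear(d, m), nn.Tanh(),
    nn.Linear(m, m), nn.Tanh(),
    nn.Linear(m, 1),
)
with torch.no_grad():
    for p in model.parameters():
        p.mul_(0.01)                         # small-initialization scale (assumed value)

opt = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = nn.MSELoss()

def mean_abs_cosine(W):
    # Average |cosine similarity| between distinct rows of the input weight matrix;
    # values near 1 suggest the input weights have condensed onto a few directions.
    Wn = W / W.norm(dim=1, keepdim=True).clamp_min(1e-12)
    C = (Wn @ Wn.t()).abs()
    off_diag = C.sum() - C.diagonal().sum()
    return (off_diag / (W.shape[0] * (W.shape[0] - 1))).item()

for step in range(2001):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    if step % 500 == 0:
        W1 = model[0].weight.detach()
        print(f"step {step:4d}  loss {loss.item():.3e}  mean |cos| {mean_abs_cosine(W1):.3f}")
```

Rerunning the same script with a larger initialization scale (for example, 1.0 instead of 0.01) typically shows much weaker alignment, which is the qualitative contrast between the condensed and non-condensed regimes studied in this line of work.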
Related papers
- Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z) - Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations [2.310288676109785]
This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks.
The weights of the neural network remain small in norm and approximately converge in direction towards the Karush-Kuhn-Tucker (KKT) points.
arXiv Detail & Related papers (2024-03-12T23:17:32Z) - Understanding the Initial Condensation of Convolutional Neural Networks [6.451914896767135]
The kernels of two-layer convolutional neural networks converge to one or a few directions during training.
This work represents a step towards a better understanding of the non-linear training behavior exhibited by neural networks with specialized structures.
arXiv Detail & Related papers (2023-05-17T05:00:47Z) - Phase Diagram of Initial Condensation for Two-layer Neural Networks [4.404198015660192]
We present a phase diagram of initial condensation for two-layer neural networks.
Our phase diagram serves to provide a comprehensive understanding of the dynamical regimes of neural networks.
arXiv Detail & Related papers (2023-03-12T03:55:38Z) - Stochastic Gradient Descent-Induced Drift of Representation in a Two-Layer Neural Network [0.0]
Despite being observed in the brain and in artificial networks, the mechanisms of drift and its implications are not fully understood.
Motivated by recent experimental findings of stimulus-dependent drift in the piriform cortex, we use theory and simulations to study this phenomenon in a two-layer linear feedforward network.
arXiv Detail & Related papers (2023-02-06T04:56:05Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear (a minimal rank-probing sketch follows this list).
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - A duality connecting neural network and cosmological dynamics [0.0]
We show that the dynamics of neural networks trained with gradient descent and the dynamics of scalar fields in a flat, vacuum energy dominated Universe are structurally related.
This duality provides the framework for synergies between these systems, to understand and explain neural network dynamics.
arXiv Detail & Related papers (2022-02-22T19:00:01Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle that we call Feature Purification: one of the causes of adversarial examples is the accumulation of certain small, dense mixtures in the hidden weights during the training process of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)
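As noted in the "Rank Diminishing in Deep Neural Networks" entry above, and in connection with the low-rank bias discussed in the main abstract, the following minimal sketch (not drawn from any of the listed papers) shows one way to probe the numerical rank of hidden representations layer by layer. The network, layer sizes, and rank tolerance are assumed values; a randomly initialized network is used only to demonstrate the measurement, and a trained network would be needed to see whether rank actually decreases with depth.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def numerical_rank(H, tol=1e-3):
    # Count singular values above tol * (largest singular value).
    s = torch.linalg.svdvals(H)
    return int((s > tol * s[0]).sum())

d, width, depth, n = 32, 64, 6, 512          # all sizes are illustrative assumptions
X = torch.randn(n, d)

layers, in_dim = [], d
for _ in range(depth):
    layers += [nn.Linear(in_dim, width), nn.ReLU()]
    in_dim = width
net = nn.Sequential(*layers)

with torch.no_grad():
    H = X
    print(f"input: numerical rank = {numerical_rank(H)}")
    block = 0
    for layer in net:
        H = layer(H)
        if isinstance(layer, nn.ReLU):
            block += 1
            print(f"after block {block}: numerical rank = {numerical_rank(H)}")
```

Counting singular values above a relative tolerance is a common practical proxy for the effective rank of a feature matrix; the same probe can be applied before and after training to look for the low-rank structure these papers discuss.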