Phase Diagram of Initial Condensation for Two-layer Neural Networks
- URL: http://arxiv.org/abs/2303.06561v2
- Date: Sat, 8 Apr 2023 00:12:27 GMT
- Title: Phase Diagram of Initial Condensation for Two-layer Neural Networks
- Authors: Zhengan Chen, Yuqing Li, Tao Luo, Zhangchen Zhou, Zhi-Qin John Xu
- Abstract summary: We present a phase diagram of initial condensation for two-layer neural networks.
Our phase diagram serves to provide a comprehensive understanding of the dynamical regimes of neural networks.
- Score: 4.404198015660192
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The phenomenon of distinct behaviors exhibited by neural networks under
varying scales of initialization remains an enigma in deep learning research.
In this paper, based on the earlier work by Luo et al.~\cite{luo2021phase}, we
present a phase diagram of initial condensation for two-layer neural networks.
Condensation is a phenomenon wherein the weight vectors of neural networks
concentrate on isolated orientations during the training process, and it is a
feature of the non-linear learning process that enables neural networks to possess
better generalization abilities. Our phase diagram serves to provide a
comprehensive understanding of the dynamical regimes of neural networks and
their dependence on the choice of hyperparameters related to initialization.
Furthermore, we demonstrate in detail the underlying mechanisms by which small
initialization leads to condensation at the initial training stage.
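The condensation phenomenon described in the abstract can be illustrated with a small numerical sketch (a toy construction, not the paper's actual experiments; the network, data, and hyperparameters below are all assumptions): train a two-layer tanh network from a small initialization with plain gradient descent and track how the hidden-layer weight vectors align in direction, measured by the mean absolute cosine similarity over distinct pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n points in d dimensions; the target depends only on x_1,
# so condensed weight vectors are expected to cluster around one direction.
n, d, m = 64, 10, 50
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0:1])

# Small initialization scale (the regime where condensation is expected)
eps = 1e-3
W = eps * rng.normal(size=(m, d))   # hidden-layer weight vectors
a = eps * rng.normal(size=(m, 1))   # output weights

def mean_abs_cosine(W):
    """Mean |cosine similarity| over distinct pairs of weight vectors."""
    U = W / np.linalg.norm(W, axis=1, keepdims=True)
    C = np.abs(U @ U.T)
    iu = np.triu_indices(len(W), k=1)
    return C[iu].mean()

before = mean_abs_cosine(W)

# Plain gradient descent on the mean-squared error
lr = 0.1
for _ in range(500):
    h = np.tanh(X @ W.T)                      # (n, m) hidden activations
    err = h @ a - y                           # (n, 1) residual
    grad_a = h.T @ err / n
    grad_h = (err @ a.T) * (1.0 - h**2)       # backprop through tanh
    grad_W = grad_h.T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

after = mean_abs_cosine(W)
print(f"mean |cos| before: {before:.3f}  after: {after:.3f}")
```

At this small scale the initially near-orthogonal weight vectors should end up concentrated on a few shared orientations, so the mean absolute cosine rises sharply over training.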
Related papers
- Opening the Black Box: predicting the trainability of deep neural networks with reconstruction entropy [0.0]
We present a method for predicting the trainable regime in parameter space for deep feedforward neural networks.
For both MNIST and CIFAR10, we show that a single epoch of training is sufficient to predict the trainability of the deep feedforward network.
arXiv Detail & Related papers (2024-06-13T18:00:05Z)
- Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations [2.310288676109785]
This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks.
The weights of the neural network remain small in norm and approximately converge in direction toward the Karush-Kuhn-Tucker (KKT) points.
arXiv Detail & Related papers (2024-03-12T23:17:32Z)
- On the dynamics of three-layer neural networks: initial condensation [2.022855152231054]
Condensation occurs when gradient methods spontaneously reduce the complexity of neural networks.
We establish the blow-up property of effective dynamics and present a sufficient condition for the occurrence of condensation.
We also explore the association between condensation and the low-rank bias observed in deep matrix factorization.
arXiv Detail & Related papers (2024-02-25T02:36:14Z)
- Understanding the Initial Condensation of Convolutional Neural Networks [6.451914896767135]
Kernels of two-layer convolutional neural networks converge to one or a few directions during training.
This work represents a step towards a better understanding of the non-linear training behavior exhibited by neural networks with specialized structures.
arXiv Detail & Related papers (2023-05-17T05:00:47Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- Phase diagram for two-layer ReLU neural networks at infinite-width limit [6.380166265263755]
We draw the phase diagram for the two-layer ReLU neural network at the infinite-width limit.
We identify three regimes in the phase diagram, i.e., linear regime, critical regime and condensed regime.
In the linear regime, the training dynamics of the NN are approximately linear, resembling a random feature model with exponential loss decay.
In the condensed regime, we demonstrate through experiments that active neurons are condensed at several discrete orientations.
arXiv Detail & Related papers (2020-07-15T06:04:35Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based optimization combined with non-convexity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
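The linear (lazy) versus condensed distinction drawn in the phase-diagram papers above can also be seen by comparing how far the weights travel relative to their initialization (again a toy sketch under assumed scalings with a tanh network, not the papers' ReLU setup): under an NTK-style large-width scaling the weights barely move, while a small initialization forces a large relative displacement.

```python
import numpy as np

def relative_weight_change(init_scale, out_scale, steps=200, lr=0.05,
                           n=64, d=10, m=512, seed=0):
    """Train f(x) = a^T tanh(Wx) / out_scale by gradient descent on MSE
    and return ||W_T - W_0||_F / ||W_0||_F."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    y = np.sin(X[:, 0:1])
    W = init_scale * rng.normal(size=(m, d))
    a = init_scale * rng.normal(size=(m, 1))
    W0 = W.copy()
    for _ in range(steps):
        h = np.tanh(X @ W.T)                        # (n, m) activations
        err = h @ a / out_scale - y                 # (n, 1) residual
        grad_a = h.T @ err / (n * out_scale)
        grad_h = (err @ a.T / out_scale) * (1.0 - h**2)
        W -= lr * (grad_h.T @ X / n)
        a -= lr * grad_a
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)

m = 512
lazy = relative_weight_change(init_scale=1.0, out_scale=np.sqrt(m))  # NTK-like
rich = relative_weight_change(init_scale=1e-3, out_scale=1.0)        # small init
print(f"relative weight change  lazy: {lazy:.3f}  small-init: {rich:.1f}")
```

In the lazy configuration the relative displacement stays small (the network behaves like a fixed random-feature model), whereas the small-initialization run must grow its weights by orders of magnitude to fit the data, which is exactly the regime where condensation is observed.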
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.