The Quenching-Activation Behavior of the Gradient Descent Dynamics for
Two-layer Neural Network Models
- URL: http://arxiv.org/abs/2006.14450v1
- Date: Thu, 25 Jun 2020 14:41:53 GMT
- Title: The Quenching-Activation Behavior of the Gradient Descent Dynamics for
Two-layer Neural Network Models
- Authors: Chao Ma, Lei Wu, Weinan E
- Abstract summary: The gradient descent (GD) algorithm for training two-layer neural network models is studied.
Two distinctive phases in the dynamic behavior of GD in the under-parametrized regime are identified.
The quenching-activation process seems to provide a clear mechanism for "implicit regularization".
- Score: 12.865834066050427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A numerical and phenomenological study of the gradient descent (GD) algorithm
for training two-layer neural network models is carried out for different
parameter regimes when the target function can be accurately approximated by a
relatively small number of neurons. It is found that for Xavier-like
initialization, there are two distinctive phases in the dynamic behavior of GD
in the under-parametrized regime: An early phase in which the GD dynamics
follows closely that of the corresponding random feature model and the neurons
are effectively quenched, followed by a late phase in which the neurons are
divided into two groups: a group of a few "activated" neurons that dominate the
dynamics and a group of background (or "quenched") neurons that support the
continued activation and deactivation process. This neural network-like
behavior is continued into the mildly over-parametrized regime, where it
undergoes a transition to a random feature-like behavior. The
quenching-activation process seems to provide a clear mechanism for "implicit
regularization". This is qualitatively different from the dynamics associated
with the "mean-field" scaling, where all neurons participate equally and there
do not appear to be qualitative changes when the network parameters are
changed.
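The abstract compares full GD on a two-layer network with the corresponding random feature model, in which only the outer weights are trained. It also describes a late phase where a few "activated" neurons dominate. The plain-Python sketch below is one way to look for this on a toy problem. It is not the authors' code, and every choice in it is an illustrative assumption:

- the target is a single ReLU neuron on the unit sphere, so a few neurons suffice;
- the sizes are m = 10, d = 4, n = 100, so m(d+1) = 50 < n, an under-parametrized setting;
- the initialization is Xavier-like, with a_k ~ N(0, 1/m) and w_k ~ N(0, I/d);
- the step size and the helpers `train` and `top_share` are hypothetical.

The diagnostic is each neuron's displacement |θ_k(t) − θ_k(0)|, with θ_k = (a_k, w_k), together with the share of total displacement carried by the most-moved neurons.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def make_data(n, d, rng):
    """Inputs on the unit sphere; the (assumed) target is one ReLU neuron, relu(x[0])."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
        data.append((x, relu(x[0])))
    return data


def init_params(m, d, rng):
    """Assumed Xavier-like scale: a_k ~ N(0, 1/m), w_k ~ N(0, I/d)."""
    a = [rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(m)]
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(m)]
    return a, w


def predict(a, w, x):
    """Two-layer network f(x) = sum_k a_k * relu(w_k . x)."""
    return sum(ak * relu(sum(wi * xi for wi, xi in zip(wk, x))) for ak, wk in zip(a, w))


def loss(a, w, data):
    """Mean squared error with the conventional 1/2 factor."""
    return sum((predict(a, w, x) - y) ** 2 for x, y in data) / (2 * len(data))


def gd_step(a, w, data, lr, train_inner):
    """One full-batch GD step; train_inner=False freezes w (random feature model)."""
    n, m, d = len(data), len(a), len(w[0])
    grad_a = [0.0] * m
    grad_w = [[0.0] * d for _ in range(m)]
    for x, y in data:
        pre = [sum(wi * xi for wi, xi in zip(wk, x)) for wk in w]
        r = sum(ak * relu(z) for ak, z in zip(a, pre)) - y  # residual f(x) - y
        for k in range(m):
            grad_a[k] += r * relu(pre[k]) / n
            if train_inner and pre[k] > 0.0:  # ReLU derivative is 1 only when active
                for i in range(d):
                    grad_w[k][i] += r * a[k] * x[i] / n
    new_a = [ak - lr * g for ak, g in zip(a, grad_a)]
    new_w = [[wi - lr * g for wi, g in zip(wk, gk)] for wk, gk in zip(w, grad_w)]
    return new_a, new_w


def train(m=10, d=4, n=100, steps=200, lr=0.5, train_inner=True, seed=0):
    """Returns the loss history and each neuron's displacement from its initial value."""
    rng = random.Random(seed)
    data = make_data(n, d, rng)
    a0, w0 = init_params(m, d, rng)
    a, w = a0, w0
    history = [loss(a, w, data)]
    for _ in range(steps):
        a, w = gd_step(a, w, data, lr, train_inner)
        history.append(loss(a, w, data))
    # |theta_k(t) - theta_k(0)| with theta_k = (a_k, w_k)
    moves = [
        math.sqrt((ak - a0k) ** 2 + sum((wi - w0i) ** 2 for wi, w0i in zip(wk, w0k)))
        for ak, a0k, wk, w0k in zip(a, a0, w, w0)
    ]
    return history, moves


def top_share(moves, top=2):
    """Fraction of the total displacement carried by the `top` most-moved neurons."""
    total = sum(moves)
    return sum(sorted(moves, reverse=True)[:top]) / total if total > 0 else 0.0
```

With the same seed, `train(train_inner=True)` and `train(train_inner=False)` see identical data and initial weights. Their loss curves and `top_share` values can therefore be compared directly. A larger share for full GD would be consistent with a few neurons being activated, but this toy setup makes no claim about what the paper observed.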
Related papers
- Reconstruction of neuromorphic dynamics from a single scalar time series using variational autoencoder and neural network map [0.0]
A model of a physiological neuron based on the Hodgkin-Huxley formalism is considered.
A single time series of one of its variables is shown to be enough to train a neural network that can operate as a discrete-time dynamical system.
(arXiv, 2024-11-11T15:15:55Z)
- Confidence Regulation Neurons in Language Models [91.90337752432075]
This study investigates the mechanisms by which large language models represent and regulate uncertainty in next-token predictions.
Entropy neurons are characterized by an unusually high weight norm and influence the final layer normalization (LayerNorm) scale to effectively scale down the logits.
Token frequency neurons, which we describe here for the first time, boost or suppress each token's logit proportionally to its log frequency, thereby shifting the output distribution towards or away from the unigram distribution.
(arXiv, 2024-06-24T01:31:03Z)
- STNDT: Modeling Neural Population Activity with a Spatiotemporal Transformer [19.329190789275565]
We introduce SpatioTemporal Neural Data Transformer (STNDT), an NDT-based architecture that explicitly models responses of individual neurons.
We show that our model achieves state-of-the-art performance at the ensemble level in estimating neural activities across four neural datasets.
(arXiv, 2022-06-09T18:54:23Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
(arXiv, 2021-09-20T15:12:16Z)
- Continuous Learning and Adaptation with Membrane Potential and Activation Threshold Homeostasis [91.3755431537592]
This paper presents the Membrane Potential and Activation Threshold Homeostasis (MPATH) neuron model.
The model allows neurons to maintain a form of dynamic equilibrium by automatically regulating their activity when presented with input.
Experiments demonstrate the model's ability to adapt to and continually learn from its input.
(arXiv, 2021-04-22T04:01:32Z)
- Going beyond p-convolutions to learn grayscale morphological operators [64.38361575778237]
We present two new morphological layers based on the same principle as the p-convolutional layer.
(arXiv, 2021-02-19T17:22:16Z)
- And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
(arXiv, 2021-02-15T08:19:05Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
(arXiv, 2020-11-18T18:52:08Z)
- Phase diagram for two-layer ReLU neural networks at infinite-width limit [6.380166265263755]
We draw the phase diagram for the two-layer ReLU neural network at the infinite-width limit.
We identify three regimes in the phase diagram, i.e., linear regime, critical regime and condensed regime.
In the linear regime, the NN training dynamics are approximately linear, similar to a random feature model, with an exponential loss decay.
In the condensed regime, we demonstrate through experiments that active neurons are condensed at several discrete orientations.
(arXiv, 2020-07-15T06:04:35Z)
- Unifying and generalizing models of neural dynamics during decision-making [27.46508483610472]
We propose a unifying framework for modeling neural activity during decision-making tasks.
The framework includes the canonical drift-diffusion model and enables extensions such as multi-dimensional accumulators, variable and collapsing boundaries, and discrete jumps.
(arXiv, 2020-01-13T23:57:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.