The Quenching-Activation Behavior of the Gradient Descent Dynamics for
Two-layer Neural Network Models
- URL: http://arxiv.org/abs/2006.14450v1
- Date: Thu, 25 Jun 2020 14:41:53 GMT
- Title: The Quenching-Activation Behavior of the Gradient Descent Dynamics for
Two-layer Neural Network Models
- Authors: Chao Ma, Lei Wu, Weinan E
- Abstract summary: The gradient descent (GD) algorithm for training two-layer neural network models is studied.
Two distinctive phases are identified in the dynamic behavior of GD in the under-parametrized regime.
The quenching-activation process seems to provide a clear mechanism for "implicit regularization".
- Score: 12.865834066050427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A numerical and phenomenological study of the gradient descent (GD) algorithm
for training two-layer neural network models is carried out for different
parameter regimes when the target function can be accurately approximated by a
relatively small number of neurons. It is found that for Xavier-like
initialization, there are two distinctive phases in the dynamic behavior of GD
in the under-parametrized regime: An early phase in which the GD dynamics
follows closely that of the corresponding random feature model and the neurons
are effectively quenched, followed by a late phase in which the neurons are
divided into two groups: a group of a few "activated" neurons that dominate the
dynamics and a group of background (or "quenched") neurons that support the
continued activation and deactivation process. This neural network-like
behavior continues into the mildly over-parametrized regime, where it
undergoes a transition to random feature-like behavior. The
quenching-activation process seems to provide a clear mechanism for "implicit
regularization". This is qualitatively different from the dynamics associated
with the "mean-field" scaling, where all neurons participate equally and no
qualitative changes appear when the network parameters are changed.
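To make the studied setting concrete, below is a minimal NumPy sketch (not the paper's exact experimental setup; the network width, target function, learning rate, and step count are illustrative assumptions). It trains a two-layer ReLU network with an Xavier-like initialization by plain gradient descent and prints per-neuron magnitudes, so the split into a few dominant ("activated") neurons and many small ("quenched") background neurons can be inspected directly.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the paper's exact setup):
# under-parametrized two-layer ReLU network trained by plain GD on a 1-D
# target that only a couple of neurons are needed to represent.
rng = np.random.default_rng(0)

d, m, n = 1, 50, 200                       # input dim, hidden width, samples
X = rng.uniform(-1, 1, (n, d))
y = np.maximum(X[:, 0], 0.0) - np.maximum(-X[:, 0] - 0.3, 0.0)  # ~2-neuron target

# Xavier-like initialization: weights scaled as O(1/sqrt(fan_in))
W = rng.normal(0.0, 1.0 / np.sqrt(d), (m, d))
b = np.zeros(m)
a = rng.normal(0.0, 1.0 / np.sqrt(m), m)

lr, steps = 0.05, 20000
for t in range(steps):
    z = X @ W.T + b                        # pre-activations, shape (n, m)
    h = np.maximum(z, 0.0)                 # ReLU features
    pred = h @ a
    err = pred - y                         # residuals, shape (n,)

    # gradients of the mean-squared loss 0.5 * mean((pred - y)^2)
    ga = h.T @ err / n
    gh = np.outer(err, a) * (z > 0)        # backprop through ReLU
    gW = gh.T @ X / n
    gb = gh.sum(axis=0) / n

    a -= lr * ga
    W -= lr * gW
    b -= lr * gb

    if t % 5000 == 0:
        # per-neuron magnitude |a_j| * ||(w_j, b_j)||: large for "activated"
        # neurons, small for "quenched" background neurons
        mag = np.abs(a) * np.linalg.norm(np.hstack([W, b[:, None]]), axis=1)
        print(f"step {t:6d}  loss {np.mean(err**2) / 2:.2e}  "
              f"top-3 neuron magnitudes {np.sort(mag)[-3:]}")
```

If the quenching-activation picture described in the abstract applies, one would expect the printed magnitudes to stay roughly comparable during the early, random feature-like phase, and a handful of them to grow and dominate the fit in the late phase.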
Related papers
- Confidence Regulation Neurons in Language Models [91.90337752432075]
This study investigates the mechanisms by which large language models represent and regulate uncertainty in next-token predictions.
Entropy neurons are characterized by an unusually high weight norm and influence the final layer normalization (LayerNorm) scale to effectively scale down the logits.
Token frequency neurons, which we describe here for the first time, boost or suppress each token's logit proportionally to its log frequency, thereby shifting the output distribution towards or away from the unigram distribution.
arXiv Detail & Related papers (2024-06-24T01:31:03Z) - Learning dynamic representations of the functional connectome in
neurobiological networks [41.94295877935867]
We introduce an unsupervised approach to learn the dynamic affinities between neurons in live, behaving animals.
We show that our method is able to robustly predict causal interactions between neurons to generate behavior.
arXiv Detail & Related papers (2024-02-21T19:54:25Z) - STNDT: Modeling Neural Population Activity with a Spatiotemporal
Transformer [19.329190789275565]
We introduce SpatioTemporal Neural Data Transformer (STNDT), an NDT-based architecture that explicitly models responses of individual neurons.
We show that our model achieves state-of-the-art performance on ensemble level in estimating neural activities across four neural datasets.
arXiv Detail & Related papers (2022-06-09T18:54:23Z) - Dynamic Neural Diversification: Path to Computationally Sustainable
Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z) - Continuous Learning and Adaptation with Membrane Potential and
Activation Threshold Homeostasis [91.3755431537592]
This paper presents the Membrane Potential and Activation Threshold Homeostasis (MPATH) neuron model.
The model allows neurons to maintain a form of dynamic equilibrium by automatically regulating their activity when presented with input.
Experiments demonstrate the model's ability to adapt to and continually learn from its input.
arXiv Detail & Related papers (2021-04-22T04:01:32Z) - Going beyond p-convolutions to learn grayscale morphological operators [64.38361575778237]
We present two new morphological layers based on the same principle as the p-convolutional layer.
arXiv Detail & Related papers (2021-02-19T17:22:16Z) - And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Phase diagram for two-layer ReLU neural networks at infinite-width limit [6.380166265263755]
We draw the phase diagram for the two-layer ReLU neural network at the infinite-width limit.
We identify three regimes in the phase diagram, i.e., linear regime, critical regime and condensed regime.
In the linear regime, the NN training dynamics is approximately linear, similar to that of a random feature model, with an exponential loss decay.
In the condensed regime, we demonstrate through experiments that active neurons are condensed at several discrete orientations.
arXiv Detail & Related papers (2020-07-15T06:04:35Z) - Unifying and generalizing models of neural dynamics during
decision-making [27.46508483610472]
We propose a unifying framework for modeling neural activity during decision-making tasks.
The framework includes the canonical drift-diffusion model and enables extensions such as multi-dimensional accumulators, variable and collapsing boundaries, and discrete jumps.
arXiv Detail & Related papers (2020-01-13T23:57:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.