Anomalous diffusion dynamics of learning in deep neural networks
- URL: http://arxiv.org/abs/2009.10588v2
- Date: Sun, 25 Jul 2021 08:13:24 GMT
- Title: Anomalous diffusion dynamics of learning in deep neural networks
- Authors: Guozhang Chen, Cheng Kevin Qu, Pulin Gong
- Abstract summary: Learning in deep neural networks (DNNs) is implemented through minimizing a highly non-convex loss function.
We present a novel account of how such effective deep learning emerges through the interactions of the SGD and the fractal-like structure of the loss landscape.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning in deep neural networks (DNNs) is implemented through minimizing a
highly non-convex loss function, typically by a stochastic gradient descent
(SGD) method. This learning process can effectively find good wide minima
without being trapped in poor local ones. We present a novel account of how
such effective deep learning emerges through the interactions of the SGD and
the geometrical structure of the loss landscape. Rather than being a normal
diffusion process (i.e. Brownian motion) as often assumed, we find that the SGD
exhibits rich, complex dynamics when navigating through the loss landscape;
initially, the SGD exhibits anomalous superdiffusion, which attenuates
gradually and changes to subdiffusion at long times when the solution is
reached. Such learning dynamics happen ubiquitously in different DNNs such as
ResNet and VGG-like networks and are insensitive to batch size and learning
rate. The anomalous superdiffusion process during the initial learning phase
indicates that the motion of SGD along the loss landscape possesses
intermittent, big jumps; this non-equilibrium property enables the SGD to
escape from sharp local minima. By adapting the methods developed for studying
energy landscapes in complex physical systems, we find that such superdiffusive
learning dynamics are due to the interactions of the SGD and the fractal-like
structure of the loss landscape. We further develop a simple model to
demonstrate the mechanistic role of the fractal loss landscape in enabling the
SGD to effectively find global minima. Our results thus reveal the
effectiveness of deep learning from a novel perspective and have implications
for designing efficient deep neural networks.
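Anomalous diffusion of the kind described above is conventionally quantified by the mean squared displacement (MSD) of the trajectory, MSD(t) ∝ t^α, with α > 1 for superdiffusion, α = 1 for normal (Brownian) diffusion, and α < 1 for subdiffusion. The snippet below is not the authors' code; it is a minimal single-trajectory sketch of how such an exponent can be estimated for SGD in weight space. The model, synthetic data, and hyperparameters are placeholder assumptions, and a proper MSD estimate would average over many runs and time lags.

```python
# Minimal sketch (not the paper's code): estimate the diffusion exponent
# alpha of SGD in parameter space by fitting squared displacement ~ t^alpha.
# Model, data, and hyperparameters are placeholder assumptions.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Synthetic data standing in for a real training set.
X = torch.randn(512, 20)
y = (X[:, 0] > 0).long()

def flat_params(m):
    """Concatenate all parameters into one flat numpy vector."""
    return torch.cat([p.detach().flatten() for p in m.parameters()]).numpy()

w0 = flat_params(model)          # reference point in weight space
displacements = []
for step in range(1000):
    idx = torch.randint(0, X.shape[0], (32,))   # mini-batch sampling noise
    loss = loss_fn(model(X[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    displacements.append(np.sum((flat_params(model) - w0) ** 2))

# Fit squared displacement ∝ t^alpha on a log-log scale. This single
# trajectory is only a crude proxy for the ensemble-averaged MSD:
# alpha > 1 suggests superdiffusion, alpha < 1 subdiffusion.
t = np.arange(1, len(displacements) + 1)
alpha, _ = np.polyfit(np.log(t), np.log(displacements), 1)
print(f"estimated diffusion exponent alpha ≈ {alpha:.2f}")
```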
Related papers
- Dynamical loss functions shape landscape topography and improve learning in artificial neural networks [0.9208007322096533]
We show how to transform cross-entropy and mean squared error into dynamical loss functions.
We show how they significantly improve validation accuracy for networks of varying sizes.
arXiv Detail & Related papers (2024-10-14T16:27:03Z)
- Exploring Loss Landscapes through the Lens of Spin Glass Theory [8.693506828591282]
In deep neural networks (DNNs), internal representations, the decision-making mechanism, the absence of overfitting in an over-parametrized space, and superior generalizability remain poorly understood.
This paper delves into the loss landscape of DNNs through the lens of spin glass in statistical physics, a system characterized by a complex energy landscape with numerous metastable states.
arXiv Detail & Related papers (2024-07-30T10:37:15Z)
- Towards Generalization in Subitizing with Neuro-Symbolic Loss using Holographic Reduced Representations [49.22640185566807]
We show that adapting tools used in CogSci research can improve the subitizing generalization of CNNs and ViTs.
We investigate how this neuro-symbolic approach to learning affects the subitizing capability of CNNs and ViTs.
We find that ViTs perform considerably worse compared to CNNs in most respects on subitizing, except on one axis where an HRR-based loss provides improvement.
arXiv Detail & Related papers (2023-12-23T17:54:03Z)
- Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search [55.41583104734349]
We propose to automatically remove structural redundancy in diffusion models with our proposed Diffusion Distillation-based Block-wise Neural Architecture Search (NAS)
Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture which can achieve on-par or even better performance than the teacher.
Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
arXiv Detail & Related papers (2023-11-08T12:56:59Z)
- SGD with Large Step Sizes Learns Sparse Features [22.959258640051342]
We showcase important features of the dynamics of stochastic gradient descent (SGD) in the training of neural networks.
We show that the longer large step sizes keep SGD high in the loss landscape, the better the implicit regularization can operate and find sparse representations.
arXiv Detail & Related papers (2022-10-11T11:00:04Z)
- Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks [3.148524502470734]
We show that the loss is decreased by a significant amount in the early stage of the training, and this decrease is fast.
We use a microscopic analysis of the activation patterns for the neurons, which helps us derive more powerful lower bounds for the gradient.
arXiv Detail & Related papers (2022-06-05T09:56:50Z)
- FuNNscope: Visual microscope for interactively exploring the loss landscape of fully connected neural networks [77.34726150561087]
We show how to explore high-dimensional landscape characteristics of neural networks.
We generalize observations on small neural networks to more complex systems.
An interactive dashboard opens up a number of possible application scenarios.
arXiv Detail & Related papers (2022-04-09T16:41:53Z)
- Reducing Catastrophic Forgetting in Self Organizing Maps with Internally-Induced Generative Replay [67.50637511633212]
A lifelong learning agent is able to continually learn from potentially infinite streams of pattern sensory data.
One major historic difficulty in building agents that adapt is that neural systems struggle to retain previously-acquired knowledge when learning from new samples.
This problem is known as catastrophic forgetting (interference) and remains an unsolved problem in the domain of machine learning to this day.
arXiv Detail & Related papers (2021-12-09T07:11:14Z)
- Going beyond p-convolutions to learn grayscale morphological operators [64.38361575778237]
In this work, we present two new morphological layers based on the same principle as the p-convolutional layer.
arXiv Detail & Related papers (2021-02-19T17:22:16Z)
- Tilting the playing field: Dynamical loss functions for machine learning [18.831125493827766]
We show that learning can be improved by using loss functions that evolve cyclically during training to emphasize one class at a time.
Improvement arises from the interplay of the changing loss landscape with the dynamics of the system as it evolves to minimize the loss (a schematic sketch of this cyclic class-weighting idea follows the list).
arXiv Detail & Related papers (2021-02-07T13:15:08Z)
- Limited-angle tomographic reconstruction of dense layered objects by dynamical machine learning [68.9515120904028]
Limited-angle tomography of strongly scattering quasi-transparent objects is a challenging, highly ill-posed problem.
Regularizing priors are necessary to reduce artifacts by improving the condition of such problems.
We devised a recurrent neural network (RNN) architecture with a novel split-convolutional gated recurrent unit (SC-GRU) as the building block.
arXiv Detail & Related papers (2020-07-21T11:48:22Z)
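Two of the entries above ("Dynamical loss functions shape landscape topography and improve learning in artificial neural networks" and "Tilting the playing field: Dynamical loss functions for machine learning") describe loss functions that cyclically emphasize one class at a time during training. The sketch below is a hedged illustration of that idea, not those papers' implementation: a cross-entropy whose per-class weights oscillate with the training epoch. The specific functional form (cosine schedule, period, amplitude) is an assumption for illustration.

```python
# Minimal sketch (assumed form, not the cited papers' implementation):
# a cross-entropy loss whose per-class weights oscillate over training
# so that each class is emphasized in turn.
import math
import torch
import torch.nn.functional as F

def dynamical_cross_entropy(logits, targets, epoch, num_classes,
                            period=10.0, amplitude=2.0):
    """Cross-entropy with cyclic per-class weights.

    The emphasis rotates through the classes with the given period:
    the currently emphasized class gets weight 1 + amplitude, while the
    others decay back toward 1.
    """
    phase = 2.0 * math.pi * epoch / period
    class_ids = torch.arange(num_classes, dtype=torch.float32)
    # Phase offset per class so the emphasis cycles through all classes.
    offsets = 2.0 * math.pi * class_ids / num_classes
    weights = 1.0 + amplitude * 0.5 * (1.0 + torch.cos(phase - offsets))
    return F.cross_entropy(logits, targets, weight=weights)

# Usage with toy tensors, in place of a real training loop:
logits = torch.randn(8, 3)
targets = torch.randint(0, 3, (8,))
loss = dynamical_cross_entropy(logits, targets, epoch=4, num_classes=3)
```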