Partial local entropy and anisotropy in deep weight spaces
- URL: http://arxiv.org/abs/2007.09091v3
- Date: Tue, 6 Apr 2021 10:59:10 GMT
- Title: Partial local entropy and anisotropy in deep weight spaces
- Authors: Daniele Musso
- Abstract summary: We refine a recently-proposed class of local entropic loss functions by restricting the smoothening regularization to only a subset of weights.
The new loss functions are referred to as partial local entropies. They can adapt to the weight-space anisotropy, thus outperforming their isotropic counterparts.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We refine a recently-proposed class of local entropic loss functions by
restricting the smoothening regularization to only a subset of weights. The new
loss functions are referred to as partial local entropies. They can adapt to
the weight-space anisotropy, thus outperforming their isotropic counterparts.
We support the theoretical analysis with experiments on image classification
tasks performed with multi-layer, fully-connected and convolutional neural
networks. The present study suggests how to better exploit the anisotropic
nature of deep landscapes and provides direct probes of the shape of the minima
encountered by stochastic gradient descent algorithms. As a by-product, we
observe an asymptotic dynamical regime at late training times where the
temperature of all the layers obeys a common cooling behavior.
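As an illustration of the idea described in the abstract (smoothening regularization restricted to a subset of weights), the following is a minimal, hypothetical PyTorch sketch, not code from the paper: it replaces the exact local-entropy integral with a crude Monte-Carlo average of the loss over Gaussian perturbations applied only to one chosen layer, leaving the remaining weights unsmoothed. The function names, the choice of layer, and the hyperparameters `sigma` and `n_samples` are illustrative assumptions.

```python
# Hypothetical sketch of a "partial" smoothening regularization: Gaussian
# perturbations are applied only to a chosen subset of parameters, so the
# regularization can follow the anisotropy of the weight space.  This is a
# crude Monte-Carlo stand-in for the local-entropy integral, not the
# paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def partial_local_entropy_step(model, optimizer, x, y, smooth_params,
                               sigma=0.05, n_samples=4):
    """One training step on a loss averaged over Gaussian perturbations of
    `smooth_params` only; all remaining weights see the plain loss."""
    optimizer.zero_grad()
    originals = [p.detach().clone() for p in smooth_params]
    avg_loss = 0.0
    for _ in range(n_samples):
        with torch.no_grad():                      # perturb only the chosen subset
            for p, p0 in zip(smooth_params, originals):
                p.copy_(p0 + sigma * torch.randn_like(p0))
        loss = F.cross_entropy(model(x), y)
        (loss / n_samples).backward()              # accumulate the smoothed gradient
        avg_loss += loss.item() / n_samples
    with torch.no_grad():                          # restore the unperturbed weights
        for p, p0 in zip(smooth_params, originals):
            p.copy_(p0)
    optimizer.step()
    return avg_loss

# Toy usage: smoothen only the first fully-connected layer of an MLP.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
smooth_params = list(model[1].parameters())
x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
print(partial_local_entropy_step(model, optimizer, x, y, smooth_params))
```

Restricting the perturbation to different layers in turn gives a rough, layer-by-layer probe of how flat the loss landscape is along each subset of directions, in the spirit of the anisotropy discussed in the abstract.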
Related papers
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching (a generic sketch of this minibatching scheme appears after the list of related papers).
arXiv Detail & Related papers (2023-06-06T09:12:49Z) - Theory on variational high-dimensional tensor networks [2.0307382542339485]
We investigate the emergent statistical properties of random high-dimensional tensor-network states and the trainability of variational tensor networks.
We prove that variational high-dimensional tensor networks suffer from barren plateaus for global loss functions.
Our results pave the way for their future theoretical studies and practical applications.
arXiv Detail & Related papers (2023-03-30T15:26:30Z) - On Convergence of Training Loss Without Reaching Stationary Points [62.41370821014218]
We show that Neural Network weight variables do not converge to stationary points where the gradient of the loss function vanishes.
We propose a new perspective based on the ergodic theory of dynamical systems.
arXiv Detail & Related papers (2021-10-12T18:12:23Z) - The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations,
and Anomalous Diffusion [29.489737359897312]
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
arXiv Detail & Related papers (2021-07-19T20:18:57Z) - Entropic alternatives to initialization [0.0]
We analyze anisotropic, local entropic smoothenings in the language of statistical physics and information theory.
We comment on some aspects related to the physics of renormalization and the spacetime structure of convolutional networks.
arXiv Detail & Related papers (2021-07-16T08:17:32Z) - Entanglement dynamics in Rule 54: Exact results and quasiparticle
picture [0.0]
We study the entanglement dynamics generated by quantum quenches in the quantum cellular automaton Rule 54.
While in the case of von Neumann entropy we recover exactly the predictions of the quasiparticle picture, we find no physically meaningful quasiparticle description for other Rényi entropies.
arXiv Detail & Related papers (2021-04-09T17:51:09Z) - Going beyond p-convolutions to learn grayscale morphological operators [64.38361575778237]
We present two new morphological layers based on the same principle as the p-convolutional layer.
arXiv Detail & Related papers (2021-02-19T17:22:16Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
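The "Machine learning in and out of equilibrium" entry above mentions a variation of stochastic gradient Langevin dynamics (SGLD) built on without-replacement minibatching. The sketch below illustrates only the generic scheme, not that paper's algorithm: the data are shuffled once per epoch and split into disjoint minibatches, and every update adds temperature-scaled Gaussian noise. The learning rate, temperature, and toy model are illustrative assumptions.

```python
# Generic SGLD epoch with without-replacement minibatching (illustrative
# sketch, not the cited paper's algorithm): the data are shuffled once and
# split into disjoint batches, so each example is used exactly once per
# epoch; every update adds Gaussian noise scaled by the temperature.
import torch

def sgld_epoch_without_replacement(params, loss_fn, data, targets,
                                   lr=1e-3, temperature=1e-4, batch_size=32):
    perm = torch.randperm(data.shape[0])           # one shuffle per epoch
    for start in range(0, data.shape[0], batch_size):
        idx = perm[start:start + batch_size]       # disjoint minibatch
        loss = loss_fn(data[idx], targets[idx])
        grads = torch.autograd.grad(loss, params)
        with torch.no_grad():
            for p, g in zip(params, grads):
                noise = torch.randn_like(p) * (2.0 * lr * temperature) ** 0.5
                p.add_(-lr * g + noise)            # gradient step + Langevin noise

# Toy usage: linear-regression weights sampled with SGLD.
w = torch.zeros(10, requires_grad=True)
X, y = torch.randn(256, 10), torch.randn(256)
mse = lambda xb, yb: ((xb @ w - yb) ** 2).mean()
sgld_epoch_without_replacement([w], mse, X, y)
```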