Information Geometry of Dropout Training
- URL: http://arxiv.org/abs/2206.10936v1
- Date: Wed, 22 Jun 2022 09:27:41 GMT
- Title: Information Geometry of Dropout Training
- Authors: Masanari Kimura, Hideitsu Hino
- Abstract summary: Dropout is one of the most popular regularization techniques in neural network training.
In this paper, several properties of dropout are discussed in a unified manner from the viewpoint of information geometry.
- Score: 5.990174495635326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dropout is one of the most popular regularization techniques in neural
network training. Because of its effectiveness and the simplicity of the idea,
dropout has been analyzed extensively and many variants have been proposed. In this
paper, several properties of dropout are discussed in a unified manner from the
viewpoint of information geometry. We show that dropout flattens the model
manifold and that its regularization performance depends on the amount of
curvature. We then show that dropout essentially corresponds to a regularization
that depends on the Fisher information, and we support this result with numerical
experiments. Such a theoretical analysis of the technique from a different
perspective is expected to greatly assist in the understanding of neural networks,
which is still in its infancy.
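To make the Fisher-information claim concrete, the following is a minimal numerical sketch, not reproduced from the paper's experiments, based on the well-known second-order argument for generalized linear models (Wager et al., 2013): for logistic regression with inverted input dropout at rate delta, the expected dropout loss is approximately the plain loss plus (delta / (2 * (1 - delta))) * beta^T diag(F) beta, where F = X^T diag(p * (1 - p)) X is the Fisher information matrix. The sample size, dimension, weights, and dropout rate below are arbitrary illustrative choices.

```python
# Minimal sketch (assumed setting: logistic regression with inverted input dropout,
# second-order argument of Wager et al. 2013); not the paper's own experiments.
import numpy as np

rng = np.random.default_rng(0)
n, d, delta = 5000, 5, 0.2                 # sample size, dimension, dropout rate
X = rng.normal(size=(n, d))
beta = 0.5 * rng.normal(size=d)            # fixed, moderate weights so the quadratic
                                           # approximation is accurate
p = 1.0 / (1.0 + np.exp(-(X @ beta)))      # model probabilities
y = rng.binomial(1, p)                     # labels drawn from the model

def nll(X_in):
    """Average logistic negative log-likelihood at the fixed beta."""
    s = X_in @ beta
    return np.mean(np.logaddexp(0.0, s) - y * s)

# Monte-Carlo estimate of the expected loss under inverted input dropout.
mc_dropout_loss = np.mean([
    nll(X * rng.binomial(1, 1.0 - delta, size=X.shape) / (1.0 - delta))
    for _ in range(2000)
])

# Quadratic approximation: plain loss plus a penalty built from the diagonal of
# the logistic-regression Fisher information matrix F = X^T diag(p(1-p)) X.
fisher_diag = ((p * (1.0 - p))[:, None] * X**2).sum(axis=0)
penalty = delta / (2.0 * (1.0 - delta)) * np.sum(beta**2 * fisher_diag) / n

print("plain loss:            %.4f" % nll(X))
print("dropout loss (MC):     %.4f" % mc_dropout_loss)
print("loss + Fisher penalty: %.4f" % (nll(X) + penalty))
```

If the approximation holds, the last two printed numbers should agree closely, and their common excess over the plain loss is the Fisher-dependent regularization effect described in the abstract.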
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Trade-Offs of Diagonal Fisher Information Matrix Estimators [53.35448232352667]
The Fisher information matrix can be used to characterize the local geometry of the parameter space of neural networks.
We examine two popular estimators whose accuracy and sample complexity depend on their associated variances.
We derive bounds on the variances and instantiate them in neural networks for regression and classification. (A generic sketch of two diagonal Fisher estimators appears after this list.)
arXiv Detail & Related papers (2024-02-08T03:29:10Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Implicit regularization of dropout [3.42658286826597]
It is important to understand how dropout, a popular regularization method, aids in achieving a good generalization solution during neural network training.
In this work, we present a theoretical derivation of an implicit regularization of dropout, which is validated by a series of experiments.
We experimentally find that training with dropout leads to neural networks with flatter minima compared with standard gradient descent training.
arXiv Detail & Related papers (2022-07-13T04:09:14Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via stochastic gradient descent (SGD).
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Contextual Dropout: An Efficient Sample-Dependent Dropout Module [60.63525456640462]
Dropout has been demonstrated as a simple and effective module to regularize the training process of deep neural networks.
We propose contextual dropout with an efficient structural design as a simple and scalable sample-dependent dropout module.
Our experimental results show that the proposed method outperforms baseline methods in terms of both accuracy and quality of uncertainty estimation.
arXiv Detail & Related papers (2021-03-06T19:30:32Z)
- Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks [38.153825455980645]
Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists - perhaps counterintuitively - in building lightweight models.
This paper sheds light on these empirical findings by theoretically characterizing the high-dimensional asymptotics of model pruning.
We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning.
arXiv Detail & Related papers (2020-12-16T05:13:30Z)
- Fiedler Regularization: Learning Neural Networks with Graph Sparsity [6.09170287691728]
We introduce a novel regularization approach for deep learning that incorporates and respects the underlying graphical structure of the neural network.
We propose to use the Fiedler value of the neural network's underlying graph as a tool for regularization.
arXiv Detail & Related papers (2020-03-02T16:19:33Z)
- The Implicit and Explicit Regularization Effects of Dropout [43.431343291010734]
Dropout is a widely-used regularization technique, often required to obtain state-of-the-art results for a number of architectures.
This work demonstrates that dropout introduces two distinct but entangled regularization effects.
arXiv Detail & Related papers (2020-02-28T18:31:17Z)
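As a companion to the diagonal Fisher estimator entry above, here is a generic sketch (not necessarily the estimator pair studied in that paper) contrasting the empirical Fisher diagonal, built from squared gradients at the observed labels, with a Monte-Carlo Fisher diagonal that resamples labels from the model, both for logistic regression where the exact diagonal is available in closed form. All sizes and names below are illustrative choices.

```python
# Generic sketch of two common diagonal Fisher estimators for logistic regression;
# this illustrates the general idea only, not the specific estimators of the paper.
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 4
X = rng.normal(size=(n, d))
beta = 0.5 * rng.normal(size=d)
p = 1.0 / (1.0 + np.exp(-(X @ beta)))       # model probabilities at beta
y_obs = rng.binomial(1, p)                  # observed labels (well-specified model here)

def sq_grad_mean(y):
    """Mean of squared per-example score gradients (y - p) * x."""
    g = (y - p)[:, None] * X
    return np.mean(g**2, axis=0)

empirical_fisher = sq_grad_mean(y_obs)                                   # observed labels
mc_fisher = np.mean([sq_grad_mean(rng.binomial(1, p)) for _ in range(200)], axis=0)
exact_fisher = np.mean((p * (1.0 - p))[:, None] * X**2, axis=0)          # closed form

print("empirical Fisher diag:", np.round(empirical_fisher, 3))
print("MC Fisher diag:       ", np.round(mc_fisher, 3))
print("exact Fisher diag:    ", np.round(exact_fisher, 3))
```

Because the labels here are drawn from the model itself, all three diagonals should roughly agree; under model misspecification the empirical Fisher can deviate, and the variance of the Monte-Carlo estimator is the kind of quantity whose trade-offs the listed paper analyzes.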
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.