The Spectrum of Fisher Information of Deep Networks Achieving Dynamical
Isometry
- URL: http://arxiv.org/abs/2006.07814v4
- Date: Mon, 29 Mar 2021 19:08:02 GMT
- Title: The Spectrum of Fisher Information of Deep Networks Achieving Dynamical
Isometry
- Authors: Tomohiro Hayase, Ryo Karakida
- Abstract summary: The Fisher information matrix (FIM) is fundamental to understanding the trainability of deep neural nets (DNNs).
We investigate the spectral distribution of the conditional FIM, which is the FIM given a single sample, by focusing on fully-connected networks.
We find that the parameter space's local metric linearly depends on the depth even under the dynamical isometry.
- Score: 9.289846887298852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Fisher information matrix (FIM) is fundamental to understanding the
trainability of deep neural nets (DNNs), since it describes the parameter
space's local metric. We investigate the spectral distribution of the
conditional FIM, which is the FIM given a single sample, by focusing on
fully-connected networks achieving dynamical isometry. Then, while dynamical
isometry is known to keep specific backpropagated signals independent of the
depth, we find that the parameter space's local metric linearly depends on the
depth even under the dynamical isometry. More precisely, we reveal that the
conditional FIM's spectrum concentrates around the maximum and the value grows
linearly as the depth increases. To examine the spectrum, we construct an
algebraic methodology based on free probability theory, assuming random
initialization and the wide (infinite-width) limit. As a byproduct, we provide an analysis of the
solvable spectral distribution in two-hidden-layer cases. Lastly, experimental
results verify that the appropriate learning rate for the online training of
DNNs is inversely proportional to the depth, as determined by the conditional
FIM's spectrum.
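The linear-in-depth behaviour of the conditional FIM described above can be checked with a small numerical sketch. The code below is my own toy construction in Python/NumPy, not the paper's methodology: it uses a deep linear network with orthogonal weight matrices, which satisfies dynamical isometry exactly, as a stand-in for the nonlinear fully-connected networks studied in the paper. For a Gaussian output model, the conditional FIM of a single sample x is F(x) = J^T J with J = d f(x)/d theta, and its nonzero eigenvalues coincide with those of the small Gram matrix J J^T assembled layer by layer.

```python
# Toy sketch (my own construction, not the paper's derivation), Python + NumPy.
# A deep linear network with orthogonal weights satisfies dynamical isometry
# exactly and stands in for the nonlinear fully-connected nets in the paper.
# For a Gaussian output model, the conditional FIM for one sample x is
# F(x) = J^T J with J = d f(x) / d theta; its nonzero eigenvalues equal those
# of the small matrix J J^T built below.
import numpy as np

rng = np.random.default_rng(0)

def orthogonal(n):
    """Random n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def conditional_fim_gram(weights, x):
    """Return J J^T for input x, J stacking d f / d W_l over all layers l."""
    acts = [x]
    for W in weights:                      # forward pass of the linear net
        acts.append(W @ acts[-1])
    n = x.shape[0]
    gram = np.zeros((n, n))
    for l in range(len(weights)):
        B = np.eye(n)                      # prefactor: product of layers above l
        for W in weights[l + 1:]:
            B = W @ B
        a = acts[l]                        # input to layer l
        # The layer-l Jacobian block is (a^T kron B), so its contribution
        # to J J^T is ||a||^2 * B B^T.
        gram += (a @ a) * (B @ B.T)
    return gram

n = 64
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                     # unit-norm input sample
for depth in (2, 8, 32, 64):
    weights = [orthogonal(n) for _ in range(depth)]
    lam_max = np.linalg.eigvalsh(conditional_fim_gram(weights, x)).max()
    # lam_max grows ~ linearly with depth, so a stable online learning rate
    # should shrink roughly like 1 / depth, matching the abstract's claim.
    print(f"depth={depth:3d}  lambda_max={lam_max:7.2f}  ratio={lam_max / depth:.2f}")
```

With orthogonal weights and a unit-norm input, the Gram matrix is exactly depth times the identity, so the printed ratio stays at 1.00; the inverse scaling of the stable learning rate with depth follows directly in this toy case.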
Related papers
- Rethinking Clustered Federated Learning in NOMA Enhanced Wireless
Networks [60.09912912343705]
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-independent and identically distributed (non-IID) datasets.
A detailed theoretical analysis is presented of the generalization gap that measures the degree to which the data distribution is non-IID.
Solutions to address the challenges posed by non-IID conditions are proposed with the analysis of the properties.
arXiv Detail & Related papers (2024-03-05T17:49:09Z) - Assessing Neural Network Representations During Training Using
Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - Learning Radio Environments by Differentiable Ray Tracing [56.40113938833999]
We introduce a novel gradient-based calibration method, complemented by differentiable parametrizations of material properties, scattering and antenna patterns.
We have validated our method using both synthetic data and real-world indoor channel measurements, employing a distributed multiple-input multiple-output (MIMO) channel sounder.
arXiv Detail & Related papers (2023-11-30T13:50:21Z) - Neural FIM for learning Fisher Information Metrics from point cloud data [71.07939200676199]
We propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data.
We demonstrate its utility in selecting parameters for the PHATE visualization method, as well as its ability to obtain information pertaining to local volume, illuminating branching points and cluster centers in embeddings of a toy dataset and two single-cell datasets of iPSC reprogramming and PBMCs (immune cells).
arXiv Detail & Related papers (2023-06-01T17:36:13Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix
Factorization [21.64166573203593]
Implicit regularization is an important way to interpret neural networks.
Recent theory has begun to explain implicit regularization using the deep matrix factorization (DMF) model.
arXiv Detail & Related papers (2022-12-29T02:11:19Z) - Deep Linear Networks for Matrix Completion -- An Infinite Depth Limit [10.64241024049424]
The deep linear network (DLN) is a model for implicit regularization in gradient-based optimization of overparametrized learning architectures.
We investigate the link between the geometry and the training dynamics for matrix completion with rigorous analysis and numerics.
We propose that implicit regularization is a result of bias towards high state space volume.
arXiv Detail & Related papers (2022-10-22T17:03:10Z) - Deep Sufficient Representation Learning via Mutual Information [2.9832792722677506]
We propose a mutual information-based sufficient representation learning (MSRL) approach.
MSRL learns a sufficient representation that has maximal mutual information with the response and follows a user-selected distribution.
We evaluate the performance of MSRL via extensive numerical experiments and real data analysis.
arXiv Detail & Related papers (2022-07-21T22:13:21Z) - Implicit Data-Driven Regularization in Deep Neural Networks under SGD [0.0]
We perform a spectral analysis of the large random matrices involved in a trained deep neural network (DNN).
We find that these spectra can be classified into three main types: the Marchenko-Pastur spectrum (MP), the Marchenko-Pastur spectrum with a few bleeding outliers (MPB), and the heavy-tailed spectrum (HT); a toy numerical comparison with the MP law is sketched after this list.
arXiv Detail & Related papers (2021-11-26T06:36:16Z) - On the Variance of the Fisher Information for Deep Learning [79.71410479830222]
The Fisher information matrix (FIM) has been applied to the realm of deep learning.
In practice, the exact FIM is either unavailable in closed form or too expensive to compute, so it is typically estimated.
We investigate two such estimators based on two equivalent representations of the FIM.
arXiv Detail & Related papers (2021-07-09T04:46:50Z)
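For the last item above ("On the Variance of the Fisher Information for Deep Learning"), the two equivalent FIM representations are easy to illustrate on a toy model. The sketch below is my own minimal construction in Python/NumPy, not the estimators analyzed in that paper: F = E_y[grad log p grad log p^T] = -E_y[Hessian log p], and Monte-Carlo estimators built from the two forms agree in expectation but can differ sharply in variance.

```python
# Toy illustration (my own construction, not the cited paper's estimators):
# the FIM has two equivalent representations,
#   F = E_y[ grad log p(y|x) grad log p(y|x)^T ] = -E_y[ Hessian log p(y|x) ],
# and Monte-Carlo estimators built from them agree in expectation but can
# have very different variance; a one-sample logistic model already shows it.
import numpy as np

rng = np.random.default_rng(0)
d = 3
w = rng.standard_normal(d)                 # model parameters
x = rng.standard_normal(d)                 # one fixed input
p = 1.0 / (1.0 + np.exp(-(w @ x)))         # P(y = 1 | x) under the model

n_samples = 10_000
score_outer = np.zeros((d, d))
neg_hessian = np.zeros((d, d))
for _ in range(n_samples):
    y = rng.binomial(1, p)                 # sample a label from the model
    score = (y - p) * x                    # grad_w log p(y | x)
    score_outer += np.outer(score, score)  # estimator 1: outer product of scores
    neg_hessian += p * (1 - p) * np.outer(x, x)  # estimator 2: -Hessian (y-free here)
score_outer /= n_samples
neg_hessian /= n_samples

exact = p * (1 - p) * np.outer(x, x)       # closed-form conditional FIM
print("score-outer error :", np.abs(score_outer - exact).max())  # small but nonzero
print("neg-Hessian error :", np.abs(neg_hessian - exact).max())  # zero for this model
```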
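The following is the toy Marchenko-Pastur comparison referenced in the "Implicit Data-Driven Regularization in Deep Neural Networks under SGD" item above. It is a sketch under my own assumptions (Python/NumPy, an i.i.d. Gaussian matrix standing in for an untrained weight matrix): the bulk of the spectrum of (1/n) W^T W falls inside the MP support, whereas trained layers may instead show a few outliers (MPB) or heavy tails (HT).

```python
# Toy sketch (my own setup, not the cited paper's experiments), Python + NumPy:
# the eigenvalue spectrum of (1/n) W^T W for an i.i.d. Gaussian matrix W follows
# the Marchenko-Pastur (MP) law; trained layers may instead show a few outliers
# (the MPB class) or heavy tails (the HT class).
import numpy as np

rng = np.random.default_rng(0)
n, m = 2000, 1000                          # W is n x m, aspect ratio q = m / n
q = m / n
W = rng.standard_normal((n, m))            # initialization-like weights, variance 1
eigs = np.linalg.eigvalsh(W.T @ W / n)

lo, hi = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2    # MP support edges
print(f"MP support        : [{lo:.3f}, {hi:.3f}]")
print(f"empirical range   : [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"fraction inside MP: {np.mean((eigs > lo - 0.05) & (eigs < hi + 0.05)):.3f}")

# MP density on a grid (unit-variance entries), e.g. to overlay on a histogram:
grid = np.linspace(lo, hi, 200)
mp_density = np.sqrt(np.clip((hi - grid) * (grid - lo), 0, None)) / (2 * np.pi * q * grid)
```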