On the Variance of the Fisher Information for Deep Learning
- URL: http://arxiv.org/abs/2107.04205v1
- Date: Fri, 9 Jul 2021 04:46:50 GMT
- Title: On the Variance of the Fisher Information for Deep Learning
- Authors: Alexander Soen and Ke Sun
- Abstract summary: The Fisher information matrix (FIM) has been widely applied in deep learning.
The exact FIM is either unavailable in closed form or too expensive to compute.
We investigate two such estimators based on two equivalent representations of the FIM.
- Score: 79.71410479830222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Fisher information matrix (FIM) has been widely applied in deep
learning. It is closely related to the loss landscape, the variance of the
parameters, second-order optimization, and deep learning theory. The exact FIM
is either unavailable in closed form or too expensive to compute. In practice,
it is almost always estimated based on empirical samples. We investigate two
such estimators based on two equivalent representations of the FIM. They are
both unbiased and consistent with respect to the underlying "true" FIM. Their
estimation quality is characterized by their variance given in closed form. We
bound their variances and analyze how the parametric structure of a deep neural
network can impact the variance. We discuss the meaning of this variance
measure and our bounds in the context of deep learning.
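To make the two representations concrete: with labels drawn from the model's own predictive distribution, one estimator averages outer products of the score, while the other averages negative Hessians of the log-likelihood. Below is a minimal sketch on a toy logistic model (an illustration of the two representations, not the paper's code or setup):

```python
import numpy as np

# Toy model p(y|x; theta) = Bernoulli(sigmoid(theta^T x)), where the score
# and Hessian of log p(y|x; theta) are available in closed form.
rng = np.random.default_rng(0)
d, n = 3, 500
theta = rng.normal(size=d)
X = rng.normal(size=(n, d))
p = 1.0 / (1.0 + np.exp(-(X @ theta)))        # p(y=1 | x_i; theta)
y = (rng.random(n) < p).astype(float)         # y_i sampled from the model

# Estimator 1: average outer products of the score, grad log p = (y - p) x.
scores = (y - p)[:, None] * X                 # shape (n, d)
F_hat1 = scores.T @ scores / n

# Estimator 2: average negative Hessians, -Hess log p = p (1 - p) x x^T.
w = p * (1.0 - p)
F_hat2 = (X * w[:, None]).T @ X / n

print(np.round(F_hat1, 3))
print(np.round(F_hat2, 3))
```

For this exponential-family toy model the Hessian does not depend on the sampled label, so the second estimator has zero variance over the label sampling; this illustrates why the two estimators, though both unbiased, can have very different variances.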
Related papers
- Hessian-Free Laplace in Bayesian Deep Learning [44.16006844888796]
The Hessian-free Laplace (HFL) approximation uses the curvature of both the log posterior and the network prediction to estimate the predictive variance.
We show that, under the standard assumptions of the Laplace approximation (LA) in Bayesian deep learning, HFL targets the same variance as LA and can be efficiently amortized in a pre-trained network.
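For reference, a minimal sketch of the linearized-Laplace predictive variance that HFL is said to target, on an illustrative logistic-regression model (the MAP estimate and constants are stand-ins, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, prior_prec = 3, 200, 1.0
X = rng.normal(size=(n, d))
theta_map = rng.normal(size=d)                # stand-in for a fitted MAP
p = 1.0 / (1.0 + np.exp(-(X @ theta_map)))

# Hessian of the negative log posterior at the MAP (GGN form + prior).
H = (X * (p * (1.0 - p))[:, None]).T @ X + prior_prec * np.eye(d)

x_star = rng.normal(size=d)                   # test input
var_f = x_star @ np.linalg.solve(H, x_star)   # Var[f(x*)] under LA
print(var_f)
```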
arXiv Detail & Related papers (2024-03-15T20:47:39Z)
- Trade-Offs of Diagonal Fisher Information Matrix Estimators [53.35448232352667]
The Fisher information matrix can be used to characterize the local geometry of the parameter space of neural networks.
We examine two popular estimators whose accuracy and sample complexity depend on their associated variances.
We derive bounds of the variances and instantiate them in neural networks for regression and classification.
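A minimal sketch of the two diagonal estimators on a toy logistic model, score-based vs. Hessian-based (an illustration under assumed closed-form gradients, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 500
theta = rng.normal(size=d)
X = rng.normal(size=(n, d))
p = 1.0 / (1.0 + np.exp(-(X @ theta)))
y = (rng.random(n) < p).astype(float)         # labels sampled from the model

# Score-based: diagonal of the averaged outer products of the score.
diag_hat1 = np.mean(((y - p)[:, None] * X) ** 2, axis=0)
# Hessian-based: diagonal of the averaged negative Hessians.
diag_hat2 = np.mean((p * (1.0 - p))[:, None] * X ** 2, axis=0)
print(diag_hat1, diag_hat2)
```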
arXiv Detail & Related papers (2024-02-08T03:29:10Z)
- In What Ways Are Deep Neural Networks Invariant and How Should We Measure This? [5.757836174655293]
We introduce a family of invariance and equivariance metrics that allows us to quantify these properties in a way that disentangles them from other metrics such as loss or accuracy.
We draw a range of conclusions about invariance and equivariance in deep learning models, ranging from whether initializing a model with pretrained weights has an effect on a trained model's invariance, to the extent to which invariance learned via training can generalize to out-of-distribution data.
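As a hedged illustration of the general idea (the paper's metrics may be defined differently), one can score invariance by comparing the output change under a transform against the overall output spread, which keeps the measure separate from loss or accuracy:

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(4, 2))

def f(x):                                     # stand-in for a trained network
    return np.tanh(x @ W)

def T(x):                                     # illustrative input transform
    return x[:, ::-1]

X = rng.normal(size=(100, 4))
gap = np.linalg.norm(f(X) - f(T(X)), axis=1).mean()
spread = np.linalg.norm(f(X) - f(X).mean(axis=0), axis=1).mean()
print("invariance score:", 1.0 - gap / (2.0 * spread))
```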
arXiv Detail & Related papers (2022-10-07T18:43:21Z)
- Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
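A hedged sketch of the underlying parameter-sharing parameterization (names are illustrative, not the paper's API): each entry of a weight matrix is tied to one of a few shared parameters through an assignment matrix, and discovery amounts to optimizing over that assignment. A circulant scheme plays the role of the oracle for shift equivariance:

```python
import numpy as np

d = 4
# Oracle scheme: one shared parameter per (cyclic) diagonal => circulant W,
# which commutes with cyclic shifts, i.e., encodes shift equivariance.
S_oracle = np.zeros((d * d, d))
for i in range(d):
    for j in range(d):
        S_oracle[i * d + j, (i - j) % d] = 1.0

rng = np.random.default_rng(4)
v = rng.normal(size=d)                        # shared free parameters
W = (S_oracle @ v).reshape(d, d)              # tied weight matrix
print(W)
```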
arXiv Detail & Related papers (2022-04-07T17:59:19Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for protecting privacy when machine learning models are trained collaboratively among distributed clients.
Recent studies have pointed out that the naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
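For illustration, the usual DP ingredient in FL clips each client update and adds Gaussian noise before aggregation; a minimal sketch with illustrative constants (not the paper's tuned procedure):

```python
import numpy as np

rng = np.random.default_rng(5)
clip_norm, sigma = 1.0, 0.8
updates = [rng.normal(size=5) for _ in range(10)]   # one update per client

def privatize(g):
    g = g * min(1.0, clip_norm / np.linalg.norm(g)) # enforce ||g|| <= clip_norm
    return g + rng.normal(scale=sigma * clip_norm, size=g.shape)

# Aggregate the clipped, noised updates so no single client's
# gradient can be recovered from the result.
agg = np.mean([privatize(g) for g in updates], axis=0)
print(agg)
```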
arXiv Detail & Related papers (2021-01-11T19:43:12Z)
- A Novel Regression Loss for Non-Parametric Uncertainty Optimization [7.766663822644739]
Quantification of uncertainty is one of the most promising approaches to establishing safe machine learning.
One of the most commonly used approaches so far is Monte Carlo dropout, which is computationally cheap and easy to apply in practice.
We propose a new objective, referred to as the second-moment loss (SML), to address the shortcomings of such uncertainty estimates.
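A hedged reading of a second-moment-style objective (not necessarily the paper's exact SML): fit the mean as usual, and additionally fit a predicted variance to the squared residual, i.e., to the second moment of the error:

```python
import numpy as np

def second_moment_loss(y, mu, sigma):
    resid_sq = (y - mu) ** 2
    # mean fit + variance fit against the observed second moment
    return np.mean(resid_sq) + np.mean((sigma ** 2 - resid_sq) ** 2)

y = np.array([0.9, 1.3, 0.7])
mu = np.array([1.0, 1.0, 1.0])                # predicted means
sigma = np.array([0.2, 0.3, 0.2])             # predicted std. deviations
print(second_moment_loss(y, mu, sigma))
```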
arXiv Detail & Related papers (2021-01-07T19:12:06Z)
- What causes the test error? Going beyond bias-variance via ANOVA [21.359033212191218]
Modern machine learning methods are often overparametrized, allowing adaptation to the data at a fine level.
Recent work aimed to understand in greater depth why overparametrization is helpful for generalization.
We propose using the analysis of variance (ANOVA) to decompose the variance in the test error in a symmetric way.
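As a hedged sketch of such a decomposition with two randomness sources, initialization and data sampling (the error grid below is simulated rather than measured):

```python
import numpy as np

rng = np.random.default_rng(6)
E = rng.normal(loc=0.2, scale=0.05, size=(8, 8))   # err[init_seed, data_seed]

grand = E.mean()
init_eff = E.mean(axis=1) - grand                  # main effect: initialization
data_eff = E.mean(axis=0) - grand                  # main effect: data sampling
inter = E - grand - init_eff[:, None] - data_eff[None, :]  # interaction

print("var(init):       ", np.var(init_eff))
print("var(data):       ", np.var(data_eff))
print("var(interaction):", np.var(inter))
```

The decomposition is symmetric in the two factors: neither source of randomness is conditioned on before the other.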
arXiv Detail & Related papers (2020-10-11T05:21:13Z)
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval.
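Point-wise dependency is PD(x, y) = p(x, y) / (p(x) p(y)), and mutual information is MI = E[log PD]. A minimal check on a bivariate Gaussian, where log PD has a closed form (an illustration, not the paper's neural estimator):

```python
import numpy as np

rho = 0.6

def log_pd(x, y):
    # log PD for a standard bivariate Gaussian with correlation rho
    return (-0.5 * np.log(1.0 - rho ** 2)
            - (rho ** 2 * (x ** 2 + y ** 2) - 2.0 * rho * x * y)
              / (2.0 * (1.0 - rho ** 2)))

rng = np.random.default_rng(7)
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=200000)
mi_mc = log_pd(xy[:, 0], xy[:, 1]).mean()          # Monte Carlo MI estimate
print(mi_mc, -0.5 * np.log(1.0 - rho ** 2))        # vs. closed-form MI
```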
arXiv Detail & Related papers (2020-06-09T23:26:15Z)