Tradeoffs of Diagonal Fisher Information Matrix Estimators
- URL: http://arxiv.org/abs/2402.05379v2
- Date: Wed, 3 Apr 2024 00:16:25 GMT
- Title: Tradeoffs of Diagonal Fisher Information Matrix Estimators
- Authors: Alexander Soen, Ke Sun
- Abstract summary: Given the Fisher information matrix's high computational cost, practitioners often use random estimators and evaluate only the diagonal entries.
We examine two such estimators, whose accuracy and sample complexity depend on their associated variances.
We derive bounds on the variances and instantiate them in regression and classification networks.
- Score: 53.35448232352667
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Fisher information matrix characterizes the local geometry in the parameter space of neural networks. It underpins insightful theories and useful tools for understanding and optimizing neural networks. Given its high computational cost, practitioners often use random estimators and evaluate only the diagonal entries. We examine two such estimators, whose accuracy and sample complexity depend on their associated variances. We derive bounds on the variances and instantiate them in regression and classification networks. We navigate the trade-offs between the two estimators through analytical and numerical studies, and find that the variances depend on the non-linearity of the network with respect to different parameter groups and should not be neglected when estimating the Fisher information.
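For background (a minimal sketch under stated assumptions, not the authors' code): the two standard Monte Carlo estimators of the diagonal Fisher information average, over labels sampled from the model's own predictive distribution, either the squared score (first derivative of the log-likelihood) or the negative diagonal of its Hessian. Both are unbiased for the same diagonal; the trade-offs discussed in the paper concern their differing variances. The PyTorch sketch below illustrates this for a unit-variance Gaussian regression network; the network size, sample count, and helper names are arbitrary choices for the example.

```python
import torch
from torch.func import functional_call, grad, hessian

# Unit-variance Gaussian regression model: p(y | x, theta) = N(f_theta(x), 1).
torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1))
x = torch.randn(16, 4)

# Flatten the parameters into one vector so the Hessian diagonal is easy to index.
names = [n for n, _ in net.named_parameters()]
shapes = [p.shape for _, p in net.named_parameters()]
sizes = [p.numel() for _, p in net.named_parameters()]
flat = torch.cat([p.detach().reshape(-1) for _, p in net.named_parameters()])

def unflatten(v):
    return {n: c.reshape(s) for n, c, s in zip(names, torch.split(v, sizes), shapes)}

def log_lik(v, y):
    mu = functional_call(net, unflatten(v), (x,))
    return -0.5 * ((y - mu) ** 2).sum()  # Gaussian log-likelihood up to a constant

def diag_fisher(v, n_samples=50):
    f_score = torch.zeros(v.numel())
    f_hess = torch.zeros(v.numel())
    for _ in range(n_samples):
        # The FIM averages over labels drawn from the model itself, not from the data.
        y = functional_call(net, unflatten(v), (x,)) + torch.randn(x.shape[0], 1)
        # Estimator 1: squared score, E_y[(d log p / d theta)^2].
        f_score += grad(log_lik)(v, y) ** 2 / n_samples
        # Estimator 2: negative Hessian diagonal, E_y[-d^2 log p / d theta^2].
        f_hess += -torch.diagonal(hessian(log_lik)(v, y)) / n_samples
    return f_score, f_hess

f1, f2 = diag_fisher(flat)
print((f1 - f2).abs().max())  # both are unbiased; agreement improves with n_samples
```

In this Gaussian model, the randomness of the Hessian-based estimator enters only through the second-order term (y - mu) * d^2 mu / d theta^2, which is one concrete way the non-linearity with respect to different parameter groups drives the variance.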
Related papers
- Invariance Measures for Neural Networks [1.2845309023495566]
We propose measures to quantify the invariance of neural networks in terms of their internal representation.
The measures are efficient and interpretable, and can be applied to any neural network model.
arXiv Detail & Related papers (2023-10-26T13:59:39Z) - Fundamental limits of overparametrized shallow neural networks for
supervised learning [11.136777922498355]
We study a two-layer neural network trained from input-output pairs generated by a teacher network with matching architecture.
Our results come in the form of bounds relating i) the mutual information between training data and network weights, or ii) the Bayes-optimal generalization error.
arXiv Detail & Related papers (2023-07-11T08:30:50Z) - What Affects Learned Equivariance in Deep Image Recognition Models? [10.590129221143222]
We find evidence for a correlation between learned translation equivariance and validation accuracy on ImageNet.
Data augmentation, reduced model capacity and inductive bias in the form of convolutions induce higher learned equivariance in neural networks.
arXiv Detail & Related papers (2023-04-05T17:54:25Z) - Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z) - On the Variance of the Fisher Information for Deep Learning [79.71410479830222]
The Fisher information matrix (FIM) has been applied to the realm of deep learning.
The exact FIM is either unavailable in closed form or too expensive to compute.
We investigate two such estimators based on two equivalent representations of the FIM.
arXiv Detail & Related papers (2021-07-09T04:46:50Z) - Estimating informativeness of samples with Smooth Unique Information [108.25192785062367]
We measure how much a sample informs the final weights and how much it informs the function computed by the weights.
We give efficient approximations of these quantities using a linearized network.
We apply these measures to several problems, such as dataset summarization.
arXiv Detail & Related papers (2021-01-17T10:29:29Z) - Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters.
We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations.
arXiv Detail & Related papers (2020-10-22T17:18:48Z) - Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the outcome of a transformation is predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z) - Statistical Guarantees for Regularized Neural Networks [4.254099382808598]
We develop a general statistical guarantee for estimators that consist of a least-squares term and a regularizer.
Our results establish a mathematical basis for regularized estimation of neural networks.
arXiv Detail & Related papers (2020-05-30T15:28:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.