Trade-Offs of Diagonal Fisher Information Matrix Estimators
- URL: http://arxiv.org/abs/2402.05379v3
- Date: Wed, 30 Oct 2024 09:29:10 GMT
- Title: Trade-Offs of Diagonal Fisher Information Matrix Estimators
- Authors: Alexander Soen, Ke Sun
- Abstract summary: The Fisher information matrix can be used to characterize the local geometry of the parameter space of neural networks.
We examine two popular estimators whose accuracy and sample complexity depend on their associated variances.
We derive bounds on the variances and instantiate them in neural networks for regression and classification.
- Score: 53.35448232352667
- Abstract: The Fisher information matrix can be used to characterize the local geometry of the parameter space of neural networks. It elucidates insightful theories and provides useful tools to understand and optimize neural networks. Given its high computational cost, practitioners often use random estimators and evaluate only the diagonal entries. We examine two popular estimators whose accuracy and sample complexity depend on their associated variances. We derive bounds on the variances and instantiate them in neural networks for regression and classification. We navigate trade-offs for both estimators based on analytical and numerical studies. We find that the variance quantities depend on the non-linearity with respect to different parameter groups and should not be neglected when estimating the Fisher information.
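As a concrete illustration, the sketch below contrasts two diagonal Fisher estimators on a toy logistic-regression model, assuming the two estimators in question are the score-based and Hessian-based forms (as in the related variance paper listed below): one averages the squared score, the other averages the diagonal of the negative Hessian of the log-likelihood, with labels sampled from the model. This is a minimal sketch under assumed settings (model, data, and variable names are illustrative), not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy model (illustrative): logistic regression with p(y=1|x) = sigmoid(w.x).
d, n = 5, 10_000
w = rng.normal(size=d)          # parameters at which the Fisher is evaluated
x = rng.normal(size=(n, d))     # inputs standing in for the data distribution

p = sigmoid(x @ w)              # model probabilities p(y=1|x)
y = rng.binomial(1, p)          # labels sampled from the model, not the data

# Estimator 1 (squared score / outer-product form), diagonal only:
#   F_kk ~ mean_i [ d/dw_k log p(y_i|x_i) ]^2 = mean_i [ (y_i - p_i) * x_ik ]^2
score = (y - p)[:, None] * x
fisher_diag_1 = np.mean(score ** 2, axis=0)

# Estimator 2 (negative Hessian form), diagonal only:
#   -d^2/dw_k^2 log p(y|x) = p(1-p) * x_k^2  (label-independent for this model)
fisher_diag_2 = np.mean((p * (1 - p))[:, None] * x ** 2, axis=0)

print("diagonal via squared score:   ", fisher_diag_1)
print("diagonal via negative Hessian:", fisher_diag_2)
```

Both estimators have the same expectation; their variances, which are the subject of the paper, generally differ.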
Related papers
- Symmetry Discovery for Different Data Types [52.2614860099811]
Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance.
We propose LieSD, a method for discovering symmetries via trained neural networks which approximate the input-output mappings of the tasks.
We validate the performance of LieSD on tasks with symmetries such as the two-body problem, the moment of inertia matrix prediction, and top quark tagging.
arXiv Detail & Related papers (2024-10-13T13:39:39Z)
- Invariance Measures for Neural Networks [1.2845309023495566]
We propose measures to quantify the invariance of neural networks in terms of their internal representation.
The measures are efficient and interpretable, and can be applied to any neural network model.
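One simple way to make such a measure concrete (purely illustrative; not the measures proposed in the paper) is to compare a layer's activation variance across transformed copies of each input with its variance across different inputs:

```python
import torch

# Purely illustrative sketch: score a layer's rotation invariance by comparing
# activation spread across rotated copies of an input with spread across inputs.
torch.manual_seed(0)
layer = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.ReLU())

def rotate(x, theta):
    c, s = torch.cos(theta), torch.sin(theta)
    rot = torch.stack([torch.stack([c, -s]), torch.stack([s, c])])
    return x @ rot.T

x = torch.randn(64, 2)                                      # a batch of inputs
angles = torch.linspace(0, 2 * torch.pi, 8)
acts = torch.stack([layer(rotate(x, t)) for t in angles])   # (transforms, batch, units)

var_within = acts.var(dim=0).mean()               # spread over transformed copies
var_between = acts.mean(dim=0).var(dim=0).mean()  # spread over different inputs
invariance_score = 1 - var_within / (var_within + var_between)
print("invariance score:", float(invariance_score))
```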
arXiv Detail & Related papers (2023-10-26T13:59:39Z)
- What Affects Learned Equivariance in Deep Image Recognition Models? [10.590129221143222]
We find evidence for a correlation between learned translation equivariance and validation accuracy on ImageNet.
Data augmentation, reduced model capacity and inductive bias in the form of convolutions induce higher learned equivariance in neural networks.
arXiv Detail & Related papers (2023-04-05T17:54:25Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Bagged Polynomial Regression and Neural Networks [0.0]
Series and polynomial regression are able to approximate the same function classes as neural networks.
Bagged polynomial regression (BPR) is an attractive alternative to neural networks.
BPR performs as well as neural networks in crop classification using satellite data.
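A minimal sketch of the bagged-polynomial-regression idea as the name suggests (bootstrap resamples, one polynomial fit per resample, averaged predictions); data and settings below are made up for illustration and this is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D regression data (names and settings are illustrative).
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + 0.3 * rng.normal(size=x.shape)

def bagged_poly_predict(x_train, y_train, x_test, degree=5, n_bags=50):
    """Average degree-`degree` polynomial fits over bootstrap resamples."""
    preds = []
    n = len(x_train)
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)                     # bootstrap sample
        coeffs = np.polyfit(x_train[idx], y_train[idx], degree)
        preds.append(np.polyval(coeffs, x_test))
    return np.mean(preds, axis=0)

x_test = np.linspace(-3, 3, 7)
print(bagged_poly_predict(x, y, x_test))
```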
arXiv Detail & Related papers (2022-05-17T19:55:56Z)
- On the Variance of the Fisher Information for Deep Learning [79.71410479830222]
The Fisher information matrix (FIM) has been applied to the realm of deep learning.
The exact FIM is either unavailable in closed form or too expensive to compute.
We investigate two such estimators based on two equivalent representations of the FIM.
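The two representations referred to here are the standard equivalent forms of the Fisher information (background identities, stated in generic notation rather than the paper's):

```latex
\mathcal{I}(\theta)
  = \mathbb{E}_{x}\,\mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\!\left[\nabla_\theta \log p_\theta(y \mid x)\,\nabla_\theta \log p_\theta(y \mid x)^{\top}\right]
  = -\,\mathbb{E}_{x}\,\mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\!\left[\nabla_\theta^{2} \log p_\theta(y \mid x)\right]
```

One estimator averages outer products of score vectors and the other averages negative Hessians, with Monte Carlo samples replacing the expectations.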
arXiv Detail & Related papers (2021-07-09T04:46:50Z)
- Estimating informativeness of samples with Smooth Unique Information [108.25192785062367]
We measure how much a sample informs the final weights and how much it informs the function computed by the weights.
We give efficient approximations of these quantities using a linearized network.
We apply these measures to several problems, such as dataset summarization.
arXiv Detail & Related papers (2021-01-17T10:29:29Z)
- Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters.
We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations.
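A compact sketch of the idea described above, under assumed settings (toy task, uniform rotation distribution, arbitrary regularizer weight; not the paper's code): rotation angles are drawn from a learnable uniform distribution via reparameterization, so both the network weights and the augmentation width receive gradients from the same loss.

```python
import torch

# Sketch of joint training of network parameters and augmentation parameters.
torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
log_width = torch.zeros(1, requires_grad=True)      # angles ~ Uniform(-w, w), w = exp(log_width)
opt = torch.optim.Adam(list(net.parameters()) + [log_width], lr=1e-2)

x = torch.randn(256, 2)
y = x.pow(2).sum(dim=1, keepdim=True)               # rotation-invariant regression target

for _ in range(200):
    w = log_width.exp()
    theta = (torch.rand(x.shape[0]) * 2 - 1) * w    # reparameterized angle samples
    c, s = torch.cos(theta), torch.sin(theta)
    rot = torch.stack((torch.stack((c, -s), dim=-1),
                       torch.stack((s, c), dim=-1)), dim=-2)   # (batch, 2, 2)
    x_aug = torch.einsum('bij,bj->bi', rot, x)      # rotate each input by its sampled angle
    loss = (net(x_aug) - y).pow(2).mean() - 0.01 * log_width.sum()  # penalty rewards wider augmentation
    opt.zero_grad()
    loss.backward()
    opt.step()

print("learned augmentation half-width (radians):", float(log_width.exp()))
```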
arXiv Detail & Related papers (2020-10-22T17:18:48Z)
- Statistical Guarantees for Regularized Neural Networks [4.254099382808598]
We develop a general statistical guarantee for estimators obtained by minimizing a least-squares term plus a regularizer.
Our results establish a mathematical basis for regularized estimation of neural networks.
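In symbols, such estimators take the standard penalized least-squares form (generic notation; the specific regularizer \Omega and weight \lambda \ge 0 are left abstract):

```latex
\hat{\theta} \in \operatorname*{arg\,min}_{\theta}\; \sum_{i=1}^{n}\bigl(y_i - f_{\theta}(x_i)\bigr)^{2} + \lambda\,\Omega(\theta)
```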
arXiv Detail & Related papers (2020-05-30T15:28:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.