Analysis and Comparison of Two-Level KFAC Methods for Training Deep
Neural Networks
- URL: http://arxiv.org/abs/2303.18083v2
- Date: Mon, 3 Apr 2023 07:41:26 GMT
- Title: Analysis and Comparison of Two-Level KFAC Methods for Training Deep
Neural Networks
- Authors: Abdoulaye Koroko, Ani Anciaux-Sedrakian, Ibtihel Ben Gharbia,
Valérie Garès, Mounir Haddou, Quang Huy Tran
- Abstract summary: We investigate the benefit of restoring some low-frequency interactions between layers by means of two-level methods.
Inspired by domain decomposition, several two-level corrections to KFAC using different coarse spaces are proposed and assessed.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As a second-order method, the Natural Gradient Descent (NGD) has the ability
to accelerate training of neural networks. However, due to the prohibitive
computational and memory costs of computing and inverting the Fisher
Information Matrix (FIM), efficient approximations are necessary to make NGD
scalable to Deep Neural Networks (DNNs). Many such approximations have been
attempted. The most sophisticated of these is KFAC, which approximates the FIM
as a block-diagonal matrix, where each block corresponds to a layer of the
neural network. By doing so, KFAC ignores the interactions between different
layers. In this work, we investigate the benefit of restoring some
low-frequency interactions between the layers by means of two-level methods.
Inspired by domain decomposition, several two-level corrections to KFAC using
different coarse spaces are proposed and assessed. The results show
that incorporating the layer interactions in this fashion does not significantly
improve the performance of KFAC. This suggests that it is safe to discard the
off-diagonal blocks of the FIM, since the block-diagonal approach is
sufficiently robust, accurate, and economical in computation time.
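To make the setting concrete, here is a minimal numpy sketch of the KFAC preconditioner the paper starts from, together with the kind of additive coarse-space correction a two-level method adds on top. The matrix sizes, the damping, and the coarse space R are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# KFAC approximates each layer's FIM block by a Kronecker product
# A (kron) G, where A = E[a a^T] (input activations) and G = E[g g^T]
# (back-propagated gradients). The identity
#   (A kron G)^{-1} vec(V) = vec(G^{-1} V A^{-1})
# makes the block inverse cheap to apply to an (out x in) weight gradient.

def kfac_precondition(grad_W, A, G, damping=1e-2):
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    return G_inv @ grad_W @ A_inv

# A two-level correction in the domain-decomposition style: given the full
# (damped) FIM F, the right-hand side b, and the block-diagonal KFAC
# solution x_kfac, a coarse space R (one row per layer here) restores a
# low-frequency coupling between layers:
#   x = x_kfac + R^T (R F R^T)^{-1} R (b - F x_kfac)

def two_level_correct(F, b, x_kfac, R):
    coarse = R @ F @ R.T
    return x_kfac + R.T @ np.linalg.solve(coarse, R @ (b - F @ x_kfac))
```

With a coarse space of one degree of freedom per layer block, the correction injects exactly the kind of low-frequency inter-layer interaction the paper evaluates; the paper's conclusion is that this extra coarse solve does not pay off in practice.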
Related papers
- Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks [3.7308074617637588]
We propose Kronecker-factored approximate curvature (KFAC) for PINN losses that greatly reduces the computational cost and allows scaling to much larger networks.
We find that our KFAC-based gradients are competitive with expensive second-order methods on small problems, scale more favorably to higher-dimensional neural networks and PDEs, and consistently outperform first-order methods and LBFGS.
arXiv Detail & Related papers (2024-05-24T14:36:02Z)
- Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures [85.76673783330334]
Two different settings of linear weight-sharing layers motivate two flavours of Kronecker-Factored Approximate Curvature (K-FAC).
We show they are exact for deep linear networks with weight-sharing in their respective setting.
We observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer.
arXiv Detail & Related papers (2023-11-01T16:37:00Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective in solving forward and inverse differential equation problems.
However, PINNs can get trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process; a sketch of the implicit update follows.
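For context, the implicit update replaces the gradient at the current iterate with the gradient at the next one, which makes the step stable on convex problems even for large step sizes. A minimal sketch on a toy quadratic, where the implicit equation is a linear solve; the names and sizes are illustrative, not the paper's PINN code:

```python
import numpy as np

# Explicit GD:  theta_{k+1} = theta_k - lr * grad(theta_k)
# Implicit GD:  theta_{k+1} = theta_k - lr * grad(theta_{k+1})
# For the quadratic L(theta) = 0.5 * theta^T A theta, grad = A @ theta, so
# the implicit equation (I + lr*A) theta_{k+1} = theta_k is a linear solve;
# general losses need an inner solver (fixed point or Newton) per step.

A = np.array([[3.0, 0.0], [0.0, 1.0]])

def isgd_step(theta, lr):
    return np.linalg.solve(np.eye(len(theta)) + lr * A, theta)

theta = np.array([1.0, 1.0])
for _ in range(20):
    theta = isgd_step(theta, lr=10.0)  # stable even for large steps: the
print(theta)                           # contraction factor is 1/(1+lr*lam),
                                       # while explicit GD diverges for lr > 2/3
```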
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- Zonotope Domains for Lagrangian Neural Network Verification [102.13346781220383]
We decompose the problem of verifying a deep neural network into the verification of many 2-layer neural networks.
Our technique yields bounds that improve upon both linear programming and Lagrangian-based verification techniques.
arXiv Detail & Related papers (2022-10-14T19:31:39Z)
- A Mini-Block Natural Gradient Method for Deep Neural Networks [12.48022619079224]
We propose and analyze the convergence of an approximate natural gradient method, mini-block Fisher (MBF).
Our approach utilizes the parallelism of GPUs to efficiently perform computations on the large number of small matrices in each layer; a sketch of the mini-block preconditioner follows.
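A hedged sketch of the mini-block idea: instead of one Kronecker-factored block per layer, keep many small Fisher blocks that can be estimated and inverted independently (and in parallel). The partition and damping below are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

# Approximate a layer's Fisher block by a block-diagonal of small
# "mini-blocks"; each one is inverted cheaply and independently.

def miniblock_precondition(grad, blocks, damping=1e-3):
    """grad: flat parameter gradient; blocks: list of (slice, B) where B
    is a running estimate of E[g g^T] restricted to that slice."""
    out = np.empty_like(grad)
    for sl, B in blocks:
        g = grad[sl]
        out[sl] = np.linalg.solve(B + damping * np.eye(len(g)), g)
    return out

rng = np.random.default_rng(0)
grad = rng.normal(size=12)                   # toy 12-parameter layer
blocks = []
for start in range(0, 12, 4):                # mini-blocks of size 4
    sl = slice(start, start + 4)
    blocks.append((sl, np.outer(grad[sl], grad[sl])))  # one-sample estimate
print(miniblock_precondition(grad, blocks))
```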
arXiv Detail & Related papers (2022-02-08T20:01:48Z)
- Efficient Approximations of the Fisher Matrix in Neural Networks using Kronecker Product Singular Value Decomposition [0.0]
It is shown that natural gradient descent can minimize the objective function more efficiently than ordinary gradient-descent-based methods.
The bottleneck of this approach for training deep neural networks lies in the prohibitive cost of solving a large dense linear system corresponding to the Fisher Information Matrix (FIM) at each iteration.
This has motivated various approximations of either the exact FIM or the empirical one.
The most sophisticated of these is KFAC, which involves a Kronecker-factored block diagonal approximation of the FIM.
With only a slight additional cost, a few improvements of KFAC from the standpoint of accuracy are proposed; a sketch of the underlying nearest-Kronecker-product computation follows.
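The primitive behind this line of work is the nearest-Kronecker-product problem: minimize ||F - A ⊗ B||_F, which reduces to a rank-1 SVD of a rearranged F (Van Loan and Pitsianis). A minimal numpy sketch with illustrative dimensions:

```python
import numpy as np

def nearest_kronecker(F, m, n, p, q):
    """Best A (m x n), B (p x q) with F ~ kron(A, B), F of size (m*p, n*q).
    Each (p x q) block F_ij is vectorized into a row of R; the best rank-1
    factorization of R then gives vec(A) and vec(B)."""
    R = np.empty((m * n, p * q))
    for i in range(m):
        for j in range(n):
            R[i * n + j] = F[i*p:(i+1)*p, j*q:(j+1)*q].reshape(-1)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m, n)
    B = np.sqrt(s[0]) * Vt[0].reshape(p, q)
    return A, B

rng = np.random.default_rng(0)
m, n, p, q = 3, 3, 2, 2
F = np.kron(rng.normal(size=(m, n)), rng.normal(size=(p, q)))
A, B = nearest_kronecker(F + 0.01 * rng.normal(size=F.shape), m, n, p, q)
print(np.linalg.norm(F - np.kron(A, B)))   # small residual
```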
arXiv Detail & Related papers (2022-01-25T12:56:17Z)
- Learning Frequency Domain Approximation for Binary Neural Networks [68.79904499480025]
We propose to estimate the gradient of the sign function in the Fourier frequency domain using a combination of sine functions for training BNNs; a sketch of this surrogate follows.
Experiments on several benchmark datasets and neural architectures show that binary networks trained with our method achieve state-of-the-art accuracy.
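The idea can be sketched as follows: over one period, the sign function is a square wave, whose Fourier series is a sum of odd-harmonic sines; differentiating a truncated series gives a smooth surrogate gradient for the backward pass. A hedged numpy sketch with illustrative omega and truncation, not the paper's exact estimator:

```python
import numpy as np

# Truncated Fourier series of the square wave:
#   sign(x) ~ (4/pi) * sum_{i=0}^{N-1} sin((2i+1) * omega * x) / (2i+1)
# Its derivative serves as a surrogate gradient for sign() in a BNN.

def sign_fourier(x, omega=1.0, n_terms=10):
    k = 2 * np.arange(n_terms) + 1            # odd harmonics 1, 3, 5, ...
    return (4 / np.pi) * np.sum(np.sin(np.outer(x, k) * omega) / k, axis=1)

def sign_fourier_grad(x, omega=1.0, n_terms=10):
    k = 2 * np.arange(n_terms) + 1
    return (4 * omega / np.pi) * np.sum(np.cos(np.outer(x, k) * omega), axis=1)

x = np.linspace(-2, 2, 5)
print(sign_fourier(x))        # approaches sign(x) within one period
print(sign_fourier_grad(x))   # smooth surrogate for the backward pass
```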
arXiv Detail & Related papers (2021-03-01T08:25:26Z)
- LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose LocalDrop, a new approach for the regularization of neural networks based on the local Rademacher complexity.
A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs) has been developed based on the proposed upper bound of the local Rademacher complexity.
arXiv Detail & Related papers (2021-03-01T03:10:11Z)
- Kronecker-factored Quasi-Newton Methods for Convolutional Neural Networks [10.175972095073282]
KF-QN-CNN is a new Kronecker-factored quasi-Newton method for training convolutional neural networks (CNNs).
KF-QN-CNN consistently exhibited superior performance in all of our tests.
arXiv Detail & Related papers (2021-02-12T19:40:34Z)
- A Trace-restricted Kronecker-Factored Approximation to Natural Gradient [32.41025119083869]
We propose a new approximation to the Fisher information matrix called Trace-restricted Kronecker-factored Approximate Curvature (TKFAC); a sketch of the trace restriction follows.
Experiments show that our method performs better than several state-of-the-art algorithms on some deep network architectures.
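One way to read the trace restriction: since tr(A ⊗ G) = tr(A) · tr(G), a scalar rescaling can force the Kronecker approximation to keep the same trace as the exact Fisher block. A hedged sketch; the normalization details here are illustrative, not necessarily TKFAC's exact scheme:

```python
import numpy as np

# tr(A kron G) = tr(A) * tr(G), so one scalar correction matches the
# Kronecker approximation's trace to the exact block's trace.

def trace_restricted_factors(A, G, trace_exact):
    scale = trace_exact / (np.trace(A) * np.trace(G))
    return scale * A, G   # absorb the correction into one factor

A = np.array([[2.0, 0.1], [0.1, 1.0]])
G = np.array([[1.5, 0.0], [0.0, 0.5]])
A_t, G_t = trace_restricted_factors(A, G, trace_exact=5.0)
print(np.trace(np.kron(A_t, G_t)))  # 5.0
```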
arXiv Detail & Related papers (2020-11-21T07:47:14Z)
- Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task at inference time; a sketch follows.
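A hedged sketch of depth-adaptive inference: unrolled layers are executed until a halting score crosses a threshold, so easy inputs exit early. All names, the threshold, and the toy confidence function are illustrative assumptions:

```python
import numpy as np

# Run layers sequentially; stop as soon as a confidence score for the
# current estimate exceeds the threshold, and report the depth used.

def adaptive_depth_forward(x, layers, halt_score, threshold=0.9):
    for depth, layer in enumerate(layers, start=1):
        x = layer(x)
        if halt_score(x) >= threshold:
            break
    return x, depth

layers = [lambda x: 0.5 * x for _ in range(10)]   # toy "layers"
halt = lambda x: 1.0 - np.linalg.norm(x)          # toy confidence in (-inf, 1]
y, depth = adaptive_depth_forward(np.ones(3), layers, halt)
print(depth, y)   # exits after a few layers once the estimate stabilizes
```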
arXiv Detail & Related papers (2020-10-29T06:32:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.