Hessian Eigenvectors and Principal Component Analysis of Neural Network
Weight Matrices
- URL: http://arxiv.org/abs/2311.00452v1
- Date: Wed, 1 Nov 2023 11:38:31 GMT
- Title: Hessian Eigenvectors and Principal Component Analysis of Neural Network
Weight Matrices
- Authors: David Haink
- Abstract summary: This study delves into the intricate dynamics of trained deep neural networks and their relationships with network parameters.
We unveil a correlation between Hessian eigenvectors and network weights.
This relationship, hinging on the magnitude of eigenvalues, allows us to discern parameter directions within the network.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study delves into the intricate dynamics of trained deep neural networks
and their relationships with network parameters. Trained networks predominantly
continue training in a single direction, known as the drift mode. This drift
mode can be explained by the quadratic potential model of the loss function,
suggesting a slow exponential decay towards the potential minima. We unveil a
correlation between Hessian eigenvectors and network weights. This
relationship, hinging on the magnitude of eigenvalues, allows us to discern
parameter directions within the network. Notably, the significance of these
directions relies on two defining attributes: the curvature of their potential
wells (indicated by the magnitude of Hessian eigenvalues) and their alignment
with the weight vectors. Our exploration extends to the decomposition of weight
matrices through singular value decomposition. This approach proves practical
in identifying critical directions within the Hessian, considering both their
magnitude and curvature. Furthermore, our examination showcases the
applicability of principal component analysis in approximating the Hessian,
with update parameters emerging as a superior choice over weights for this
purpose. Remarkably, the largest Hessian eigenvalues of individual layers
closely resemble those of the entire network, with the higher eigenvalues
concentrated in the deeper layers. Leveraging these
insights, we venture into addressing catastrophic forgetting, a challenge of
neural networks when learning new tasks while retaining knowledge from previous
ones. By applying our discoveries, we formulate an effective strategy to
mitigate catastrophic forgetting, offering a possible solution that can be
applied to networks of varying scales, including larger architectures.
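
The drift-mode claim rests on the quadratic potential picture. As a quick, standard sketch (the notation below is assumed here, not quoted from the paper), expanding the loss around a minimum and writing gradient descent in the Hessian eigenbasis makes the slow exponential decay explicit:

```latex
% Quadratic model of the loss near a minimum \theta^*
L(\theta) \approx L(\theta^*) + \tfrac{1}{2}\,(\theta-\theta^*)^{\top} H\,(\theta-\theta^*),
\qquad H = \nabla^2 L(\theta^*).

% Gradient descent with step size \eta, in the eigenbasis H v_i = \lambda_i v_i:
\theta_{t+1}-\theta^* = (I-\eta H)\,(\theta_t-\theta^*)
\;\Longrightarrow\;
v_i^{\top}(\theta_t-\theta^*) = (1-\eta\lambda_i)^{t}\, v_i^{\top}(\theta_0-\theta^*)
\approx e^{-\eta\lambda_i t}\, v_i^{\top}(\theta_0-\theta^*).
```

Components along small-eigenvalue directions decay slowest, which is the slow exponential approach to the minimum that the abstract associates with the drift mode.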
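To probe the reported correlation between Hessian eigenvectors and weights, one can estimate the top Hessian eigenpair with Hessian-vector products and compare it against the weights and their singular directions. Below is a minimal PyTorch sketch; the toy model, data, and iteration count are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the paper's setup (illustrative only).
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 3))
x, y = torch.randn(64, 10), torch.randint(0, 3, (64,))
loss_fn = nn.CrossEntropyLoss()
params = [p for p in model.parameters()]

def hvp(vecs):
    """Hessian-vector product via double backprop."""
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vecs))
    return torch.autograd.grad(dot, params)

# Power iteration for the dominant Hessian eigenpair.
v = [torch.randn_like(p) for p in params]
for _ in range(50):
    hv = hvp(v)
    norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
    v = [h / norm for h in hv]
lam = sum((h * u).sum() for h, u in zip(hvp(v), v))  # Rayleigh quotient; v is unit norm

# Alignment of the top eigenvector with the weight vector itself.
w_norm = torch.sqrt(sum((p.detach() ** 2).sum() for p in params))
cos = sum((u * p.detach()).sum() for u, p in zip(v, params)) / w_norm
print(f"top eigenvalue ~ {lam.item():.3f}, cos(v_top, weights) ~ {cos.item():.3f}")

# SVD of an individual weight matrix: its singular vectors are the per-layer
# directions whose importance the abstract relates to magnitude and curvature.
U, S, Vh = torch.linalg.svd(model[0].weight.detach(), full_matrices=False)
print("leading singular values of layer 0:", S[:5])
```

On a real network the same power iteration applies unchanged; only the model, data, and number of iterations would differ.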
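The abstract's claim that principal component analysis of update parameters approximates the Hessian suggests a simple recipe: record parameter snapshots during training, form the successive updates, and take their principal components. A hedged sketch follows; the bookkeeping and function names are assumptions, not an API from the paper.

```python
import torch

def flatten(params):
    """Flatten a parameter list into a single vector."""
    return torch.cat([p.detach().reshape(-1) for p in params])

# Hypothetical bookkeeping: append flatten(model.parameters()) to `snapshots`
# after every optimizer step while training.
snapshots: list[torch.Tensor] = []

def pca_of_updates(snapshots, k=10):
    """PCA of successive parameter updates. The top right-singular vectors are
    candidate approximations to the dominant Hessian eigendirections, in the
    spirit of the abstract's claim that update parameters work better than raw
    weights for this purpose."""
    thetas = torch.stack(snapshots)              # (T, P) parameter snapshots
    updates = thetas[1:] - thetas[:-1]           # (T-1, P) per-step updates
    updates = updates - updates.mean(dim=0, keepdim=True)
    _, S, Vh = torch.linalg.svd(updates, full_matrices=False)
    return S[:k], Vh[:k]                         # singular values and principal directions
```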
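The abstract does not spell out the forgetting-mitigation procedure, so the following is only an illustrative guess at how such insights are typically applied: penalize movement of the weights along the high-curvature directions identified on the old task, in the spirit of curvature-based regularizers such as EWC. The function below is hypothetical, not the authors' algorithm.

```python
import torch

def curvature_penalty(params, old_params, directions, curvatures, strength=1.0):
    """Hypothetical regularizer (not the paper's method): when training on a new
    task, penalize movement away from the old task's weights along the
    high-curvature directions identified for that task.

    directions: (k, P) tensor of unit vectors (e.g. top Hessian / PCA directions)
    curvatures: (k,) tensor of the corresponding eigenvalues
    """
    theta = torch.cat([p.reshape(-1) for p in params])
    theta_old = torch.cat([p.detach().reshape(-1) for p in old_params])
    delta = theta - theta_old
    proj = directions @ delta                    # shift along each stored direction
    return strength * (curvatures * proj ** 2).sum()

# Usage sketch during new-task training:
#   loss = task_loss + curvature_penalty(model.parameters(), old_params, V_top, lam_top)
```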
Related papers
- Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks [5.851101657703105]
We take a first step towards theoretically characterizing the conditioning of the Gauss-Newton (GN) matrix in neural networks.
We establish tight bounds on the condition number of the GN in deep linear networks of arbitrary depth and width.
We expand the analysis to further architectural components, such as residual connections and convolutional layers.
arXiv Detail & Related papers (2024-11-04T14:56:48Z)
- Asymptotics of Learning with Deep Structured (Random) Features [9.366617422860543]
For a large class of feature maps we provide a tight characterisation of the test error associated with learning the readout layer.
In some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.
arXiv Detail & Related papers (2024-02-21T18:35:27Z)
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
- Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit [0.0]
Large-width dynamics has emerged as a fruitful viewpoint and led to practical insights on real-world deep networks.
For two-layer neural networks, it has been understood that the nature of the trained model radically changes depending on the scale of the initial random weights.
We propose various methods to avoid this trivial behavior and analyze in detail the resulting dynamics.
arXiv Detail & Related papers (2021-10-29T07:53:35Z)
- Analytic Insights into Structure and Rank of Neural Network Hessian Maps [32.90143789616052]
The Hessian of a neural network captures parameter interactions through second-order derivatives of the loss.
We develop theoretical tools to analyze the range of the Hessian map, providing us with a precise understanding of its rank deficiency.
This yields exact formulas and tight upper bounds for the Hessian rank of deep linear networks.
arXiv Detail & Related papers (2021-06-30T17:29:58Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.