Analytic Insights into Structure and Rank of Neural Network Hessian Maps
- URL: http://arxiv.org/abs/2106.16225v2
- Date: Thu, 1 Jul 2021 17:57:50 GMT
- Title: Analytic Insights into Structure and Rank of Neural Network Hessian Maps
- Authors: Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann
- Abstract summary: The Hessian of a neural network captures parameter interactions through second-order derivatives of the loss.
We develop theoretical tools to analyze the range of the Hessian map, providing us with a precise understanding of its rank deficiency.
This yields exact formulas and tight upper bounds for the Hessian rank of deep linear networks.
- Score: 32.90143789616052
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Hessian of a neural network captures parameter interactions through
second-order derivatives of the loss. It is a fundamental object of study,
closely tied to various problems in deep learning, including model design,
optimization, and generalization. Most prior work has been empirical, typically
focusing on low-rank approximations and heuristics that are blind to the
network structure. In contrast, we develop theoretical tools to analyze the
range of the Hessian map, providing us with a precise understanding of its rank
deficiency as well as the structural reasons behind it. This yields exact
formulas and tight upper bounds for the Hessian rank of deep linear networks,
allowing for an elegant interpretation in terms of rank deficiency. Moreover,
we demonstrate that our bounds remain faithful as an estimate of the numerical
Hessian rank, for a larger class of models such as rectified and hyperbolic
tangent networks. Further, we also investigate the implications of model
architecture (e.g. width, depth, bias) on the rank deficiency. Overall, our
work provides novel insights into the source and extent of redundancy in
overparameterized networks.
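As a concrete illustration of the quantity studied here, the short sketch below (not code from the paper; the network sizes, random data, and tolerances are arbitrary assumptions made for illustration) builds a small deep linear network, forms the full loss Hessian with respect to all weights, and reports its numerical rank and rank deficiency.

```python
# Hypothetical sketch (not the authors' code): numerically estimate the rank of
# the loss Hessian of a small deep linear network f(x) = W3 W2 W1 x under
# squared loss. All widths, data, and the seed are assumptions for illustration.
import torch

torch.manual_seed(0)

# Layer widths and number of samples (arbitrary choices).
d_in, d_h1, d_h2, d_out, n = 4, 3, 3, 2, 20
shapes = [(d_h1, d_in), (d_h2, d_h1), (d_out, d_h2)]
X = torch.randn(d_in, n)
Y = torch.randn(d_out, n)

def unflatten(theta):
    # Split the flat parameter vector into the three weight matrices.
    mats, offset = [], 0
    for r, c in shapes:
        mats.append(theta[offset:offset + r * c].reshape(r, c))
        offset += r * c
    return mats

def loss(theta):
    W1, W2, W3 = unflatten(theta)
    return 0.5 * ((W3 @ W2 @ W1 @ X - Y) ** 2).sum() / n

p = sum(r * c for r, c in shapes)        # total number of parameters
theta0 = torch.randn(p)                  # a generic (random) point in parameter space
H = torch.autograd.functional.hessian(loss, theta0)  # full p x p Hessian
rank = torch.linalg.matrix_rank(H, hermitian=True).item()
print(f"p = {p}, numerical Hessian rank = {rank}, rank deficiency = {p - rank}")
```
At a generic random point the printed deficiency is expected to be positive, reflecting the structural redundancy that the paper quantifies with exact formulas and upper bounds for deep linear networks; the specific numbers here depend on the assumed widths and are not taken from the paper.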
Related papers
- Neural Scaling Laws of Deep ReLU and Deep Operator Network: A Theoretical Study [8.183509993010983]
We study the neural scaling laws for deep operator networks using the Chen and Chen style architecture.
We quantify the neural scaling laws by analyzing the approximation and generalization errors.
Our results offer a partial explanation of the neural scaling laws in operator learning and provide a theoretical foundation for their applications.
arXiv Detail & Related papers (2024-10-01T03:06:55Z)
- Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods [2.226971382808806]
We develop a theoretical framework grounded in iterative methods for operator equations.
We demonstrate that popular architectures, such as diffusion models and AlphaFold, inherently employ iterative operator learning.
Our work aims to enhance the understanding of deep learning by merging insights from numerical analysis.
arXiv Detail & Related papers (2023-10-02T20:25:36Z)
- Approximation Power of Deep Neural Networks: an explanatory mathematical survey [0.0]
The goal of this survey is to present an explanatory review of the approximation properties of deep neural networks.
We aim at understanding how and why deep neural networks outperform other classical linear and nonlinear approximation methods.
arXiv Detail & Related papers (2022-07-19T18:47:44Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- Functional Network: A Novel Framework for Interpretability of Deep Neural Networks [2.641939670320645]
We propose a novel framework for the interpretability of deep neural networks, namely the functional network.
In our experiments, the mechanisms of regularization methods, namely, batch normalization and dropout, are revealed.
arXiv Detail & Related papers (2022-05-24T01:17:36Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Towards Deeper Graph Neural Networks [63.46470695525957]
Graph convolutions perform neighborhood aggregation and represent one of the most important graph operations.
Several recent studies attribute the performance deterioration observed in deeper graph networks to the over-smoothing issue.
We propose Deep Adaptive Graph Neural Network (DAGNN) to adaptively incorporate information from large receptive fields.
arXiv Detail & Related papers (2020-07-18T01:11:14Z)