The Hessian perspective into the Nature of Convolutional Neural Networks
- URL: http://arxiv.org/abs/2305.09088v1
- Date: Tue, 16 May 2023 01:15:00 GMT
- Title: The Hessian perspective into the Nature of Convolutional Neural Networks
- Authors: Sidak Pal Singh, Thomas Hofmann, Bernhard Schölkopf
- Abstract summary: We develop a framework relying on the Toeplitz representation of CNNs, and then utilize it to reveal the Hessian structure and, in particular, its rank.
Overall, our work generalizes and establishes the key insight that, even in CNNs, the Hessian rank grows as the square root of the number of parameters.
- Score: 32.7270996241955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Convolutional Neural Networks (CNNs) have long been investigated and
applied, as well as theorized, we aim to provide a slightly different perspective into their nature, through the lens of their Hessian maps.
The reason is that the loss Hessian captures the pairwise interaction of
parameters and therefore forms a natural ground to probe how the architectural aspects of a CNN are manifested in its structure and properties. We develop a framework relying on the Toeplitz representation of CNNs, and then utilize it to
reveal the Hessian structure and, in particular, its rank. We prove tight upper
bounds (with linear activations), which closely follow the empirical trend of
the Hessian rank and hold in practice in more general settings. Overall, our
work generalizes and establishes the key insight that, even in CNNs, the
Hessian rank grows as the square root of the number of parameters.
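The paper's two main ingredients, the Toeplitz (matrix) representation of a convolution and the rank of the resulting loss Hessian, can both be probed numerically. Below is a minimal sketch in PyTorch, not the authors' code; the toy layer sizes, data, and rank tolerance are assumptions made only for illustration.

```python
# Minimal numerical sketch (not the authors' code): (i) write a 1D convolution as
# multiplication by a Toeplitz-structured matrix, and (ii) compute the loss Hessian
# of a tiny *linear* CNN, comparing its numerical rank with the parameter count.
# All sizes, data, and tolerances below are illustrative assumptions.
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# (i) Toeplitz representation: a 'valid' 1D cross-correlation y = conv1d(x, w)
# equals T @ x, where each row of T holds a shifted copy of the kernel w.
def toeplitz_of_kernel(w, n_in):
    k = w.numel()
    T = torch.zeros(n_in - k + 1, n_in)
    for i in range(n_in - k + 1):
        T[i, i:i + k] = w
    return T

w, x = torch.randn(3), torch.randn(10)
assert torch.allclose(
    toeplitz_of_kernel(w, 10) @ x,
    F.conv1d(x.view(1, 1, -1), w.view(1, 1, -1)).flatten(),
    atol=1e-6,
)

# (ii) Hessian rank of a small 3-layer linear CNN (no activations, no biases).
layers = [torch.nn.Conv1d(1, 8, 3, bias=False),
          torch.nn.Conv1d(8, 8, 3, bias=False),
          torch.nn.Conv1d(8, 1, 3, bias=False)]
shapes = [l.weight.shape for l in layers]
sizes = [l.weight.numel() for l in layers]
n_params = sum(sizes)

X = torch.randn(64, 1, 16)   # toy inputs
Y = torch.randn(64, 1, 10)   # toy targets (length 16 -> 14 -> 12 -> 10 under 'valid' convs)

def loss_from_flat(theta):
    """MSE loss as a function of a single flattened parameter vector."""
    out = X
    for chunk, shape in zip(torch.split(theta, sizes), shapes):
        out = F.conv1d(out, chunk.view(shape))
    return F.mse_loss(out, Y)

theta0 = torch.cat([l.weight.detach().reshape(-1) for l in layers])
H = torch.autograd.functional.hessian(loss_from_flat, theta0)   # (n_params, n_params)
rank = torch.linalg.matrix_rank(H, rtol=1e-5).item()

print(f"#params = {n_params}, Hessian rank = {rank}, sqrt(#params) = {math.sqrt(n_params):.1f}")
```

The print statement juxtaposes the numerical rank with the parameter count and its square root, the two quantities the paper's bounds relate; on a toy of this size the numbers are only illustrative.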
Related papers
- Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations [24.052411316664017]
We introduce a theoretical framework for the evolution of the kernel sequence, which measures the similarity between the hidden representation for two different inputs.
For nonlinear activations, the kernel sequence converges globally to a unique fixed point, which can correspond to similar representations depending on the activation and network architecture.
This work provides new insights into the implicit biases of deep neural networks and how architectural choices influence the evolution of representations across layers.
arXiv Detail & Related papers (2024-10-26T07:10:47Z)
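As an illustration of the kind of kernel sequence studied there, the sketch below iterates the standard closed-form Gaussian kernel map for a wide ReLU network (an assumption borrowed from the mean-field/NNGP literature; the paper's activations and setting may differ) and shows the correlation between two inputs converging to a fixed point with depth.

```python
# Minimal sketch of a kernel-sequence iteration (illustrative, not this paper's exact setup):
# track the correlation c_l between the hidden representations of two inputs across the
# layers of a wide ReLU network, via the closed-form kernel map for normalized ReLU.
import math

def relu_kernel_map(c):
    """Correlation after one wide ReLU layer, given pre-activation correlation c in [-1, 1]."""
    c = max(-1.0, min(1.0, c))
    return (math.sqrt(1.0 - c * c) + c * (math.pi - math.acos(c))) / math.pi

c = -0.3                      # initial correlation between the two inputs (an assumption)
for layer in range(1, 31):
    c = relu_kernel_map(c)
    if layer % 5 == 0:
        print(f"layer {layer:2d}: correlation = {c:.4f}")
# The sequence drifts toward the fixed point c* = 1, i.e. the two hidden
# representations become increasingly similar with depth.
```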
- Joint Diffusion Processes as an Inductive Bias in Sheaf Neural Networks [14.224234978509026]
Sheaf Neural Networks (SNNs) naturally extend Graph Neural Networks (GNNs).
We propose two novel sheaf learning approaches that provide a more intuitive understanding of the involved structure maps.
In our evaluation, we show the limitations of the real-world benchmarks used so far on SNNs.
arXiv Detail & Related papers (2024-07-30T07:17:46Z)
- Super Consistency of Neural Network Landscapes and Learning Rate Transfer [72.54450821671624]
We study the landscape through the lens of the loss Hessian.
We find that certain spectral properties under $\mu$P are largely independent of the size of the network.
We show that in the Neural Tangent Kernel (NTK) and other scaling regimes, the sharpness exhibits very different dynamics at different scales.
arXiv Detail & Related papers (2024-02-27T12:28:01Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and are transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Convolutional Neural Networks Demystified: A Matched Filtering Perspective Based Tutorial [7.826806223782053]
Convolutional Neural Networks (CNNs) are a de facto standard for the analysis of large volumes of signals and images.
We revisit their operation from first principles and a matched filtering perspective.
It is our hope that this tutorial will help shed new light and physical intuition into the understanding and further development of deep neural networks.
arXiv Detail & Related papers (2021-08-26T09:07:49Z)
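The matched-filtering view can be made concrete in a few lines: a convolutional layer's response at each position is the inner product of its kernel (the "template") with the corresponding signal window, i.e. a sliding matched-filter score. The sketch below is an illustration, not the tutorial's code; the toy signal and template are assumptions.

```python
# Minimal sketch of the matched-filtering view of a convolutional layer (illustrative):
# the layer's response at each position is the inner product of the kernel ("template")
# with the corresponding signal window, i.e. a sliding matched-filter score.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
signal = torch.randn(1, 1, 50)          # toy 1D signal
template = torch.randn(1, 1, 7)         # the "matched filter" / conv kernel

# CNN-style computation (PyTorch conv1d is a cross-correlation, so no kernel flip).
conv_out = F.conv1d(signal, template).flatten()

# Explicit matched filtering: slide the template and take inner products.
mf_out = torch.stack([
    (signal[0, 0, i:i + 7] * template[0, 0]).sum()
    for i in range(50 - 7 + 1)
])

assert torch.allclose(conv_out, mf_out, atol=1e-6)
print("peak response at position", int(mf_out.argmax()))
```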
- Analytic Insights into Structure and Rank of Neural Network Hessian Maps [32.90143789616052]
The Hessian of a neural network captures parameter interactions through the second-order derivatives of the loss.
We develop theoretical tools to analyze the range of the Hessian map, providing us with a precise understanding of its rank deficiency.
This yields exact formulas and tight upper bounds for the Hessian rank of deep linear networks.
arXiv Detail & Related papers (2021-06-30T17:29:58Z)
- Deep Parametric Continuous Convolutional Neural Networks [92.87547731907176]
Parametric Continuous Convolution is a new learnable operator that operates over non-grid structured data.
Our experiments show significant improvement over the state-of-the-art in point cloud segmentation of indoor and outdoor scenes.
arXiv Detail & Related papers (2021-01-17T18:28:23Z)
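A rough sketch of the operator's core idea (the layer sizes, neighborhood construction, and kernel-generating MLP below are assumptions made for illustration): instead of indexing a fixed weight grid, the kernel weights are produced by a small network evaluated at continuous coordinate offsets, so the convolution applies to unordered point sets.

```python
# Rough sketch of a parametric continuous convolution over an unordered point set
# (illustrative; sizes, neighborhoods, and the kernel-MLP are assumptions): the kernel
# weight for each neighbor is *generated* by an MLP of the continuous coordinate offset,
# instead of being looked up on a fixed grid.
import torch
import torch.nn as nn

class ContinuousConv(nn.Module):
    def __init__(self, c_in, c_out, coord_dim=3, hidden=32):
        super().__init__()
        # MLP: relative coordinate -> a (c_in * c_out) weight vector.
        self.kernel_mlp = nn.Sequential(
            nn.Linear(coord_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, c_in * c_out))
        self.c_in, self.c_out = c_in, c_out

    def forward(self, coords, feats, neighbors):
        # coords: (N, 3) positions, feats: (N, c_in), neighbors: (N, K) indices.
        offsets = coords[neighbors] - coords[:, None, :]                    # (N, K, 3)
        W = self.kernel_mlp(offsets).view(*neighbors.shape, self.c_in, self.c_out)
        f = feats[neighbors]                                                # (N, K, c_in)
        # Average over neighbors of (generated weight applied to the neighbor feature).
        return torch.einsum('nki,nkio->no', f, W) / neighbors.shape[1]

# Toy usage on a random point cloud with K nearest neighbors.
N, K = 128, 8
coords = torch.randn(N, 3)
feats = torch.randn(N, 4)
neighbors = torch.cdist(coords, coords).topk(K, largest=False).indices     # (N, K)
out = ContinuousConv(c_in=4, c_out=16)(coords, feats, neighbors)
print(out.shape)   # torch.Size([128, 16])
```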
- Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks [11.57132149295061]
The Hessian captures important properties of the deep neural network loss landscape.
We make new observations about the top eigenspace of layer-wise Hessian.
We show that the new eigenspace structure can be explained by approximating the Hessian using Kronecker factorization.
arXiv Detail & Related papers (2020-10-08T21:18:11Z)
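The eigenspace observation can be illustrated with the basic linear-algebra fact such Kronecker approximations rely on: if a layer-wise Hessian is approximated as A ⊗ G, its eigenvectors are Kronecker products of the factors' eigenvectors. A tiny numerical check follows; the matrices here are random stand-ins, not actual layer-wise Hessians.

```python
# Tiny check of the linear-algebra fact behind Kronecker-factored Hessian approximations:
# if H ≈ A ⊗ G with A, G symmetric PSD, then eigenvectors of H are Kronecker products of
# the factors' eigenvectors and eigenvalues of H are products of the factors' eigenvalues.
# A and G here are random PSD matrices, not real layer-wise Hessians (an assumption).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = A @ A.T          # stand-in for one Kronecker factor
G = rng.standard_normal((3, 3)); G = G @ G.T          # stand-in for the other factor

wA, VA = np.linalg.eigh(A)                            # eigenvalues sorted ascending
wG, VG = np.linalg.eigh(G)

H = np.kron(A, G)                                      # Kronecker-factored "Hessian"
top_val = wA[-1] * wG[-1]                              # largest eigenvalue of A ⊗ G
top_vec = np.kron(VA[:, -1], VG[:, -1])                # corresponding eigenvector

# The Kronecker-product vector is indeed an eigenvector of H with eigenvalue top_val.
assert np.allclose(H @ top_vec, top_val * top_vec)
print("top eigenvalue of A⊗G:", top_val)
```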
- Linguistically Driven Graph Capsule Network for Visual Question Reasoning [153.76012414126643]
We propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network".
The compositional process is guided by the linguistic parse tree. Specifically, we bind each capsule in the lowest layer to bridge the linguistic embedding of a single word in the original question with visual evidence.
Experiments on the CLEVR dataset, CLEVR compositional generation test, and FigureQA dataset demonstrate the effectiveness and composition generalization ability of our end-to-end model.
arXiv Detail & Related papers (2020-03-23T03:34:25Z)
- Hold me tight! Influence of discriminative features on deep network boundaries [63.627760598441796]
We propose a new perspective that relates dataset features to the distance of samples to the decision boundary.
This enables us to carefully tweak the position of the training samples and measure the induced changes on the boundaries of CNNs trained on large-scale vision datasets.
arXiv Detail & Related papers (2020-02-15T09:29:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.