The Hessian perspective into the Nature of Convolutional Neural Networks
- URL: http://arxiv.org/abs/2305.09088v1
- Date: Tue, 16 May 2023 01:15:00 GMT
- Title: The Hessian perspective into the Nature of Convolutional Neural Networks
- Authors: Sidak Pal Singh, Thomas Hofmann, Bernhard Schölkopf
- Abstract summary: We develop a framework relying on the Toeplitz representation of CNNs, and then utilize it to reveal the Hessian structure and, in particular, its rank.
Overall, our work generalizes and establishes the key insight that, even in CNNs, the Hessian rank grows as the square root of the number of parameters.
- Score: 32.7270996241955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Convolutional Neural Networks (CNNs) have long been investigated, applied, and theorized about, we aim to provide a slightly different perspective on their nature: through the lens of their Hessian maps. The reason is that the loss Hessian captures the pairwise interaction of parameters, and therefore forms a natural ground to probe how the architectural aspects of a CNN are manifested in its structure and properties. We develop a framework relying on the Toeplitz representation of CNNs, and then utilize it to reveal the Hessian structure and, in particular, its rank. We prove tight upper bounds (with linear activations) that closely follow the empirical trend of the Hessian rank and hold in practice in more general settings. Overall, our work generalizes and establishes the key insight that, even in CNNs, the Hessian rank grows as the square root of the number of parameters.
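As a rough, hypothetical illustration of the objects the paper works with (not its actual construction or bounds), the numpy sketch below builds the Toeplitz matrix of a 1-D convolution, checks that it reproduces the convolution, and numerically computes the rank of the loss Hessian of a tiny two-layer linear CNN so it can be set against the parameter count:

```python
import numpy as np

def toeplitz_conv_matrix(kernel, n):
    # Dense Toeplitz matrix T with T @ x equal to the 'valid'
    # 1-D cross-correlation of a length-n signal x with `kernel`.
    k = len(kernel)
    T = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        T[i, i:i + k] = kernel
    return T

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
w1, w2 = rng.standard_normal(5), rng.standard_normal(3)

# The Toeplitz matrix reproduces the convolution exactly.
T1 = toeplitz_conv_matrix(w1, len(x))
assert np.allclose(T1 @ x, np.correlate(x, w1, mode="valid"))

# Loss of a two-layer linear "CNN" on one sample.
y = rng.standard_normal(len(x) - len(w1) - len(w2) + 2)

def loss(p):
    a, b = p[:len(w1)], p[len(w1):]
    h = toeplitz_conv_matrix(a, len(x)) @ x
    return 0.5 * np.sum((toeplitz_conv_matrix(b, len(h)) @ h - y) ** 2)

# Hessian via central finite differences.
p0 = np.concatenate([w1, w2])
d, eps = len(p0), 1e-4
H = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        def f(si, sj):
            q = p0.copy()
            q[i] += si * eps
            q[j] += sj * eps
            return loss(q)
        H[i, j] = (f(1, 1) - f(1, -1) - f(-1, 1) + f(-1, -1)) / (4 * eps ** 2)

print("parameters:", d, "| Hessian rank:", np.linalg.matrix_rank(H, tol=1e-5))
```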
Related papers
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning in a reproducing kernel Banach space (RKBS).
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
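As a toy, single-parameter illustration of the power-series idea above (hypothetical, and far simpler than the paper's RKBS construction), the snippet below Taylor-expands f(w) = tanh(w x) around an initial weight and compares the truncated series with the true output nearby:

```python
import numpy as np

# One-parameter "network": f(w) = tanh(w * x) for a fixed input x.
x, w0, delta = 0.7, 0.3, 0.05

def f(w):
    return np.tanh(w * x)

# Closed-form derivatives at w0 (with t = tanh(w0 * x)):
t = np.tanh(w0 * x)
f1 = (1 - t ** 2) * x                # f'(w0)
f2 = -2 * t * (1 - t ** 2) * x ** 2  # f''(w0)

# Second-order power series around the initial weight.
series = f(w0) + f1 * delta + 0.5 * f2 * delta ** 2
print(f"true: {f(w0 + delta):.8f}  series: {series:.8f}")
```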
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and transfer to new tasks in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- On the Impact of Stable Ranks in Deep Nets [3.307203784120635]
We show that stable ranks appear layerwise essentially as linear factors whose effect accumulates exponentially depthwise.
arXiv Detail & Related papers (2021-10-05T20:04:41Z)
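For reference, the stable rank of a matrix is srank(A) = ||A||_F^2 / ||A||_2^2, a smooth lower bound on the rank; a minimal hypothetical sketch (not from the paper) computing it layerwise:

```python
import numpy as np

def stable_rank(A):
    # srank(A) = ||A||_F^2 / ||A||_2^2 <= rank(A),
    # robust to near-zero singular values.
    s = np.linalg.svd(A, compute_uv=False)  # descending order
    return np.sum(s ** 2) / s[0] ** 2

rng = np.random.default_rng(0)
for i, W in enumerate(rng.standard_normal((4, 64, 64))):
    print(f"layer {i}: stable rank = {stable_rank(W):.2f}")
```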
- Convolutional Neural Networks Demystified: A Matched Filtering Perspective Based Tutorial [7.826806223782053]
Convolutional Neural Networks (CNNs) are a de facto standard for the analysis of large volumes of signals and images.
We revisit their operation from first principles and a matched filtering perspective.
It is our hope that this tutorial will shed new light on, and provide physical intuition for, the understanding and further development of deep neural networks.
arXiv Detail & Related papers (2021-08-26T09:07:49Z)
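To make the matched-filtering analogy above concrete, here is a small hypothetical sketch (not taken from the tutorial) that detects a known template in a noisy signal by cross-correlation, the very operation a convolutional layer computes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Known template buried in noise at offset 40.
template = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
signal = 0.5 * rng.standard_normal(128)
signal[40:45] += template

# Matched filter: cross-correlate signal and template;
# the response peaks where signal and template align.
response = np.correlate(signal, template, mode="valid")
print("estimated offset:", int(np.argmax(response)))  # expect ~40
```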
- Analytic Insights into Structure and Rank of Neural Network Hessian Maps [32.90143789616052]
The Hessian of a neural network captures parameter interactions through second-order derivatives of the loss.
We develop theoretical tools to analyze the range of the Hessian map, providing us with a precise understanding of its rank deficiency.
This yields exact formulas and tight upper bounds for the Hessian rank of deep linear networks.
arXiv Detail & Related papers (2021-06-30T17:29:58Z)
- Deep Parametric Continuous Convolutional Neural Networks [92.87547731907176]
Parametric Continuous Convolution is a new learnable operator defined over non-grid structured data.
Our experiments show significant improvement over the state-of-the-art in point cloud segmentation of indoor and outdoor scenes.
arXiv Detail & Related papers (2021-01-17T18:28:23Z)
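A minimal numpy sketch of the idea as we read it (hypothetical; the weights are random here rather than learned): the kernel is a small MLP over coordinate offsets, so the convolution extends from grids to arbitrary point sets:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy point cloud: N points in 2-D, one scalar feature each.
N, hidden = 32, 16
coords = rng.standard_normal((N, 2))
feats = rng.standard_normal((N, 1))

# Continuous kernel g(offset) given by a tiny MLP
# (these weights would be learned in practice).
W1, b1 = rng.standard_normal((2, hidden)), np.zeros(hidden)
W2, b2 = rng.standard_normal((hidden, 1)), np.zeros(1)

def kernel(offsets):
    return np.maximum(offsets @ W1 + b1, 0.0) @ W2 + b2

def continuous_conv(coords, feats, radius=1.0):
    out = np.zeros_like(feats)
    for i in range(len(coords)):
        d = coords - coords[i]
        nbrs = np.linalg.norm(d, axis=1) < radius  # local neighborhood
        out[i] = np.sum(kernel(d[nbrs]) * feats[nbrs])
    return out

print(continuous_conv(coords, feats)[:3])
```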
- Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks [11.57132149295061]
The Hessian captures important properties of the deep neural network loss landscape.
We make new observations about the top eigenspace of layer-wise Hessian.
We show that the new eigenspace structure can be explained by approximating the Hessian using Kronecker factorization.
arXiv Detail & Related papers (2020-10-08T21:18:11Z)
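As a minimal check of the Kronecker structure above (hypothetical: a single linear layer with squared loss, far simpler than the paper's layer-wise analysis of deep networks), the exact Hessian is a Kronecker product:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 4
x = rng.standard_normal(n)  # single input
t = rng.standard_normal(m)  # target

# Loss L(W) = 0.5 * ||W x - t||^2. With W flattened row-major,
# the exact Hessian is the Kronecker product I_m (x) (x x^T).
H_kron = np.kron(np.eye(m), np.outer(x, x))

# Numerical check: finite differences on dL/dW = (W x - t) x^T.
W = rng.standard_normal((m, n))
eps = 1e-6
H_num = np.zeros((m * n, m * n))
for k in range(m * n):
    dW = np.zeros(m * n)
    dW[k] = eps
    gp = ((W + dW.reshape(m, n)) @ x - t)[:, None] * x[None, :]
    gm = ((W - dW.reshape(m, n)) @ x - t)[:, None] * x[None, :]
    H_num[:, k] = ((gp - gm) / (2 * eps)).ravel()

print("max abs error:", np.abs(H_num - H_kron).max())  # ~1e-10
```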
- Teaching CNNs to mimic Human Visual Cognitive Process & regularise Texture-Shape bias [18.003188982585737]
Recent experiments in computer vision identify texture bias as the primary reason for the strong results of models employing Convolutional Neural Networks (CNNs).
It is believed that the cost function forces the CNN to take a greedy approach and develop a proclivity for local information like texture to increase accuracy, thus failing to explore any global statistics.
We propose CognitiveCNN, a new intuitive architecture inspired by feature integration theory in psychology, which utilises human-interpretable features such as shape, texture, and edges to reconstruct and classify the image.
arXiv Detail & Related papers (2020-06-25T22:32:54Z)
- Linguistically Driven Graph Capsule Network for Visual Question Reasoning [153.76012414126643]
We propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network".
The compositional process is guided by the linguistic parse tree. Specifically, we bind each capsule in the lowest layer to bridge the linguistic embedding of a single word in the original question with visual evidence.
Experiments on the CLEVR dataset, CLEVR compositional generation test, and FigureQA dataset demonstrate the effectiveness and composition generalization ability of our end-to-end model.
arXiv Detail & Related papers (2020-03-23T03:34:25Z)
- Hold me tight! Influence of discriminative features on deep network boundaries [63.627760598441796]
We propose a new perspective that relates dataset features to the distance of samples to the decision boundary.
This enables us to carefully tweak the position of the training samples and measure the induced changes on the boundaries of CNNs trained on large-scale vision datasets.
arXiv Detail & Related papers (2020-02-15T09:29:36Z)
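To illustrate sample-to-boundary distance in the simplest hypothetical setting (a linear classifier rather than the large-scale CNNs the paper studies), the signed distance of x to the boundary w·x + b = 0 is (w·x + b) / ||w||:

```python
import numpy as np

rng = np.random.default_rng(4)

# Linear decision boundary: w . x + b = 0.
w = np.array([2.0, -1.0])
b = 0.5

# Signed distance of each sample to the boundary.
X = rng.standard_normal((5, 2))
signed = (X @ w + b) / np.linalg.norm(w)
print(signed)

# Moving a sample along the boundary normal shifts its signed
# distance by exactly the step size, the kind of controlled
# tweak used to probe how boundaries react.
x_new = X[0] + 0.3 * w / np.linalg.norm(w)
print((x_new @ w + b) / np.linalg.norm(w) - signed[0])  # 0.3
```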
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.