Local properties of neural networks through the lens of layer-wise Hessians
- URL: http://arxiv.org/abs/2510.17486v2
- Date: Fri, 07 Nov 2025 19:39:45 GMT
- Title: Local properties of neural networks through the lens of layer-wise Hessians
- Authors: Maxim Bolshim, Alexander Kugaevskikh
- Abstract summary: We introduce a methodology for analyzing neural networks through the lens of layer-wise Hessian matrices. This concept provides a formal tool for characterizing the local geometry of the parameter space.
- Score: 45.88028371034407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a methodology for analyzing neural networks through the lens of layer-wise Hessian matrices. The local Hessian of each functional block (layer) is defined as the matrix of second derivatives of a scalar function with respect to the parameters of that layer. This concept provides a formal tool for characterizing the local geometry of the parameter space. We show that the spectral properties of local Hessians, such as the distribution of eigenvalues, reveal quantitative patterns associated with overfitting, underparameterization, and expressivity in neural network architectures. We conduct an extensive empirical study involving 111 experiments across 37 datasets. The results demonstrate consistent structural regularities in the evolution of local Hessians during training and highlight correlations between their spectra and generalization performance. These findings establish a foundation for using local geometric analysis to guide the diagnosis and design of deep neural networks. The proposed framework connects optimization geometry with functional behavior and offers practical insight for improving network architectures and training stability.
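A minimal sketch of the core object described in the abstract: the local (layer-wise) Hessian, i.e., the matrix of second derivatives of a scalar loss with respect to a single layer's parameters, and its eigenvalue spectrum. The tiny MLP, cross-entropy loss, and synthetic data below are illustrative assumptions, not the authors' setup.

```python
# Sketch only: layer-wise Hessian spectrum for one layer of a toy MLP.
# The architecture, loss, and data are assumptions for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 16), nn.Tanh(), nn.Linear(16, 3))
X, y = torch.randn(64, 10), torch.randint(0, 3, (64,))
loss_fn = nn.CrossEntropyLoss()

def layer_hessian(layer_weight):
    """Dense Hessian of the loss w.r.t. a single layer's weight tensor."""
    loss = loss_fn(model(X), y)
    (grad,) = torch.autograd.grad(loss, layer_weight, create_graph=True)
    grad = grad.reshape(-1)
    rows = []
    for g in grad:  # one Hessian row per parameter of this layer
        (row,) = torch.autograd.grad(g, layer_weight, retain_graph=True)
        rows.append(row.reshape(-1))
    return torch.stack(rows)

W = model[0].weight                      # parameters of the first layer only
H = layer_hessian(W)                     # shape: (W.numel(), W.numel())
eigvals = torch.linalg.eigvalsh(0.5 * (H + H.T))  # symmetrize before eigendecomposition
print(eigvals[-5:])                      # largest eigenvalues of the local Hessian
```

The eigenvalue distribution of such per-layer Hessians is the quantity the paper relates to overfitting, underparameterization, and expressivity; for larger layers one would typically use Hessian-vector products rather than forming the dense matrix.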
Related papers
- The Neural Differential Manifold: An Architecture with Explicit Geometric Structure [8.201374511929538]
This paper introduces the Neural Differential Manifold (NDM), a novel neural network architecture that explicitly incorporates geometric structure into its fundamental design. We analyze the theoretical advantages of this approach, including its potential for more efficient optimization, enhanced continual learning, and applications in scientific discovery and controllable generative modeling.
arXiv Detail & Related papers (2025-10-29T02:24:27Z) - Geometric Properties and Graph-Based Optimization of Neural Networks: Addressing Non-Linearity, Dimensionality, and Scalability [0.0]
This research explores neural networks through geometric metrics and graph structures. It addresses the limited understanding of geometric structures governing neural networks. We identify three key challenges: (1) overcoming linear separability limitations, (2) managing the dimensionality-complexity trade-off, and (3) improving scalability through graph representations.
arXiv Detail & Related papers (2025-02-24T03:36:34Z) - Exploring the Manifold of Neural Networks Using Diffusion Geometry [7.038126249994092]
We learn a manifold whose datapoints are neural networks by introducing a distance between the hidden-layer representations of the networks. These distances are then fed to the non-linear dimensionality reduction algorithm PHATE to create a manifold of neural networks. Our analysis reveals that high-performing networks cluster together in the manifold, displaying consistent embedding patterns.
arXiv Detail & Related papers (2024-11-19T16:34:45Z) - Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
arXiv Detail & Related papers (2023-03-05T17:57:33Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Generalized Shape Metrics on Neural Representations [26.78835065137714]
We provide a family of metric spaces that quantify representational dissimilarity.
We modify existing representational similarity measures based on canonical correlation analysis to satisfy the triangle inequality.
We identify relationships between neural representations that are interpretable in terms of anatomical features and model performance.
arXiv Detail & Related papers (2021-10-27T19:48:55Z) - Localized Persistent Homologies for more Effective Deep Learning [60.78456721890412]
We introduce an approach that relies on a new filtration function to account for location during network training.
We demonstrate experimentally on 2D images of roads and 3D image stacks of neuronal processes that networks trained in this manner are better at recovering the topology of the curvilinear structures they extract.
arXiv Detail & Related papers (2021-10-12T19:28:39Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Equivalence in Deep Neural Networks via Conjugate Matrix Ensembles [0.0]
A numerical approach is developed for detecting the equivalence of deep learning architectures.
The empirical evidence supports the phenomenon that the difference between the spectral densities of neural architectures and the corresponding conjugate circular ensembles vanishes.
arXiv Detail & Related papers (2020-06-14T12:34:13Z)