The Curvature Rate λ: A Scalar Measure of Input-Space Sharpness in Neural Networks
- URL: http://arxiv.org/abs/2511.01438v1
- Date: Mon, 03 Nov 2025 10:46:03 GMT
- Title: The Curvature Rate λ: A Scalar Measure of Input-Space Sharpness in Neural Networks
- Authors: Jacob Poschl,
- Abstract summary: Curvature influences generalization, robustness, and how reliably neural networks respond to small input perturbations.<n>We introduce a scalar curvature measure defined directly in input space: the curvature rate lambda.<n>lambda tracks the emergence of high-frequency structure in the decision boundary.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Curvature influences generalization, robustness, and how reliably neural networks respond to small input perturbations. Existing sharpness metrics are typically defined in parameter space (e.g., Hessian eigenvalues) and can be expensive, sensitive to reparameterization, and difficult to interpret in functional terms. We introduce a scalar curvature measure defined directly in input space: the curvature rate {\lambda}, given by the exponential growth rate of higher-order input derivatives. Empirically, {\lambda} is estimated as the slope of log ||D^n f|| versus n for small n. This growth-rate perspective unifies classical analytic quantities: for analytic functions, {\lambda} corresponds to the inverse radius of convergence, and for bandlimited signals, it reflects the spectral cutoff. The same principle extends to neural networks, where {\lambda} tracks the emergence of high-frequency structure in the decision boundary. Experiments on analytic functions and neural networks (Two Moons and MNIST) show that {\lambda} evolves predictably during training and can be directly shaped using a simple derivative-based regularizer, Curvature Rate Regularization (CRR). Compared to Sharpness-Aware Minimization (SAM), CRR achieves similar accuracy while yielding flatter input-space geometry and improved confidence calibration. By grounding curvature in differentiation dynamics, {\lambda} provides a compact, interpretable, and parameterization-invariant descriptor of functional smoothness in learned models.
Related papers
- Manifold limit for the training of shallow graph convolutional neural networks [1.2744523252873352]
We study the consistency of the training of shallow graph convolutional neural networks (GCNNs) on proximity graphs of sampled point clouds.<n>We prove $$-convergence of regularized empirical risk minimization functionals and corresponding convergence of their global minimizers.
arXiv Detail & Related papers (2026-01-09T18:59:20Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over- parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(alpha-1)$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Approximation Results for Gradient Descent trained Neural Networks [0.0]
The networks are fully connected constant depth increasing width.
The continuous kernel error norm implies an approximation under the natural smoothness assumption required for smooth functions.
arXiv Detail & Related papers (2023-09-09T18:47:55Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable togradient-based optimization and allow to trade off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Implicit Bias of MSE Gradient Optimization in Underparameterized Neural
Networks [0.0]
We study the dynamics of a neural network in function space when optimizing the mean squared error via gradient flow.
We show that the network learns eigenfunctions of an integral operator $T_Kinfty$ determined by the Neural Tangent Kernel (NTK)
We conclude that damped deviations offers a simple and unifying perspective of the dynamics when optimizing the squared error.
arXiv Detail & Related papers (2022-01-12T23:28:41Z) - ViViT: Curvature access through the generalized Gauss-Newton's low-rank
structure [26.24282086797512]
Curvature in form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model for the loss to train, compress, or explain deep networks.
We present ViViT, a curvature model that leverages the GGN's low-rank structure without further approximations.
arXiv Detail & Related papers (2021-06-04T17:37:47Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Tight Nonparametric Convergence Rates for Stochastic Gradient Descent
under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size gradient descent on the least-square risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z) - Solving high-dimensional eigenvalue problems using deep neural networks:
A diffusion Monte Carlo like approach [14.558626910178127]
The eigenvalue problem is reformulated as a fixed point problem of the semigroup flow induced by the operator.
The method shares a similar spirit with diffusion Monte Carlo but augments a direct approximation to the eigenfunction through neural-network ansatz.
Our approach is able to provide accurate eigenvalue and eigenfunction approximations in several numerical examples.
arXiv Detail & Related papers (2020-02-07T03:08:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.