Spectral Concentration at the Edge of Stability: Information Geometry of Kernel Associative Memory
- URL: http://arxiv.org/abs/2511.23083v1
- Date: Fri, 28 Nov 2025 11:14:15 GMT
- Title: Spectral Concentration at the Edge of Stability: Information Geometry of Kernel Associative Memory
- Authors: Akira Tamamori
- Abstract summary: We analyze the network dynamics on a statistical manifold, revealing that the Ridge corresponds to the "Edge of Stability." This unifies learning dynamics and capacity via the Minimum Description Length principle, offering a geometric theory of self-organized criticality.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-capacity kernel Hopfield networks exhibit a "Ridge of Optimization" characterized by extreme stability. While previously linked to "Spectral Concentration," its origin remains elusive. Here, we analyze the network dynamics on a statistical manifold, revealing that the Ridge corresponds to the "Edge of Stability," a critical boundary where the Fisher Information Matrix becomes singular. We demonstrate that the apparent Euclidean force antagonism is a manifestation of "Dual Equilibrium" in the Riemannian space. This unifies learning dynamics and capacity via the Minimum Description Length principle, offering a geometric theory of self-organized criticality.
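As a rough numerical companion (not the paper's code), the sketch below tracks how the spectrum of a Gaussian-kernel Gram matrix over stored patterns concentrates as the kernel width varies; a vanishing smallest eigenvalue serves here only as a proxy for the singular Fisher Information Matrix at the "Edge of Stability." All sizes and kernel widths are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's code): spectral
# concentration of a Gaussian-kernel Gram matrix over stored patterns
# as the kernel width sigma varies. lambda_min -> 0 is used here as a
# proxy for a near-singular Fisher Information Matrix.
import numpy as np

rng = np.random.default_rng(0)
P, N = 40, 100                                # stored patterns, neurons (assumed)
X = rng.choice([-1.0, 1.0], size=(P, N))

def gram(X, sigma):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

for sigma in [2.0, 5.0, 10.0, 20.0, 50.0]:
    w = np.linalg.eigvalsh(gram(X, sigma))
    share = w[-1] / w.sum()                   # top-eigenvalue share of the trace
    print(f"sigma={sigma:5.1f}  lambda_min={w[0]:.3e}  top-eigenvalue share={share:.3f}")
```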
Related papers
- The Geometric Mechanics of Contrastive Representation Learning: Alignment Potentials, Entropic Dispersion, and Cross-Modal Divergence [17.501700376593174]
We present a measure-theoretic framework that models learning as the evolution of representation measures on a fixed embedding manifold. By establishing value and consistency in the large-batch limit, we bridge the misalignment objective to explicit energy landscapes. We show that this term induces barrier-driven co-adaptation, enforcing a population-level modality gap as a structural geometric necessity.
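For intuition, here is a finite-sample stand-in for alignment and dispersion energies in the spirit of alignment/uniformity decompositions; it is not the paper's measure-theoretic functional, and all constants are illustrative.

```python
# Illustrative finite-sample energies (assumed form, not the paper's):
# an alignment term pulling paired embeddings together and a
# log-sum-exp dispersion term spreading embeddings on the sphere.
import numpy as np

rng = np.random.default_rng(1)
n, d = 256, 32
z = rng.normal(size=(n, d)); z /= np.linalg.norm(z, axis=1, keepdims=True)
z_pos = z + 0.1 * rng.normal(size=(n, d))     # noisy positive pairs
z_pos /= np.linalg.norm(z_pos, axis=1, keepdims=True)

align = np.mean(np.sum((z - z_pos) ** 2, axis=1))   # alignment potential
d2 = np.sum((z[:, None] - z[None, :]) ** 2, axis=-1)
dispersion = np.log(np.mean(np.exp(-2.0 * d2)))     # entropic-dispersion proxy
print(f"alignment={align:.3f}  dispersion={dispersion:.3f}")
```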
arXiv Detail & Related papers (2026-01-27T13:33:03Z) - Random matrix theory of sparse neuronal networks with heterogeneous timescales [0.6181093777643575]
Trained recurrent neuronal networks consisting of excitatory (E) and inhibitory (I) units with additive noise support working memory computation. Here, we investigate the dynamics near the equilibria of such networks and show that the Jacobians are sparse, non-Hermitian rectangular-block matrices modified by heterogeneous synaptic decay timescales and activation-function gains. An analytic description of the spectral edge is obtained, relating statistical parameters of the Jacobians to near-critical features of the equilibria essential for robust working memory computation.
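A minimal sketch of the kind of object studied here (parameters assumed, not taken from the paper): a sparse E/I Jacobian with heterogeneous timescales and gains, whose spectral edge max Re(lambda) controls local stability.

```python
# Sparse non-Hermitian E/I Jacobian J = diag(1/tau) @ (G W - I), with
# heterogeneous decay timescales tau and activation gains G (all values
# illustrative assumptions).
import numpy as np

rng = np.random.default_rng(2)
N, p = 400, 0.1                               # units, connection density
nE = N // 2                                   # first half excitatory
W = (rng.random((N, N)) < p) * rng.normal(0, 1 / np.sqrt(p * N), (N, N))
W[:, :nE] = np.abs(W[:, :nE])                 # Dale's law: E columns >= 0
W[:, nE:] = -np.abs(W[:, nE:])                # Dale's law: I columns <= 0
tau = rng.uniform(1.0, 10.0, N)               # heterogeneous timescales
gain = rng.uniform(0.5, 1.5, N)               # activation-function gains

J = (gain * W - np.eye(N)) / tau[:, None]     # linearization at an equilibrium
print("spectral edge max Re(lambda) =", np.linalg.eigvals(J).real.max())
```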
arXiv Detail & Related papers (2025-12-14T17:02:22Z) - Self-Organization and Spectral Mechanism of Attractor Landscapes in High-Capacity Kernel Hopfield Networks [0.0]
Kernel-based learning can dramatically increase the storage capacity of Hopfield networks. We show that optimal performance is achieved by tuning the system to a spectral "Goldilocks zone" between rank collapse and diffusion.
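A hedged toy recall experiment (a generic kernel Hopfield construction, not necessarily the paper's exact model): one-step retrieval via a Gaussian-kernel readout, scanning the kernel width to see where recall of a corrupted pattern succeeds and where an overly wide kernel (approaching rank collapse of the Gram matrix) degrades it.

```python
# One-step kernel Hopfield recall with a Gaussian-kernel readout
# (assumed construction; sizes and widths are illustrative).
import numpy as np

rng = np.random.default_rng(3)
P, N = 30, 64
X = rng.choice([-1.0, 1.0], size=(P, N))      # stored patterns

def recall(x, sigma):
    d2 = np.sum((X - x) ** 2, axis=1)
    w = np.exp(-d2 / (2 * sigma**2))          # kernel similarities to memories
    return np.sign(w @ X)                     # kernel-weighted readout

probe = X[0].copy()
flip = rng.choice(N, size=8, replace=False)
probe[flip] *= -1                             # corrupt 8 of 64 bits
for sigma in [1.0, 4.0, 8.0, 16.0, 64.0]:
    overlap = recall(probe, sigma) @ X[0] / N
    print(f"sigma={sigma:5.1f}  overlap with stored pattern = {overlap:+.2f}")
```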
arXiv Detail & Related papers (2025-11-17T06:58:34Z) - Generalization Below the Edge of Stability: The Role of Data Geometry [60.147710896851045]
We show how data geometry controls generalization in ReLU networks trained below the edge of stability. For data distributions supported on a mixture of low-dimensional balls, we derive generalization bounds that provably adapt to the intrinsic dimension. Our results consolidate disparate empirical findings that have appeared in the literature.
arXiv Detail & Related papers (2025-10-20T21:40:36Z) - Graph-based Clustering Revisited: A Relaxation of Kernel $k$-Means Perspective [73.18641268511318]
We propose a graph-based clustering algorithm that relaxes only the orthonormal constraint to derive clustering results. To incorporate the doubly stochastic constraint into a gradient-based solver, we transform the non-negative constraint into a class-probability parameterization.
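For context, the classical spectral relaxation of kernel k-means that this line of work builds on (the paper's probability-parameterized relaxation is different): relax the discrete indicator matrix to an orthonormal H and maximize tr(H^T K H), solved by the top-k eigenvectors of the kernel matrix.

```python
# Classical orthonormal relaxation of kernel k-means on toy data
# (illustrative baseline, not the paper's algorithm).
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)),  # two Gaussian blobs (assumed)
               rng.normal(2, 0.5, (50, 2))])
d2 = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K = np.exp(-d2 / 2.0)                         # Gaussian kernel matrix

k = 2
w, V = np.linalg.eigh(K)
H = V[:, -k:]                                 # orthonormal relaxed indicators
labels = (H[:, 0] > 0).astype(int)            # sign of 2nd eigenvector, k = 2
print("cluster sizes:", np.bincount(labels))
```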
arXiv Detail & Related papers (2025-09-23T09:14:39Z) - Dynamical stability for dense patterns in discrete attractor neural networks [6.159133786557903]
We derive a theory of the local stability of discrete fixed points in a broad class of networks with graded neural activities and in the presence of noise. Our analysis highlights the computational benefits of threshold-linear activation and sparse-like patterns.
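A hedged sketch of the basic stability check (a generic rate model, not the paper's exact class): for dr/dt = -r + phi(W r) with threshold-linear phi, the fixed-point Jacobian is J = D W - I, where D holds the 0/1 gains phi', so only active units contribute.

```python
# Local stability of a fixed point of dr/dt = -r + relu(W r)
# (illustrative model; gain scale and sizes are assumptions).
import numpy as np

rng = np.random.default_rng(5)
N = 200
W = rng.normal(0, 0.8 / np.sqrt(N), (N, N))
r = np.maximum(rng.normal(size=N), 0.0)       # initial graded activity

for _ in range(2000):                         # relax toward a fixed point
    r += 0.05 * (-r + np.maximum(W @ r, 0.0))

D = (W @ r > 0).astype(float)                 # active-unit gains phi'
J = D[:, None] * W - np.eye(N)
print("max Re(eigenvalue of J) =", np.linalg.eigvals(J).real.max(),
      " (< 0 means locally stable)")
```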
arXiv Detail & Related papers (2025-07-14T15:23:24Z) - High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization [83.06112052443233]
This paper studies kernel ridge regression in high dimensions under covariate shifts.
By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows for decreasing the variance.
For the bias, we analyze regularization at arbitrary or well-chosen scales, showing that the bias can behave very differently under different regularization scales.
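A minimal sketch of the re-weighting strategy on a 1D toy problem (the density ratio is assumed known here, purely for illustration): weighted kernel ridge regression solves alpha = (W K + lambda I)^{-1} W y, where W holds the importance weights q(x)/p(x).

```python
# Importance-weighted kernel ridge regression under a known covariate
# shift (toy illustration of the re-weighting strategy the paper analyzes).
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.normal(0.0, 1.0, n)                   # training covariates ~ p = N(0, 1)
f_true = lambda t: np.sin(2 * t)
y = f_true(x) + 0.1 * rng.normal(size=n)

# test covariates ~ q = N(1, 1); importance weights w = q(x) / p(x)
w = np.exp(-(x - 1.0) ** 2 / 2) / np.exp(-x ** 2 / 2)

K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.5)   # Gaussian kernel
lam = 1e-2
alpha = np.linalg.solve(np.diag(w) @ K + lam * np.eye(n), w * y)

x_test = rng.normal(1.0, 1.0, 500)            # evaluate under the shift
K_test = np.exp(-(x_test[:, None] - x[None, :]) ** 2 / 0.5)
print(f"test MSE under covariate shift: {np.mean((K_test @ alpha - f_true(x_test))**2):.4f}")
```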
arXiv Detail & Related papers (2024-06-05T12:03:27Z) - High dimensional analysis reveals conservative sharpening and a stochastic edge of stability [21.12433806766051]
We show that the dynamics of the large eigenvalues of the training loss Hessian have some remarkably robust features across models and in the full batch regime. There is often an early period of progressive sharpening where the large eigenvalues increase, followed by stabilization at a predictable value known as the edge of stability.
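A standard two-parameter toy that exhibits both phases (illustrative, not the paper's model): GD on L(a, b) = (ab - c)^2 / 2 with c large enough that the minimum's sharpness 2c exceeds 2/eta. The top Hessian eigenvalue first grows, then tends to hover near the threshold 2/eta instead of settling at the sharper minimum.

```python
# Progressive sharpening and edge of stability on L(a,b) = (ab - c)^2 / 2.
import numpy as np

eta, c = 0.05, 25.0                           # threshold 2/eta = 40 < 2c = 50
a, b = 0.5, 0.5
for step in range(2001):
    r = a * b - c                             # residual
    if step % 400 == 0:
        H = np.array([[b * b, r + a * b],     # exact Hessian of L
                      [r + a * b, a * a]])
        sharp = np.linalg.eigvalsh(H)[-1]
        print(f"step {step:4d}  sharpness={sharp:7.2f}  threshold 2/eta={2/eta:.1f}")
    a, b = a - eta * r * b, b - eta * r * a   # simultaneous GD update
```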
arXiv Detail & Related papers (2024-04-30T04:54:15Z) - Last-Iterate Convergence of Adaptive Riemannian Gradient Descent for Equilibrium Computation [52.73824786627612]
This paper establishes new convergence results for geodesically strongly monotone games. Our key result shows that RGD attains last-iterate linear convergence in a geometry-agnostic fashion. Overall, this paper presents the first geometry-agnostic last-iterate convergence analysis for games beyond the Euclidean setting.
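The RGD primitive itself, in a single-player toy (the paper's game-theoretic setting is not modeled here): minimize the Rayleigh quotient x^T A x over the unit sphere by projecting the Euclidean gradient onto the tangent space and retracting back to the manifold.

```python
# Riemannian gradient descent on the sphere for the Rayleigh quotient
# (illustrative single-objective toy; step size and sizes are assumptions).
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(50, 50)); A = (A + A.T) / 2
x = rng.normal(size=50); x /= np.linalg.norm(x)

eta = 0.005
for _ in range(5000):
    egrad = 2 * A @ x                         # Euclidean gradient of x^T A x
    rgrad = egrad - (x @ egrad) * x           # project onto tangent space at x
    x = x - eta * rgrad
    x /= np.linalg.norm(x)                    # retraction back to the sphere
print("attained value:", x @ A @ x, " smallest eigenvalue:", np.linalg.eigvalsh(A)[0])
```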
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
Gradient Descent (GD) is a powerful workhorse of modern machine learning.
GD's ability to find local minimisers is only guaranteed for losses with Lipschitz gradients.
This work focuses on simple, yet representative, learning problems via analysis of two-step gradient updates.
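A toy in the spirit of that two-step analysis (not the paper's exact construction): GD on f(x) = (x^2 - 1)^2 / 4 with eta = 1.2, beyond the stability threshold 2/f''(1) = 1 of the minimizer x = 1. Single steps oscillate forever, but the iterates settle into a stable period-2 orbit, i.e., a fixed point of the two-step map.

```python
# GD beyond the edge of stability on f(x) = (x^2 - 1)^2 / 4:
# the last iterates alternate between two values (a period-2 orbit).
eta = 1.2
x = 1.05
traj = []
for _ in range(200):
    x = x - eta * x * (x * x - 1.0)           # f'(x) = x (x^2 - 1)
    traj.append(x)
print("last four iterates:", [f"{v:+.4f}" for v in traj[-4:]])
```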
arXiv Detail & Related papers (2022-06-08T21:32:50Z) - The Geometry of Robust Value Functions [119.94715309072983]
We introduce a new perspective that enables us to characterize both the non-robust and robust value space.
We show that the robust value space is determined by a set of conic hypersurfaces, each of which contains the robust values of all policies that agree on one state.
arXiv Detail & Related papers (2022-01-30T22:12:17Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean field Langevin dynamics is presented. The proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z)
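A minimal particle discretization of mean field Langevin dynamics (a generic sketch, not the paper's analysis): N particles follow noisy gradient descent on the first variation of an energy combining a double-well potential, a mean-attraction interaction, and entropic regularization. All constants here are illustrative assumptions.

```python
# Euler-Maruyama particle scheme for mean field Langevin dynamics
# with V(x) = x^4/4 - x^2/2 and a mean-field attraction term.
import numpy as np

rng = np.random.default_rng(8)
N, steps, dt, lam, kappa = 1000, 5000, 1e-3, 0.2, 0.5
x = rng.normal(0, 2.0, N)                     # initial particle cloud

for _ in range(steps):
    drift = -(x**3 - x) - kappa * (x - x.mean())   # -V'(x) minus interaction
    x += dt * drift + np.sqrt(2 * lam * dt) * rng.normal(size=N)

print(f"stationary mean={x.mean():+.3f}  second moment={np.mean(x**2):.3f}")
```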
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.