Gauge-invariant representation holonomy
- URL: http://arxiv.org/abs/2601.21653v1
- Date: Thu, 29 Jan 2026 12:51:17 GMT
- Title: Gauge-invariant representation holonomy
- Authors: Vasileios Sevetlidis, George Pavlidis
- Abstract summary: Deep networks learn internal representations whose geometry--how features bend, rotate, and evolve--affects both generalization and robustness. Existing similarity measures such as CKA or SVCCA capture pointwise overlap between activation sets, but miss how representations change along input paths. We introduce representation holonomy, a gauge-invariant statistic that measures this path dependence.
- Score: 1.078600700827543
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep networks learn internal representations whose geometry--how features bend, rotate, and evolve--affects both generalization and robustness. Existing similarity measures such as CKA or SVCCA capture pointwise overlap between activation sets, but miss how representations change along input paths. Two models may appear nearly identical under these metrics yet respond very differently to perturbations or adversarial stress. We introduce representation holonomy, a gauge-invariant statistic that measures this path dependence. Conceptually, holonomy quantifies the "twist" accumulated when features are parallel-transported around a small loop in input space: flat representations yield zero holonomy, while nonzero values reveal hidden curvature. Our estimator fixes gauge through global whitening, aligns neighborhoods using shared subspaces and rotation-only Procrustes, and embeds the result back to the full feature space. We prove invariance to orthogonal (and affine, post-whitening) transformations, establish a linear null for affine layers, and show that holonomy vanishes at small radii. Empirically, holonomy increases with loop radius, separates models that appear similar under CKA, and correlates with adversarial and corruption robustness. It also tracks training dynamics as features form and stabilize. Together, these results position representation holonomy as a practical and scalable diagnostic for probing the geometric structure of learned representations beyond pointwise similarity.
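The abstract describes the estimator in three steps: fix gauge by global whitening, align neighbourhoods through shared subspaces and rotation-only Procrustes, and compose the resulting rotations around a loop in input space. The sketch below is a minimal NumPy illustration of that recipe, not the authors' implementation: it uses a single global PCA subspace instead of per-neighbourhood shared subspaces (so the embedding-back step is omitted), builds matched neighbourhoods from one shared set of perturbation offsets, plugs in a random toy feature map in place of a trained network, and summarises holonomy as the Frobenius deviation of the accumulated rotation from the identity. All function names and parameters are illustrative assumptions.

```python
# Minimal sketch of a loop-holonomy estimate: whitening, rotation-only Procrustes
# between matched neighbourhoods, rotations composed around a closed input-space loop.
import numpy as np

def rotation_procrustes(A, B):
    """Rotation-only Procrustes: R in SO(k) minimising ||A @ R - B||_F (Kabsch)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    D = np.eye(U.shape[0])
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))  # force det(R) = +1 (no reflections)
    return U @ D @ Vt

def loop_holonomy(feature_fn, x0, u, v, radius, n_steps=16, n_nbrs=64, k=8,
                  nbr_scale=0.05, seed=0):
    """Estimate holonomy of `feature_fn` around a circle of `radius` in the plane
    spanned by input directions u, v, centred at x0 (all names are illustrative)."""
    rng = np.random.default_rng(seed)
    # One shared set of perturbation offsets defines matched neighbourhoods.
    offsets = nbr_scale * rng.standard_normal((n_nbrs, x0.size))

    thetas = np.linspace(0.0, 2.0 * np.pi, n_steps + 1)       # closed loop
    loop_pts = [x0 + radius * (np.cos(t) * u + np.sin(t) * v) for t in thetas]
    acts = [feature_fn(p + offsets) for p in loop_pts]          # each (n_nbrs, d_feat)

    # Gauge fixing: global (ZCA-style) whitening of the pooled activations.
    pooled = np.concatenate(acts, axis=0)
    mu = pooled.mean(axis=0)
    cov = np.cov(pooled, rowvar=False) + 1e-6 * np.eye(pooled.shape[1])
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T
    acts = [(a - mu) @ W for a in acts]

    # Single global top-k subspace (the paper aligns per-neighbourhood shared
    # subspaces and embeds back to the full feature space; this keeps the sketch short).
    _, _, Vt = np.linalg.svd(np.concatenate(acts, axis=0), full_matrices=False)
    basis = Vt[:k].T
    proj = [(a - a.mean(axis=0)) @ basis for a in acts]

    # Parallel transport: compose rotation-only Procrustes maps around the loop.
    H = np.eye(k)
    for t in range(n_steps):
        H = H @ rotation_procrustes(proj[t], proj[t + 1])
    return np.linalg.norm(H - np.eye(k))                        # 0 for a "flat" map

if __name__ == "__main__":
    # Toy feature map: a random two-layer tanh network stands in for a real model.
    rng = np.random.default_rng(1)
    W1, W2 = rng.standard_normal((10, 32)), rng.standard_normal((32, 24))
    feature_fn = lambda X: np.tanh(X @ W1) @ W2
    x0 = rng.standard_normal(10)
    u, v = np.eye(10)[0], np.eye(10)[1]
    for r in (0.1, 0.5, 2.0):
        print(f"radius={r:.1f}  holonomy={loop_holonomy(feature_fn, x0, u, v, radius=r):.4f}")
```

Sweeping the radius in the `__main__` block loosely mirrors the reported trend that holonomy grows with loop size; the printed numbers are toy outputs and do not reproduce the paper's results.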
Related papers
- Social-JEPA: Emergent Geometric Isomorphism [11.526381612918549]
World models compress rich sensory streams into compact latent codes that anticipate future observations. We let separate agents acquire such models from distinct viewpoints of the same environment without any parameter sharing or coordination. After training, their internal representations exhibit a striking emergent property: the two latent spaces are related by an approximate linear isometry. (A small Procrustes-alignment sketch follows this list.)
arXiv Detail & Related papers (2026-02-28T07:54:43Z) - Fubini Study geometry of representation drift in high dimensional data [0.0]
High dimensional representation drift is commonly quantified using Euclidean or cosine distances. We introduce a projective geometric view of representation drift grounded in the Fubini Study metric. We show that the Fubini Study metric isolates intrinsic evolution by remaining invariant under gauge-induced fluctuations. (A minimal Fubini-Study distance sketch follows this list.)
arXiv Detail & Related papers (2026-02-01T16:00:59Z) - Local-Curvature-Aware Knowledge Graph Embedding: An Extended Ricci Flow Approach [4.686364613477057]
Knowledge graph embedding relies on the geometry of the embedding space to encode semantic and structural relations. We propose RicciKGE, which couples the KGE gradient loss with local curvatures in an extended Ricci flow. Experimental improvements on link prediction and node classification benchmarks demonstrate RicciKGE's effectiveness in adapting to heterogeneous knowledge graph structures.
arXiv Detail & Related papers (2025-12-08T09:20:06Z) - The Curved Spacetime of Transformer Architectures [0.3670422696827525]
We present a geometric framework for understanding Transformer-based language models, drawing an explicit analogy to General Relativity. We show that token embeddings should not traverse straight paths in feature space; instead, their layer-wise steps should bend and reorient as interactions mediated by embedding-space curvature.
arXiv Detail & Related papers (2025-11-04T22:58:40Z) - Curved Inference: Concern-Sensitive Geometry in Large Language Model Residual Streams [0.0]
We propose a geometric interpretability framework that tracks how the residual stream trajectory of a large language model bends in response to shifts in semantic concern. We analyse Gemma3-1b and LLaMA3.2-3b using five native-space metrics, with a primary focus on curvature (kappa_i) and salience (S(t)). We find that concern-shifted prompts reliably alter internal activation trajectories in both models. (A generic turning-angle sketch of trajectory bending follows this list.)
arXiv Detail & Related papers (2025-07-08T23:05:00Z) - Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths. Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope. We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, transformations, and general invertible maps. This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models. (A toy loss-barrier sketch follows this list.)
arXiv Detail & Related papers (2025-06-28T01:46:36Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Shape And Structure Preserving Differential Privacy [70.08490462870144]
We show how the gradient of the squared distance function offers better control over sensitivity than the Laplace mechanism.
arXiv Detail & Related papers (2022-09-21T18:14:38Z) - A Model for Multi-View Residual Covariances based on Perspective Deformation [88.21738020902411]
We derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups.
We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment.
arXiv Detail & Related papers (2022-02-01T21:21:56Z) - Geometric phase in a dissipative Jaynes-Cummings model: theoretical explanation for resonance robustness [68.8204255655161]
We compute the geometric phases acquired in both unitary and dissipative Jaynes-Cummings models.
In the dissipative model, the non-unitary effects arise from the outflow of photons through the cavity walls.
We show the geometric phase is robust, exhibiting a vanishing correction under a non-unitary evolution.
arXiv Detail & Related papers (2021-10-27T15:27:54Z) - Manifold Learning via Manifold Deflation [105.7418091051558]
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method built on a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z)
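For the Social-JEPA entry above, a hedged sketch of how one might test the claimed approximate linear isometry between two latent spaces: fit an orthogonal Procrustes map on paired latent codes and report the relative residual. The data, function name, and synthetic check are illustrative assumptions, not that paper's protocol.

```python
# Sketch: do two latent spaces differ only by an orthogonal map plus translation?
import numpy as np

def isometry_fit(Z1, Z2):
    """Fit the best orthogonal map R (after centering) from Z1 to Z2 and return
    the relative alignment error ||Z1c @ R - Z2c|| / ||Z2c||."""
    Z1c, Z2c = Z1 - Z1.mean(0), Z2 - Z2.mean(0)
    U, _, Vt = np.linalg.svd(Z1c.T @ Z2c)
    R = U @ Vt                        # orthogonal Procrustes solution
    return np.linalg.norm(Z1c @ R - Z2c) / np.linalg.norm(Z2c)

# Synthetic check: Z2 is a rotated copy of Z1 plus small noise, so the relative
# error should sit near the noise level, signalling a near-isometric relation.
rng = np.random.default_rng(0)
Z1 = rng.standard_normal((500, 16))
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))
Z2 = Z1 @ Q + 0.01 * rng.standard_normal((500, 16))
print(isometry_fit(Z1, Z2))           # roughly 0.01
```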
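For the Fubini Study entry above, a minimal sketch of a projective distance between representations: the angle between rays, which ignores global rescaling and sign or phase flips (the "gauge" part). Using real vectors here is a simplification, and the function name is hypothetical.

```python
# Sketch: Fubini-Study-style distance between representations treated as rays.
import numpy as np

def fubini_study_distance(u, v):
    """Angle between the rays spanned by u and v: arccos(|<u,v>| / (|u| |v|))."""
    c = np.abs(np.vdot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, 0.0, 1.0))

u = np.array([1.0, 2.0, 3.0])
print(fubini_study_distance(u, -5.0 * u))                        # 0.0: pure gauge change
print(fubini_study_distance(u, u + np.array([0.0, 0.0, 1.0])))   # > 0: genuine drift
```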
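For the Curved Inference entry above, a generic turning-angle proxy for how a layer-wise trajectory of hidden states bends. This is only one plausible discretisation of curvature and is not necessarily that paper's kappa_i; the synthetic trajectory stands in for a real residual stream.

```python
# Sketch: discrete "bending" of a layer-wise trajectory via turning angles.
import numpy as np

def turning_angles(traj):
    """traj: (L, d) array, one hidden state per layer. Returns the angle (radians)
    between consecutive layer-to-layer steps; 0 everywhere for a straight path."""
    steps = np.diff(traj, axis=0)                         # (L-1, d) layer-to-layer steps
    a, b = steps[:-1], steps[1:]
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return np.arccos(np.clip(cos, -1.0, 1.0))

straight = np.outer(np.arange(6), np.ones(4))             # collinear states: zero bending
bent = straight + 0.3 * np.random.default_rng(0).standard_normal(straight.shape)
print(turning_angles(straight).round(3))
print(turning_angles(bent).round(3))
```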
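For the linear mode connectivity entry above, a toy sketch of the loss-barrier measurement along a straight parameter path. The double-well example stands in for two independently trained networks related by a symmetry; the symmetry-matching machinery that the paper generalises is not reproduced here.

```python
# Sketch: loss barrier along the straight line between two parameter vectors.
import numpy as np

def loss_barrier(loss_fn, theta_a, theta_b, n_alphas=21):
    """Max over alpha of L((1-a)*A + a*B) minus the linear interpolation of the
    endpoint losses; a value near 0 means the two solutions are linearly connected."""
    la, lb = loss_fn(theta_a), loss_fn(theta_b)
    return max(loss_fn((1 - a) * theta_a + a * theta_b) - ((1 - a) * la + a * lb)
               for a in np.linspace(0.0, 1.0, n_alphas))

# Double-well toy: w = -1 and w = +1 are minima related by a sign symmetry, a 1-D
# analogue of two independently trained networks that differ by a neuron symmetry.
double_well = lambda w: float((w[0] ** 2 - 1.0) ** 2)
w_a, w_b = np.array([-1.0]), np.array([1.0])
print(loss_barrier(double_well, w_a, w_a))   # 0.0: same point, no barrier
print(loss_barrier(double_well, w_a, w_b))   # 1.0: barrier at the midpoint w = 0
# Aligning w_b by the symmetry (w -> -w) before interpolating removes the barrier,
# which is the effect the symmetry classes above generalise to real Transformers.
```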
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information above and accepts no responsibility for any consequences of its use.