Spectral Archaeology: The Causal Topology of Model Evolution
- URL: http://arxiv.org/abs/2601.03424v1
- Date: Tue, 06 Jan 2026 21:26:54 GMT
- Title: Spectral Archaeology: The Causal Topology of Model Evolution
- Authors: Valentin Noël
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Behavioral benchmarks tell us \textit{what} a model does, but not \textit{how}. We introduce a training-free mechanistic probe using attention-graph spectra. Treating each layer as a token graph, we compute algebraic connectivity ($\lambda_2$), smoothness, and spectral entropy. Across 12 models and 10 languages, these measures yield stable ``spectral fingerprints'' that expose discontinuities missed by standard evaluation. We report four results. (1) Models undergoing specific curriculum transitions (e.g., code-to-chat) show an English-only, syntax-triggered connectivity failure on non-canonical constructions, reaching $\Delta\lambda_2 \approx -0.76$. We term this scar \textit{Passive-Triggered Connectivity Collapse} (PTCC). Analysis of the Phi lineage reveals that PTCC appears and resolves across developmental stages, implicating brittle curriculum shifts rather than synthetic data per se. (2) PTCC reflects a specialization trade-off: strengthened formal routing at the expense of stylistic flexibility. (3) We identify four recurrent processing strategies; simple frozen-threshold rules enable perfect forensic identification across lineages. (4) Mechanistically, PTCC localizes to a sparse Layer 2 ``compensatory patch'' of heads that fails under syntactic stress; activation steering can partially restore connectivity, recovering $\approx 38\%$ of lost information flow. Finally, dominant topological regimes track tokenization density more than language identity, suggesting ``healthy'' geometry varies systematically across scripts. Overall, attention-graph spectra provide a practical tool for auditing and training-regime verification.
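The three measures named in the abstract can be sketched from a single attention matrix. This is a minimal illustration only, assuming head-averaged attention and the unnormalized graph Laplacian (the paper does not specify these choices, and a normalized Laplacian may be intended):

```python
import numpy as np

def spectral_fingerprint(attn, eps=1e-12):
    """Spectral probe of one attention layer, treated as a weighted token graph.

    attn: (T, T) attention matrix (assumed averaged over heads).
    Returns (lambda_2, spectral_entropy).
    """
    # Symmetrize to obtain an undirected weighted adjacency matrix.
    W = 0.5 * (attn + attn.T)
    np.fill_diagonal(W, 0.0)

    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # Eigenvalues of a symmetric Laplacian are real and non-negative.
    eig = np.linalg.eigvalsh(L)

    # Algebraic connectivity: second-smallest eigenvalue (Fiedler value).
    lambda_2 = float(eig[1])

    # Spectral entropy: Shannon entropy of the normalized eigenvalue distribution.
    p = np.clip(eig / max(eig.sum(), eps), eps, None)
    spectral_entropy = float(-(p * np.log(p)).sum())

    return lambda_2, spectral_entropy

def smoothness(L, x, eps=1e-12):
    """Dirichlet smoothness of a token signal x (e.g. hidden states): x^T L x / x^T x."""
    return float(x @ L @ x) / max(float(x @ x), eps)
```

On uniform attention over four tokens the graph is a complete graph, so `lambda_2` is large; a connectivity collapse like PTCC would show up as `lambda_2` dropping toward zero on the affected inputs.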
Related papers
- Aligning the Unseen in Attributed Graphs: Interplay between Graph Geometry and Node Attributes Manifold
We introduce a custom variational autoencoder that separates manifold learning from structural alignment. By quantifying the metric distortion needed to map the attribute manifold onto the graph's Heat Kernel, we transform geometric conflict into a structural descriptor.
arXiv Detail & Related papers (2026-01-30T10:34:26Z) - SIGMA: Scalable Spectral Insights for LLM Collapse
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for model collapse. By deriving deterministic bounds on the Gram matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space. We demonstrate that SIGMA effectively captures the transition toward collapsed states, offering theoretical insights into the mechanics of collapse.
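The spectral-contraction idea in this summary can be illustrated with an effective-rank computation on the representation Gram matrix. This is a sketch only; SIGMA's actual deterministic bounds are not reproduced, and `gram_effective_rank` is a hypothetical helper name:

```python
import numpy as np

def gram_effective_rank(reps):
    """Effective rank of the Gram matrix of a representation batch.

    reps: (N, d) matrix of N representations. The effective rank
    (exp of the spectral entropy of the normalized eigenvalues)
    shrinks as the representation space contracts, one symptom
    of model collapse.
    """
    G = reps @ reps.T / reps.shape[1]        # (N, N) Gram matrix
    eig = np.clip(np.linalg.eigvalsh(G), 0.0, None)
    p = eig / eig.sum()
    p = p[p > 0]                             # drop zero modes
    return float(np.exp(-(p * np.log(p)).sum()))
```

A healthy batch of near-orthogonal representations yields an effective rank close to N; fully collapsed (identical) representations yield an effective rank of 1.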
arXiv Detail & Related papers (2026-01-06T19:47:11Z) - Geometry of Reason: Spectral Signatures of Valid Mathematical Reasoning
We present a training-free method for detecting valid mathematical reasoning in large language models through spectral analysis of attention patterns. The method requires no training data, fine-tuning, or learned classifiers: a single threshold on a spectral metric suffices for high accuracy. These findings establish spectral graph analysis as a principled framework for reasoning verification, with immediate applications to hallucination detection and AI safety monitoring.
arXiv Detail & Related papers (2026-01-02T18:49:37Z) - Protein Structure Tokenization via Geometric Byte Pair Encoding
We introduce GeoBPE, a principled protein structure tokenizer (PST). GeoBPE transforms continuous, noisy, multi-scale backbone conformations into discrete ``sentences'' of geometry while enforcing global constraints. It offers compression ($>$10x reduction in bits-per-residue at a similar distortion rate), data efficiency ($>$10x less training data), and generalization.
arXiv Detail & Related papers (2025-11-13T22:53:29Z) - Transformers Provably Learn Directed Acyclic Graphs via Kernel-Guided Mutual Information
Transformer-based models leveraging the attention mechanism have demonstrated strong empirical success in capturing complex dependencies within graphs. We introduce a novel information-theoretic metric: the kernel-guided mutual information (KG-MI), based on the $f$-divergence. We prove that, given sequences generated by a $K$-parent DAG, training a single-layer, multi-head transformer via gradient ascent converges to the global optimum.
arXiv Detail & Related papers (2025-10-29T14:07:12Z) - Training-Free Spectral Fingerprints of Voice Processing in Transformers
We show that different transformer architectures implement identical linguistic computations via distinct connectivity patterns. Using graph signal processing on attention-induced token graphs, we track changes in connectivity across 20 languages and three model families.
arXiv Detail & Related papers (2025-10-21T23:33:43Z) - Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech
Speech-based depression detection (SDD) is a promising, non-invasive alternative to traditional clinical assessments. We propose HAREN-CTC, a novel architecture that integrates multi-layer SSL features using cross-attention within a multitask learning framework. The model achieves state-of-the-art macro F1-scores of 0.81 on DAIC-WOZ and 0.82 on MODMA, outperforming prior methods across both evaluation scenarios.
arXiv Detail & Related papers (2025-10-05T09:32:12Z) - Graph Alignment via Dual-Pass Spectral Encoding and Latent Space Communication
We propose a novel graph alignment framework that simultaneously enhances node distinctiveness and enforces geometric consistency across latent spaces. Our approach introduces a dual-pass encoder that combines low-pass and high-pass spectral filters to generate embeddings that are both structure-aware and highly discriminative.
arXiv Detail & Related papers (2025-09-11T16:36:16Z) - Structural Alignment Improves Graph Test-Time Adaptation
We introduce Test-Time Structural Alignment (TSA), a novel algorithm for Graph Test-Time Adaptation (GTTA). TSA aligns graph structures during inference without accessing the source data. Experiments on synthetic and real-world datasets demonstrate that TSA consistently outperforms both non-graph TTA methods and state-of-the-art GTTA baselines.
arXiv Detail & Related papers (2025-02-25T16:26:25Z) - SINDER: Repairing the Singular Defects of DINOv2
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel smoothness regularization for fine-tuning that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z) - What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks.
This paper introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
arXiv Detail & Related papers (2024-06-04T05:30:16Z) - Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification
We find that graph-based methods in the visible-infrared person re-identification (VI-ReID) task generalize poorly because of two issues.
Well-trained input features weaken the learning of graph topology, so it does not generalize well at inference time.
We propose a Counterfactual Intervention Feature Transfer (CIFT) method to tackle these problems.
arXiv Detail & Related papers (2022-08-01T16:15:31Z) - ExpressivE: A Spatio-Functional Embedding For Knowledge Graph Completion
ExpressivE embeds pairs of entities as points and relations as hyper-parallelograms in the virtual triple space.
We show that ExpressivE is competitive with state-of-the-art KGEs and even significantly outperforms them on WN18RR.
arXiv Detail & Related papers (2022-06-08T23:34:39Z) - Mitigating Generation Shifts for Generalized Zero-Shot Learning
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize seen and unseen samples, where unseen classes are not observable during training.
We propose a novel Generation Shifts Mitigating Flow framework for learning unseen data synthesis efficiently and effectively.
Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.