Spectral Archaeology: The Causal Topology of Model Evolution
- URL: http://arxiv.org/abs/2601.03424v1
- Date: Tue, 06 Jan 2026 21:26:54 GMT
- Title: Spectral Archaeology: The Causal Topology of Model Evolution
- Authors: Valentin Noël
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Behavioral benchmarks tell us \textit{what} a model does, but not \textit{how}. We introduce a training-free mechanistic probe using attention-graph spectra. Treating each layer as a token graph, we compute algebraic connectivity ($\lambda_2$), smoothness, and spectral entropy. Across 12 models and 10 languages, these measures yield stable ``spectral fingerprints'' that expose discontinuities missed by standard evaluation. We report four results. (1) Models undergoing specific curriculum transitions (e.g., code-to-chat) show an English-only, syntax-triggered connectivity failure on non-canonical constructions, reaching $\Delta\lambda_2 \approx -0.76$. We term this scar \textit{Passive-Triggered Connectivity Collapse} (PTCC). Analysis of the Phi lineage reveals that PTCC appears and resolves across developmental stages, implicating brittle curriculum shifts rather than synthetic data per se. (2) PTCC reflects a specialization trade-off: strengthened formal routing at the expense of stylistic flexibility. (3) We identify four recurrent processing strategies; simple frozen-threshold rules enable perfect forensic identification across lineages. (4) Mechanistically, PTCC localizes to a sparse Layer 2 ``compensatory patch'' of heads that fails under syntactic stress; activation steering can partially restore connectivity, recovering $\approx 38\%$ of lost information flow. Finally, dominant topological regimes track tokenization density more than language identity, suggesting ``healthy'' geometry varies systematically across scripts. Overall, attention-graph spectra provide a practical tool for auditing and training-regime verification.
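The three measures named in the abstract can be sketched from a single attention matrix. This is a minimal illustration only, assuming head-averaged attention and the unnormalized graph Laplacian (the paper does not specify these choices, and a normalized Laplacian may be intended):

```python
import numpy as np

def spectral_fingerprint(attn, eps=1e-12):
    """Spectral probe of one attention layer, treated as a weighted token graph.

    attn: (T, T) attention matrix (assumed averaged over heads).
    Returns (lambda_2, spectral_entropy).
    """
    # Symmetrize to obtain an undirected weighted adjacency matrix.
    W = 0.5 * (attn + attn.T)
    np.fill_diagonal(W, 0.0)

    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # Eigenvalues of a symmetric Laplacian are real and non-negative.
    eig = np.linalg.eigvalsh(L)

    # Algebraic connectivity: second-smallest eigenvalue (Fiedler value).
    lambda_2 = float(eig[1])

    # Spectral entropy: Shannon entropy of the normalized eigenvalue distribution.
    p = np.clip(eig / max(eig.sum(), eps), eps, None)
    spectral_entropy = float(-(p * np.log(p)).sum())

    return lambda_2, spectral_entropy

def smoothness(L, x, eps=1e-12):
    """Dirichlet smoothness of a token signal x (e.g. hidden states): x^T L x / x^T x."""
    return float(x @ L @ x) / max(float(x @ x), eps)
```

On uniform attention over four tokens the graph is a complete graph, so `lambda_2` is large; a connectivity collapse like PTCC would show up as `lambda_2` dropping toward zero on the affected inputs.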
Related papers
- Aligning the Unseen in Attributed Graphs: Interplay between Graph Geometry and Node Attributes Manifold
We introduce a custom variational autoencoder that separates manifold learning from structural alignment. By quantifying the metric distortion needed to map the attribute manifold onto the graph's Heat Kernel, we transform geometric conflict into a structural descriptor.
arXiv Detail & Related papers (2026-01-30T10:34:26Z) - SIGMA: Scalable Spectral Insights for LLM Collapse
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for model collapse. By deriving deterministic bounds on the Gram matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space. We demonstrate that SIGMA effectively captures the transition toward collapsed states, offering theoretical insights into the mechanics of collapse.
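The spectral-contraction idea in this summary can be illustrated with an effective-rank computation on the representation Gram matrix. This is a sketch only; SIGMA's actual deterministic bounds are not reproduced, and `gram_effective_rank` is a hypothetical helper name:

```python
import numpy as np

def gram_effective_rank(reps):
    """Effective rank of the Gram matrix of a representation batch.

    reps: (N, d) matrix of N representations. The effective rank
    (exp of the spectral entropy of the normalized eigenvalues)
    shrinks as the representation space contracts, one symptom
    of model collapse.
    """
    G = reps @ reps.T / reps.shape[1]        # (N, N) Gram matrix
    eig = np.clip(np.linalg.eigvalsh(G), 0.0, None)
    p = eig / eig.sum()
    p = p[p > 0]                             # drop zero modes
    return float(np.exp(-(p * np.log(p)).sum()))
```

A healthy batch of near-orthogonal representations yields an effective rank close to N; fully collapsed (identical) representations yield an effective rank of 1.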
arXiv Detail & Related papers (2026-01-06T19:47:11Z) - Geometry of Reason: Spectral Signatures of Valid Mathematical Reasoning
We present a training-free method for detecting valid mathematical reasoning in large language models through spectral analysis of attention patterns. The method requires no training data, fine-tuning, or learned classifiers: a single threshold on a spectral metric suffices for high accuracy. These findings establish spectral graph analysis as a principled framework for reasoning verification, with immediate applications to hallucination detection and AI safety monitoring.
arXiv Detail & Related papers (2026-01-02T18:49:37Z) - Protein Structure Tokenization via Geometric Byte Pair Encoding
We introduce GeoBPE, a principled protein structure tokenizer (PST). GeoBPE transforms continuous, noisy, multi-scale backbone conformations into discrete ``sentences'' of geometry while enforcing global constraints. It offers compression ($>$10x reduction in bits-per-residue at a similar distortion rate), data efficiency ($>$10x less training data), and generalization.
arXiv Detail & Related papers (2025-11-13T22:53:29Z) - Transformers Provably Learn Directed Acyclic Graphs via Kernel-Guided Mutual Information
Transformer-based models leveraging the attention mechanism have demonstrated strong empirical success in capturing complex dependencies within graphs. We introduce a novel information-theoretic metric: the kernel-guided mutual information (KG-MI), based on the $f$-divergence. We prove that, given sequences generated by a $K$-parent DAG, training a single-layer, multi-head transformer via gradient ascent converges to the global optimum.
arXiv Detail & Related papers (2025-10-29T14:07:12Z) - Training-Free Spectral Fingerprints of Voice Processing in Transformers
We show that different transformer architectures implement identical linguistic computations via distinct connectivity patterns. Using graph signal processing on attention-induced token graphs, we track changes in connectivity across 20 languages and three model families.
arXiv Detail & Related papers (2025-10-21T23:33:43Z) - Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech
Speech-based depression detection (SDD) is a promising, non-invasive alternative to traditional clinical assessments. We propose HAREN-CTC, a novel architecture that integrates multi-layer SSL features using cross-attention within a multitask learning framework. The model achieves state-of-the-art macro F1-scores of 0.81 on DAIC-WOZ and 0.82 on MODMA, outperforming prior methods across both evaluation scenarios.
arXiv Detail & Related papers (2025-10-05T09:32:12Z) - Graph Alignment via Dual-Pass Spectral Encoding and Latent Space Communication
We propose a novel graph alignment framework that simultaneously enhances node distinctiveness and enforces geometric consistency across latent spaces. Our approach introduces a dual-pass encoder that combines low-pass and high-pass spectral filters to generate embeddings that are both structure-aware and highly discriminative.
arXiv Detail & Related papers (2025-09-11T16:36:16Z) - Structural Alignment Improves Graph Test-Time Adaptation
We introduce Test-Time Structural Alignment (TSA), a novel algorithm for Graph Test-Time Adaptation (GTTA). TSA aligns graph structures during inference without accessing the source data. Experiments on synthetic and real-world datasets demonstrate that TSA consistently outperforms both non-graph TTA methods and state-of-the-art GTTA baselines.
arXiv Detail & Related papers (2025-02-25T16:26:25Z) - SINDER: Repairing the Singular Defects of DINOv2
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel smoothness regularization for fine-tuning that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z) - What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks.
This paper introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
arXiv Detail & Related papers (2024-06-04T05:30:16Z) - Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification
We find that graph-based methods in the visible-infrared person re-identification (VI-ReID) task generalize poorly because of two issues.
Well-trained input features weaken the learning of graph topology, so it does not generalize well at inference time.
We propose a Counterfactual Intervention Feature Transfer (CIFT) method to tackle these problems.
arXiv Detail & Related papers (2022-08-01T16:15:31Z) - ExpressivE: A Spatio-Functional Embedding For Knowledge Graph Completion
ExpressivE embeds pairs of entities as points and relations as hyper-parallelograms in the virtual triple space.
We show that ExpressivE is competitive with state-of-the-art KGEs and even significantly outperforms them on WN18RR.
arXiv Detail & Related papers (2022-06-08T23:34:39Z) - Mitigating Generation Shifts for Generalized Zero-Shot Learning
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize seen and unseen samples, where unseen classes are not observable during training.
We propose a novel Generation Shifts Mitigating Flow framework for learning unseen data synthesis efficiently and effectively.
Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.