Training-Free Spectral Fingerprints of Voice Processing in Transformers
- URL: http://arxiv.org/abs/2510.19131v1
- Date: Tue, 21 Oct 2025 23:33:43 GMT
- Title: Training-Free Spectral Fingerprints of Voice Processing in Transformers
- Authors: Valentin Noël
- Abstract summary: We show that different transformer architectures implement identical linguistic computations via distinct connectivity patterns.
Using graph signal processing on attention-induced token graphs, we track changes in connectivity across 20 languages and three model families.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Different transformer architectures implement identical linguistic computations via distinct connectivity patterns, yielding model-imprinted "computational fingerprints" detectable through spectral analysis. Using graph signal processing on attention-induced token graphs, we track changes in algebraic connectivity (Fiedler value, $\Delta\lambda_2$) under voice alternation across 20 languages and three model families, with a prespecified early window (layers 2--5). Our analysis uncovers clear architectural signatures: Phi-3-Mini shows a dramatic English-specific early-layer disruption ($\overline{\Delta\lambda_2}_{[2,5]}\!\approx\!-0.446$) while effects in the 19 other languages are minimal, consistent with public documentation that positions the model primarily for English use. Qwen2.5-7B displays small, distributed shifts that are largest for morphologically rich languages, and LLaMA-3.2-1B exhibits systematic but muted responses. These spectral signatures correlate strongly with behavioral differences (Phi-3: $r=-0.976$) and are modulated by targeted attention head ablations, linking the effect to early attention structure and confirming functional relevance. Taken together, the findings are consistent with the view that training emphasis can leave detectable computational imprints: specialized processing strategies that manifest as measurable connectivity patterns during syntactic transformations. Beyond voice alternation, the framework differentiates reasoning modes, indicating utility as a simple, training-free diagnostic for revealing architectural biases and supporting model reliability analysis.
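The central quantity is straightforward to prototype. Below is a minimal sketch, assuming per-layer attention maps averaged over heads and symmetrized into an undirected token graph; the symmetrization, layer indexing, and toy inputs are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def fiedler_value(attention: np.ndarray) -> float:
    """Algebraic connectivity (second-smallest Laplacian eigenvalue) of an
    attention-induced token graph.

    `attention` is a (tokens x tokens) weight matrix for one layer, e.g.
    averaged over heads. We symmetrize it into an undirected weighted graph,
    build the combinatorial Laplacian L = D - W, and return lambda_2.
    """
    W = 0.5 * (attention + attention.T)   # undirected graph (assumption)
    np.fill_diagonal(W, 0.0)              # drop self-loops
    L = np.diag(W.sum(axis=1)) - W        # Laplacian
    eigvals = np.linalg.eigvalsh(L)       # ascending; L is symmetric PSD
    return float(eigvals[1])              # Fiedler value

def delta_lambda2(attn_active, attn_passive, layers=range(2, 6)):
    """Mean change in algebraic connectivity over the early window
    (layers 2-5), passive minus active voice."""
    deltas = [fiedler_value(attn_passive[l]) - fiedler_value(attn_active[l])
              for l in layers]
    return float(np.mean(deltas))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_layers, n_tokens = 12, 16
    # Stand-ins for per-layer attention maps of an active/passive sentence pair.
    attn_active = rng.random((n_layers, n_tokens, n_tokens))
    attn_passive = rng.random((n_layers, n_tokens, n_tokens))
    print(delta_lambda2(attn_active, attn_passive))
```

In this reading, a strongly negative $\Delta\lambda_2$ means the passive-voice sentence induces a less tightly connected early-layer token graph than its active counterpart.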
Related papers
- Spectral Archaeology: The Causal Topology of Model Evolution [0.0]
Behavioral benchmarks tell us what a model does, but not how.
We introduce a training-free mechanistic probe using attention-graph spectra.
Across 12 models and 10 languages, these measures yield stable "fingerprints" that expose discontinuities missed by standard evaluation.
arXiv Detail & Related papers (2026-01-06T21:26:54Z)
- OTSNet: A Neurocognitive-Inspired Observation-Thinking-Spelling Pipeline for Scene Text Recognition [3.5518986305758027]
Scene Text Recognition (STR) remains challenging due to real-world complexities.
We propose OTSNet, a novel three-stage network embodying a neurocognitive-inspired Observation-Thinking-Spelling pipeline for unified STR modeling.
OTSNet achieves 83.5% average accuracy on the Union14M-L benchmark and 79.1% on the heavily occluded OST benchmark, establishing new records in 9 of 14 evaluation scenarios.
arXiv Detail & Related papers (2025-11-11T11:40:48Z)
- A Graph Signal Processing Framework for Hallucination Detection in Large Language Models [0.0]
We show that factual statements exhibit consistent "energy mountain" behavior with low-frequency convergence, while different hallucination types show distinct signatures.
A simple detector using spectral signatures achieves 88.75% accuracy versus 75% for perplexity-based baselines.
These findings indicate that spectral geometry may capture reasoning patterns and error behaviors, potentially offering a framework for detection in large language models.
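A toy illustration of the kind of low-frequency graph-spectral energy such a detector could build on (the token signal, graph construction, and sizes here are placeholder assumptions, not the paper's implementation):

```python
import numpy as np

def spectral_energy_profile(attention: np.ndarray, signal: np.ndarray) -> np.ndarray:
    """Graph Fourier energy of a token-level signal on an attention-induced graph.

    `attention`: (tokens x tokens) weights; `signal`: one value per token
    (e.g. a hidden-state norm). Returns squared GFT coefficients ordered from
    low to high graph frequency, so a 'low-frequency' statement concentrates
    energy in the first few components.
    """
    W = 0.5 * (attention + attention.T)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)   # ascending graph frequencies
    coeffs = eigvecs.T @ signal            # graph Fourier transform
    return coeffs ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.random((12, 12))               # stand-in attention map
    x = rng.random(12)                     # stand-in token signal
    energy = spectral_energy_profile(A, x)
    low_freq_ratio = energy[:3].sum() / energy.sum()
    print(f"fraction of energy in the 3 lowest graph frequencies: {low_freq_ratio:.2f}")
```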
arXiv Detail & Related papers (2025-10-21T22:35:48Z)
- Mitigating Multimodal Hallucinations via Gradient-based Self-Reflection [49.26064449816502]
We propose a Gradient-based Influence-Aware Constrained Decoding (GACD) method to address text-visual bias and co-occurrence bias.
GACD effectively reduces hallucinations and improves the visual grounding of MLLM outputs.
arXiv Detail & Related papers (2025-09-03T08:13:52Z)
- Interference Matrix: Quantifying Cross-Lingual Interference in Transformer Encoders [55.749883010057545]
We construct an interference matrix by training and evaluating small BERT-like models on all possible language pairs.
Our analysis reveals that interference between languages is asymmetrical and that its patterns do not align with traditional linguistic characteristics.
arXiv Detail & Related papers (2025-08-04T10:02:19Z)
- Tokenizing Single-Channel EEG with Time-Frequency Motif Learning [16.732494632599934]
This paper presents TFM-Tokenizer, a novel tokenization framework.
It learns a vocabulary of time-frequency motifs from single-channel EEG signals and encodes them into discrete tokens.
Experiments on ear-EEG sleep staging, which differs from the pretraining data in signal format, channel configuration, recording device, and task, show that our tokenizer outperforms baselines by 14%.
arXiv Detail & Related papers (2025-02-22T03:32:36Z)
- Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers [54.4919139401528]
We show that it is possible to reduce interference by identifying and pruning language-specific parameters.
We show that removing identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction.
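The paper scores heads with Shapley values; as a lighter stand-in, the sketch below uses a simple leave-one-out ablation score with a stub evaluation function (score_fn, the toy metric, and the sizes are placeholders, not the paper's method):

```python
import numpy as np

def head_importance(score_fn, n_layers: int, n_heads: int) -> np.ndarray:
    """Leave-one-out importance of each attention head.

    `score_fn(head_mask)` evaluates the model on a target-language task with
    heads masked out (1 = keep, 0 = ablate) and returns a scalar metric.
    Negative importance means the head hurts the target language, so pruning
    it should help.
    """
    full_mask = np.ones((n_layers, n_heads))
    base = score_fn(full_mask)
    importance = np.zeros((n_layers, n_heads))
    for l in range(n_layers):
        for h in range(n_heads):
            mask = full_mask.copy()
            mask[l, h] = 0.0
            importance[l, h] = base - score_fn(mask)   # drop in score when ablated
    return importance

def prune_worst(importance: np.ndarray, k: int):
    """Indices (layer, head) of the k heads whose removal helps the most."""
    flat = np.argsort(importance.ravel())[:k]          # most negative first
    return [tuple(map(int, np.unravel_index(i, importance.shape))) for i in flat]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    noise = rng.normal(0.0, 0.01, size=(4, 8))
    fake_score = lambda m: 0.80 + float(((1.0 - m) * noise).sum())   # stub metric
    imp = head_importance(fake_score, n_layers=4, n_heads=8)
    print(prune_worst(imp, k=3))
```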
arXiv Detail & Related papers (2022-10-11T18:11:37Z)
- Direct speech-to-speech translation with discrete units [64.19830539866072]
We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation.
We propose to predict the self-supervised discrete representations learned from an unlabeled speech corpus instead.
When target text transcripts are available, we design a multitask learning framework with joint speech and text training that enables the model to generate dual-mode output (speech and text) simultaneously in the same inference pass.
arXiv Detail & Related papers (2021-07-12T17:40:43Z)
- End-to-end Audio-visual Speech Recognition with Conformers [65.30276363777514]
We present a hybrid CTC/Attention model based on a ResNet-18 and a Convolution-augmented transformer (Conformer).
In particular, the audio and visual encoders learn to extract features directly from raw pixels and audio waveforms.
We show that our proposed models raise the state-of-the-art performance by a large margin in audio-only, visual-only, and audio-visual experiments.
arXiv Detail & Related papers (2021-02-12T18:00:08Z)
- Graph Attention Networks for Speaker Verification [43.01058120303278]
This work presents a novel back-end framework for speaker verification using graph attention networks.
We first construct a graph using segment-wise speaker embeddings and then input these to graph attention networks.
After a few graph attention layers with residual connections, each node is projected into a one-dimensional space using affine transform.
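As a rough illustration of that pipeline (not the paper's implementation), here is a single numpy graph-attention layer over a fully connected graph of segment embeddings, followed by the affine projection to one dimension; multi-layer stacking, residual connections, and all dimensions are assumptions:

```python
import numpy as np

def gat_layer(H, W, a, leaky=0.2):
    """One graph-attention layer over a fully connected graph of segment
    embeddings H (nodes x dim). W projects features, `a` scores node pairs."""
    Z = H @ W                                                     # (n, d_out)
    n = Z.shape[0]
    pair = np.concatenate(
        [np.repeat(Z, n, axis=0), np.tile(Z, (n, 1))], axis=1)   # all (i, j) pairs
    e = (pair @ a).reshape(n, n)                                  # raw attention logits
    e = np.where(e > 0, e, leaky * e)                             # LeakyReLU
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)                     # softmax over neighbours
    return np.tanh(alpha @ Z)                                     # aggregated node features

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    segments = rng.normal(size=(6, 32))            # 6 segment-wise speaker embeddings
    W1 = rng.normal(size=(32, 16)) * 0.1
    a1 = rng.normal(size=(32,)) * 0.1              # scores the concat of two 16-dim nodes
    nodes = gat_layer(segments, W1, a1)
    w_out, b_out = rng.normal(size=(16,)) * 0.1, 0.0
    scores = nodes @ w_out + b_out                 # affine projection to 1-D per node
    print(scores.shape)                            # (6,)
```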
arXiv Detail & Related papers (2020-10-22T09:08:02Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
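One way such counterfactual pairs can be turned into an auxiliary objective is to align the input gradient of the task loss with the direction separating an example from its counterfactual. The PyTorch sketch below is a hedged toy formulation under that assumption, not the paper's exact loss (the cosine form and the weight alpha are illustrative):

```python
import torch

def gradient_supervision_loss(model, x, x_cf, y, task_loss, alpha=0.1):
    """Task loss plus an auxiliary term that encourages the input gradient to
    point along the difference between an example and its counterfactual."""
    x = x.clone().requires_grad_(True)
    loss = task_loss(model(x), y)
    grad = torch.autograd.grad(loss, x, create_graph=True)[0]
    direction = x_cf - x                          # what actually changes the label
    cos = torch.nn.functional.cosine_similarity(
        grad.flatten(1), direction.flatten(1), dim=1)
    aux = (1.0 - cos).mean()                      # encourage alignment
    return loss + alpha * aux

if __name__ == "__main__":
    torch.manual_seed(0)
    model = torch.nn.Sequential(torch.nn.Linear(8, 2))
    x, x_cf = torch.randn(4, 8), torch.randn(4, 8)    # toy counterfactual pairs
    y = torch.randint(0, 2, (4,))
    total = gradient_supervision_loss(
        model, x, x_cf, y, torch.nn.functional.cross_entropy)
    total.backward()
    print(float(total))
```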
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.