Related papers: Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory

Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory

URL: http://arxiv.org/abs/2602.22345v1
Date: Wed, 25 Feb 2026 19:11:56 GMT
Title: Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory
Authors: Davide Ettori,
Abstract summary: This thesis addresses two persistent and closely related challenges in modern deep learning, reliability and efficiency.<n>By analyzing the eigenvalue dynamics of hidden activations across layers and inputs, this work shows that spectral statistics provide a compact, stable, and interpretable lens on model behavior.<n>Within this framework, the first contribution, EigenTrack, introduces a real-time method for detecting hallucinations and out-of-distribution behavior in large language and vision-language models.<n>The second contribution, RMT-KD, presents a principled approach to compressing deep networks via random matrix theoretic knowledge distillation.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This thesis addresses two persistent and closely related challenges in modern deep learning, reliability and efficiency, through a unified framework grounded in Spectral Geometry and Random Matrix Theory (RMT). As deep networks and large language models continue to scale, their internal behavior becomes increasingly opaque, leading to hallucinations, fragile generalization under distribution shift, and growing computational and energy demands. By analyzing the eigenvalue dynamics of hidden activations across layers and inputs, this work shows that spectral statistics provide a compact, stable, and interpretable lens on model behavior, capable of separating structured, causal representations from noise-dominated variability. Within this framework, the first contribution, EigenTrack, introduces a real-time method for detecting hallucinations and out-of-distribution behavior in large language and vision-language models. EigenTrack transforms streaming activations into spectral descriptors such as entropy, variance, and deviations from the Marchenko-Pastur baseline, and models their temporal evolution using lightweight recurrent classifiers, enabling early detection of reliability failures before they appear in model outputs while offering interpretable insight into representation dynamics. The second contribution, RMT-KD, presents a principled approach to compressing deep networks via random matrix theoretic knowledge distillation. By interpreting outlier eigenvalues in activation spectra as carriers of task-relevant information, RMT-KD progressively projects networks onto lower-dimensional subspaces through iterative self-distillation, yielding significantly more compact and energy-efficient models while preserving accuracy and dense, hardware-friendly structure.

Related papers

Spectral Geometry for Deep Learning: Compression and Hallucination Detection via Random Matrix Theory [0.0]
This thesis proposes a unified framework based on spectral geometry and random matrix theory to address both problems.<n>The first contribution, EigenTrack, is a real-time method for detecting hallucinations and out-of-distribution behavior in language and vision-language models.<n>The second contribution, RMT-KD, is a principled compression method that identifies informative spectral components.
arXiv Detail & Related papers (2026-01-24T08:07:22Z)
Disordered Dynamics in High Dimensions: Connections to Random Matrices and Machine Learning [52.26396748560348]
We provide an overview of high dimensional dynamical systems driven by random matrices.<n>We focus on applications to simple models of learning and generalization in machine learning theory.
arXiv Detail & Related papers (2026-01-03T00:12:32Z)
A PID-Controlled Tensor Wheel Decomposition Model for Dynamic Link Prediction [3.525733859925913]
This study introduces a PID-controlled tensor wheel decomposition (PTWD) model, which mainly adopts the following two ideas.<n>The proposed PTWD model has more accurate link prediction capabilities compared to other models.
arXiv Detail & Related papers (2025-05-20T11:14:30Z)
Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment.<n>We define this phenomenon as model hemorrhage - performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z)
Multi-Head Self-Attending Neural Tucker Factorization [5.734615417239977]
We introduce a neural network-based tensor factorization approach tailored for learning representations of high-dimensional and incomplete (HDI) tensors.<n>The proposed MSNTucF model demonstrates superior performance compared to state-of-the-art benchmark models in estimating missing observations.
arXiv Detail & Related papers (2025-01-16T13:04:15Z)
Theoretical Foundations of Deep Selective State-Space Models [13.971499161967083]
Deep SSMs demonstrate outstanding performance across a diverse set of domains.<n>Recent developments show that if the linear recurrence powering SSMs allows for multiplicative interactions between inputs and hidden states.<n>We show that when random linear recurrences are equipped with simple input-controlled transitions, then the hidden state is provably a low-dimensional projection of a powerful mathematical object.
arXiv Detail & Related papers (2024-02-29T11:20:16Z)
Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space. We demonstrate the broad applicability of this approach by adding it to both basic data-re (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
Deep Neural Dynamic Bayesian Networks applied to EEG sleep spindles modeling [0.0]
We propose a generative model for single-channel EEG that incorporates the constraints experts actively enforce during visual scoring. We derive algorithms for exact, tractable inference as a special case of Generalized Expectation Maximization. We validate the model on three public datasets and provide support that more complex models are able to surpass state-of-the-art detectors.
arXiv Detail & Related papers (2020-10-16T21:48:29Z)
Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
empirical optimization is central to modern machine learning, but its role in its success is still unclear. We show that it commonly arises in parameters of discrete multiplicative noise due to variance. A detailed analysis is conducted in which we describe on key factors, including recent step size, and data, all exhibit similar results on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms. We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.