Sparse Semantic Dimension as a Generalization Certificate for LLMs
- URL: http://arxiv.org/abs/2602.11388v1
- Date: Wed, 11 Feb 2026 21:45:18 GMT
- Title: Sparse Semantic Dimension as a Generalization Certificate for LLMs
- Authors: Dibyanayan Bandyopadhyay, Asif Ekbal
- Abstract summary: We introduce the Sparse Semantic Dimension (SSD), a complexity measure derived from the active feature vocabulary of a Sparse Autoencoder (SAE) trained on the model's layers. We validate this framework on GPT-2 Small and Gemma-2B, demonstrating that our bound provides non-vacuous certificates at realistic sample sizes.
- Score: 53.681678236115836
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard statistical learning theory predicts that Large Language Models (LLMs) should overfit because their parameter counts vastly exceed the number of training tokens. Yet, in practice, they generalize robustly. We propose that the effective capacity controlling generalization lies in the geometry of the model's internal representations: while the parameter space is high-dimensional, the activation states lie on a low-dimensional, sparse manifold. To formalize this, we introduce the Sparse Semantic Dimension (SSD), a complexity measure derived from the active feature vocabulary of a Sparse Autoencoder (SAE) trained on the model's layers. Treating the LLM and SAE as frozen oracles, we utilize this framework to attribute the model's generalization capabilities to the sparsity of the dictionary rather than the total parameter count. Empirically, we validate this framework on GPT-2 Small and Gemma-2B, demonstrating that our bound provides non-vacuous certificates at realistic sample sizes. Crucially, we uncover a counter-intuitive "feature sharpness" scaling law: despite being an order of magnitude larger, Gemma-2B requires significantly fewer calibration samples to identify its active manifold compared to GPT-2, suggesting that larger models learn more compressible, distinct semantic structures. Finally, we show that this framework functions as a reliable safety monitor: out-of-distribution inputs trigger a measurable "feature explosion" (a sharp spike in active features), effectively signaling epistemic uncertainty through learned feature violation. Code is available at: https://github.com/newcodevelop/sparse-semantic-dimension.
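The safety-monitor idea described in the abstract (flag out-of-distribution inputs when the count of active SAE features spikes) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the encoder weights `W_enc`, bias `b_enc`, and the z-score threshold `k` are all hypothetical names and choices.

```python
import numpy as np

def count_active_features(activation, W_enc, b_enc, eps=0.0):
    # Encode a hidden activation with a frozen SAE encoder,
    # ReLU(x @ W_enc + b_enc), and count features firing above eps.
    latents = np.maximum(activation @ W_enc + b_enc, 0.0)
    return int(np.count_nonzero(latents > eps))

def feature_explosion(n_active, baseline_mean, baseline_std, k=3.0):
    # Flag a "feature explosion": an active-feature count far above
    # the in-distribution calibration baseline.
    return n_active > baseline_mean + k * baseline_std
```

In this sketch the baseline mean and standard deviation would be estimated once from in-distribution calibration samples; at inference time a single comparison per input suffices.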
Related papers
- Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry [41.26991813225211]
We investigate whether smaller models can serve as efficient evaluators by leveraging internal representations instead of surface generation. We propose the Semantic Capacity Asymmetry Hypothesis: evaluation requires significantly less semantic capacity than generation. We instantiate this paradigm through INSPECTOR, a probing-based framework that predicts aspect-level evaluation scores from small model representations.
arXiv Detail & Related papers (2026-01-30T05:34:24Z)
- Binary Autoencoder for Mechanistic Interpretability of Large Language Models [8.725176890854065]
We propose a novel Binary Autoencoder variant that enforces minimal entropy on minibatches of hidden activations. For efficient entropy calculation, we discretize the hidden activations to 1-bit via a step function. We empirically evaluate this approach and leverage it to characterize the inference dynamics of Large Language Models.
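The 1-bit discretization and minibatch entropy idea in the blurb above can be sketched as follows; this is an assumed reading (per-unit Bernoulli entropy over a binarized minibatch), not the paper's exact objective.

```python
import numpy as np

def binarize(h):
    # 1-bit step-function discretization of hidden activations.
    return (h > 0).astype(np.float64)

def minibatch_entropy(h_batch):
    # Mean per-unit Bernoulli entropy (in bits) over a minibatch
    # of binarized hidden activations.
    p = binarize(h_batch).mean(axis=0)
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    ent = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return float(ent.mean())
```

A unit that always fires (or never fires) contributes zero entropy; a unit that fires on half the batch contributes one bit.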
arXiv Detail & Related papers (2025-09-25T10:48:48Z)
- Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs [78.09559830840595]
We present the first systematic study on quantizing diffusion-based language models. We identify the presence of activation outliers, characterized by abnormally large activation values. We implement state-of-the-art PTQ methods and conduct a comprehensive evaluation.
arXiv Detail & Related papers (2025-08-20T17:59:51Z)
- Semantic Convergence: Investigating Shared Representations Across Scaled LLMs [4.172347145536457]
Large language models carve the world into broadly similar, interpretable features despite size differences, reinforcing universality as a foundation for cross-model interpretability. Preliminary experiments extend the analysis from single tokens to multi-token subspaces, showing that semantically similar subspaces interact similarly with language models.
arXiv Detail & Related papers (2025-07-21T07:09:32Z)
- Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning [30.590869749117815]
Franca is the first fully open-source (data, code, weights) vision foundation model. It matches and in many cases surpasses the performance of state-of-the-art proprietary models. Our contributions establish a new standard for transparent, high-performance vision models.
arXiv Detail & Related papers (2025-07-18T17:59:55Z)
- Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality [3.9230690073443166]
We introduce a novel activation function, top-AFA, which builds upon our formulation of approximate feature activation (AFA). By training SAEs on three intermediate layers to reconstruct GPT2 hidden embeddings for over 80 million tokens from the OpenWebText dataset, we demonstrate the empirical merits of this approach.
arXiv Detail & Related papers (2025-03-31T16:22:11Z)
- Predicting the Performance of Black-box LLMs through Self-Queries [60.87193950962585]
As large language models (LLMs) are increasingly relied on in AI systems, predicting when they make mistakes is crucial. In this paper, we extract features of LLMs in a black-box manner by using follow-up prompts and taking the probabilities of different responses as representations. We demonstrate that training a linear model on these low-dimensional representations produces reliable predictors of model performance at the instance level.
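The probing recipe in the blurb above (low-dimensional response-probability features mapped to an instance-level correctness score by a linear model) can be sketched as a ridge regression. The function names and the regularization strength are illustrative assumptions.

```python
import numpy as np

def fit_probe(features, labels, l2=1e-3):
    # Ridge regression from low-dimensional response-probability
    # features to a per-instance correctness score.
    X = np.hstack([features, np.ones((len(features), 1))])  # bias column
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + l2 * np.eye(d), X.T @ labels)

def predict_probe(features, w):
    # Score new instances with the fitted linear probe.
    X = np.hstack([features, np.ones((len(features), 1))])
    return X @ w
```

Here `features` would hold the black-box response probabilities gathered from follow-up prompts, and `labels` the observed correctness on a calibration set.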
arXiv Detail & Related papers (2025-01-02T22:26:54Z)
- Quantifying Semantic Emergence in Language Models [31.608080868988825]
Large language models (LLMs) are widely recognized for their exceptional capacity to capture semantic meaning. In this work, we introduce a quantitative metric, Information Emergence (IE), designed to measure LLMs' ability to extract semantics from input tokens.
arXiv Detail & Related papers (2024-05-21T09:12:20Z)
- Non-Vacuous Generalization Bounds for Large Language Models [78.42762571499061]
We provide the first non-vacuous generalization bounds for pretrained large language models.
We show that larger models have better generalization bounds and are more compressible than smaller models.
arXiv Detail & Related papers (2023-12-28T17:58:42Z)
- SEER-ZSL: Semantic Encoder-Enhanced Representations for Generalized Zero-Shot Learning [0.6792605600335813]
Zero-Shot Learning (ZSL) presents the challenge of identifying categories not seen during training. We introduce Semantic Encoder-Enhanced Representations for Zero-Shot Learning (SEER-ZSL). First, we aim to distill meaningful semantic information using a probabilistic encoder, enhancing semantic consistency and robustness. Second, we distill the visual space by exploiting the learned data distribution through an adversarially trained generator. Third, we align the distilled information, enabling a mapping of unseen categories onto the true data manifold.
arXiv Detail & Related papers (2023-12-20T15:18:51Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- Feature Re-calibration based MIL for Whole Slide Image Classification [7.92885032436243]
Whole slide image (WSI) classification is a fundamental task for the diagnosis and treatment of diseases.
We propose to re-calibrate the distribution of a WSI bag (instances) by using the statistics of the max-instance (critical) feature.
We employ a position encoding module (PEM) to model spatial/morphological information, and perform pooling by multi-head self-attention (PSMA) with a Transformer encoder.
arXiv Detail & Related papers (2022-06-22T07:00:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.