Correlation Dimension of Auto-Regressive Large Language Models
- URL: http://arxiv.org/abs/2510.21258v1
- Date: Fri, 24 Oct 2025 08:42:23 GMT
- Title: Correlation Dimension of Auto-Regressive Large Language Models
- Authors: Xin Du, Kumiko Tanaka-Ishii
- Abstract summary: Large language models (LLMs) have achieved remarkable progress in natural language generation. They continue to display puzzling behaviors, such as repetition and incoherence, even when exhibiting low perplexity. We introduce correlation dimension, a fractal-geometric measure of self-similarity, to quantify the complexity of text.
- Score: 11.183390901786659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have achieved remarkable progress in natural language generation, yet they continue to display puzzling behaviors -- such as repetition and incoherence -- even when exhibiting low perplexity. This highlights a key limitation of conventional evaluation metrics, which emphasize local prediction accuracy while overlooking long-range structural complexity. We introduce correlation dimension, a fractal-geometric measure of self-similarity, to quantify the epistemological complexity of text as perceived by a language model. This measure captures the hierarchical recurrence structure of language, bridging local and global properties in a unified framework. Through extensive experiments, we show that correlation dimension (1) reveals three distinct phases during pretraining, (2) reflects context-dependent complexity, (3) indicates a model's tendency toward hallucination, and (4) reliably detects multiple forms of degeneration in generated text. The method is computationally efficient, robust to model quantization (down to 4-bit precision), broadly applicable across autoregressive architectures (e.g., Transformer and Mamba), and provides fresh insight into the generative dynamics of LLMs.
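The correlation dimension referenced in the abstract is classically estimated with the Grassberger-Procaccia procedure: count the fraction of point pairs within radius r, then fit the slope of log C(r) against log r. The following is a minimal, self-contained sketch of that estimator in NumPy; the function name, the choice of radii, and the synthetic test data are illustrative assumptions, not the authors' implementation (which applies the measure to a language model's representation of text).

```python
import numpy as np

def correlation_dimension(points, radii):
    # Grassberger-Procaccia estimate: compute the correlation integral
    # C(r) = fraction of point pairs with distance < r, then fit the
    # slope of log C(r) vs log r, which approximates the dimension.
    n = len(points)
    # Pairwise Euclidean distances; keep the upper triangle (i < j).
    dists = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    iu = np.triu_indices(n, k=1)
    pair_dists = dists[iu]
    # Correlation integral C(r) for each radius.
    c = np.array([(pair_dists < r).mean() for r in radii])
    mask = c > 0  # log is undefined where no pairs fall inside r
    slope, _intercept = np.polyfit(np.log(radii[mask]), np.log(c[mask]), 1)
    return slope

# Sanity check: points on a line embedded in 3-D should yield ~1.
rng = np.random.default_rng(0)
t = rng.uniform(0, 1, 500)
line = np.stack([t, 2 * t, -t], axis=1)
radii = np.logspace(-2, 0, 20)
dim = correlation_dimension(line, radii)
```

For text, the `points` would be the model's successive hidden states (or another per-token representation), so the estimate reflects self-similarity of the trajectory the model traces while reading or generating.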
Related papers
- The Statistical Signature of LLMs [1.3135750017147134]
We show that a simple, model-agnostic measure of statistical regularity differentiates generative regimes directly from surface text. Across settings, compression reveals a persistent structural signature of probabilistic generation. Our findings introduce a simple and robust framework for quantifying how generative systems reshape textual production.
arXiv Detail & Related papers (2026-02-20T11:33:37Z) - Semantic Chunking and the Entropy of Natural Language [1.3592625530347717]
The entropy rate of printed English is famously estimated to be about one bit per character. We introduce a statistical model that attempts to capture the intricate multi-scale structure of natural language.
arXiv Detail & Related papers (2026-02-13T18:58:10Z) - UniT: Unified Multimodal Chain-of-Thought Test-time Scaling [85.590774707406]
Unified models can handle both multimodal understanding and generation within a single architecture, yet they typically operate in a single pass without iteratively refining their outputs. We introduce UniT, a framework for multimodal test-time scaling that enables a single unified model to reason, verify, and refine across multiple rounds.
arXiv Detail & Related papers (2026-02-12T18:59:49Z) - Bridging Temporal and Textual Modalities: A Multimodal Framework for Automated Cloud Failure Root Cause Analysis [0.0]
This paper presents a diagnostic framework that harmonizes time-series representations with pretrained language model embedding spaces. Our framework achieves leading performance, reaching 48.75% diagnostic accuracy with notable improvements on scenarios involving compound failure modes.
arXiv Detail & Related papers (2026-01-08T08:20:44Z) - Structure-Aware Decoding Mechanisms for Complex Entity Extraction with Large-Scale Language Models [8.15127799301814]
This paper proposes a structure-aware decoding method based on large language models. It addresses the difficulty of maintaining both semantic integrity and structural consistency in nested and overlapping entity extraction tasks. Experiments conducted on the ACE 2005 dataset demonstrate significant improvements in Accuracy, Precision, Recall, and F1-Score.
arXiv Detail & Related papers (2025-12-16T00:40:06Z) - Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures [49.19753720526998]
We derive theoretical scaling laws for neural network performance on synthetic datasets. We validate that convolutional networks, whose structure aligns with that of the generative process through locality and weight sharing, enjoy a faster scaling of performance. This finding clarifies the architectural biases underlying neural scaling laws and highlights how representation learning is shaped by the interaction between model architecture and the statistical properties of data.
arXiv Detail & Related papers (2025-05-11T17:44:14Z) - Towards Human Cognition: Visual Context Guides Syntactic Priming in Fusion-Encoded Models [3.63819860423174]
Structural priming is a cognitive phenomenon where exposure to a particular syntactic structure increases the likelihood of producing the same structure in subsequent utterances. We introduce PRISMATIC, the first multimodal structural priming dataset. We propose the Syntactic Preservation Index (SPI), a novel reference-free evaluation metric designed specifically to assess structural priming effects at the sentence level.
arXiv Detail & Related papers (2025-02-24T21:33:27Z) - How Compositional Generalization and Creativity Improve as Diffusion Models are Trained [82.08869888944324]
How many samples do generative models need in order to learn composition rules? What signal in the data is exploited to learn those rules? We discuss connections between the hierarchical clustering mechanism we introduce here and the renormalization group in physics.
arXiv Detail & Related papers (2025-02-17T18:06:33Z) - Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming [10.292557971996112]
This study evaluates the performance of Recurrent Neural Network (RNN) and Transformer models in replicating cross-language structural priming.
Our findings indicate that transformers outperform RNNs in generating primed sentence structures.
This work contributes to our understanding of how computational models may reflect human cognitive processes across diverse language families.
arXiv Detail & Related papers (2024-05-15T17:01:02Z) - Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z) - Cross-Lingual Transfer of Cognitive Processing Complexity [11.939409227407769]
We use sentence-level eye-tracking patterns as a cognitive indicator for structural complexity.
We show that the multilingual model XLM-RoBERTa can successfully predict varied patterns for 13 typologically diverse languages.
arXiv Detail & Related papers (2023-02-24T15:48:23Z) - DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z) - Model Criticism for Long-Form Text Generation [113.13900836015122]
We apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text.
We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality.
We find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
arXiv Detail & Related papers (2022-10-16T04:35:58Z) - TAGPRIME: A Unified Framework for Relational Structure Extraction [71.88926365652034]
TAGPRIME is a sequence tagging model that appends priming words about the information of the given condition to the input text.
With the self-attention mechanism in pre-trained language models, the priming words make the output contextualized representations contain more information about the given condition.
Extensive experiments and analyses on three different tasks that cover ten datasets across five different languages demonstrate the generality and effectiveness of TAGPRIME.
arXiv Detail & Related papers (2022-05-25T08:57:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.