Forget BIT, It is All about TOKEN: Towards Semantic Information Theory for LLMs
- URL: http://arxiv.org/abs/2511.01202v1
- Date: Mon, 03 Nov 2025 03:56:34 GMT
- Title: Forget BIT, It is All about TOKEN: Towards Semantic Information Theory for LLMs
- Authors: Bo Bai,
- Abstract summary: Large language models (LLMs) have demonstrated remarkable capabilities in numerous real-world applications.<n>How to open the black-box of LLMs from a theoretical standpoint has become a critical challenge.<n>This paper takes the theory of rate-distortion function, directed information, and Granger causality as its starting point.
- Score: 7.26032677670473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in numerous real-world applications. While the vast majority of research conducted from an experimental perspective is progressing rapidly, it demands substantial computational power, data, and other resources. Therefore, how to open the black-box of LLMs from a theoretical standpoint has become a critical challenge. This paper takes the theory of rate-distortion function, directed information, and Granger causality as its starting point to investigate the information-theoretic principles behind LLMs, leading to the development of semantic information theory for LLMs, where the fundamental unit is token, rather than bits that lacks any semantic meaning. By defining the probabilistic model of LLMs, we discuss structure-agnostic information-theoretic measures, such as the directed rate-distortion function in pre-training, the directed rate-reward function in post-training, and the semantic information flow in inference phase. This paper also delves deeply into the theory of token-level semantic embedding and the information-theoretically optimal vectorization method. Thereafter, we propose a general definition of autoregression LLM, where the Transformer architecture and its performance such as ELBO, generalization error bound, memory capacity, and semantic information measures can be derived theoretically. Other architectures, such as Mamba/Mamba2 and LLaDA, are also discussed in our framework. Consequently, this paper provides a theoretical framework for understanding LLMs from the perspective of semantic information theory, which also offers the necessary theoretical tools for further in-depth research.
Related papers
- Revisiting LLM Reasoning via Information Bottleneck [57.519119962528166]
Large language models (LLMs) have recently demonstrated remarkable progress in reasoning capabilities through reinforcement learning with verifiable rewards (RLVR)<n>We present a theoretical characterization of LLM reasoning grounded in information bottleneck (IB) principle.<n>We propose IB-aware reasoning optimization (IBRO), a framework that encourages reasoning trajectories to be both informative about the final correct answer and generalizable.
arXiv Detail & Related papers (2025-07-24T13:14:25Z) - Large Language Models as Computable Approximations to Solomonoff Induction [11.811838796672369]
We establish the first formal connection between large language models (LLMs) and Algorithmic Information Theory (AIT)<n>We leverage AIT to provide a unified theoretical explanation for in-context learning, few-shot learning, and scaling laws.<n>Our framework bridges the gap between theoretical foundations and practical LLM behaviors, providing both explanatory power and actionable insights for future model development.
arXiv Detail & Related papers (2025-05-21T17:35:08Z) - Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws [5.685201910521295]
We offer a detailed view of how Large Language Models acquire and store information across increasing model and data scales.<n>Motivated by this theoretical perspective and natural assumptions inspired by Heap's and Zipf's laws, we introduce a simplified yet representative hierarchical data-generation framework.<n>Under the Bayesian setting, we show that prediction and compression within this model naturally lead to diverse learning and scaling behaviors.
arXiv Detail & Related papers (2025-04-13T14:31:52Z) - How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks.<n>We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z) - LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph [57.382255728234064]
Large Language Models (LLMs) have impressive capabilities in text understanding and zero-shot reasoning.<n> Knowledge Graphs (KGs) provide rich and reliable contextual information for the reasoning process of LLMs.<n>We propose a novel Lightweight and efficient Prompt learning-ReasOning Framework for KGQA (LightPROF)
arXiv Detail & Related papers (2025-04-04T03:03:47Z) - Argumentation Computation with Large Language Models : A Benchmark Study [6.0682923348298194]
Large language models (LLMs) have made significant advancements in neuro-symbolic computing.<n>We aim to investigate the capability of LLMs in determining the extensions of various abstract argumentation semantics.
arXiv Detail & Related papers (2024-12-21T18:23:06Z) - Understanding Chain-of-Thought in LLMs through Information Theory [16.78730663293352]
We formalize Chain-of-Thought (CoT) reasoning in Large Language Models (LLMs) through an information-theoretic lens.<n>Specifically, our framework quantifies the information-gain' at each reasoning step, enabling the identification of failure modes.<n>We demonstrate the efficacy of our approach through extensive experiments on toy arithmetic, GSM8K and PRM800k datasets.
arXiv Detail & Related papers (2024-11-18T19:14:36Z) - A Survey on Efficient Inference for Large Language Models [25.572035747669275]
Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks.
The substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios.
This paper presents a comprehensive survey of the existing literature on efficient LLM inference.
arXiv Detail & Related papers (2024-04-22T15:53:08Z) - Characterizing Truthfulness in Large Language Model Generations with
Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs)
We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - Towards Optimal Learning of Language Models [124.65669486710992]
We present a theory for the optimal learning of language models (LMs)
We derive a theorem, named Learning Law, to reveal the properties of the dynamics in the optimal learning process under our objective.
We empirically verify that the optimal learning of LMs essentially stems from the improvement of the coefficients in the scaling law of LMs.
arXiv Detail & Related papers (2024-02-27T18:52:19Z) - Learning to Generate Explainable Stock Predictions using Self-Reflective
Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.