Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws
- URL: http://arxiv.org/abs/2504.09597v4
- Date: Tue, 22 Apr 2025 14:11:33 GMT
- Title: Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws
- Authors: Zhixuan Pan, Shaowen Wang, Jian Li
- Abstract summary: We offer a detailed view of how Large Language Models acquire and store information across increasing model and data scales. Motivated by this theoretical perspective and natural assumptions inspired by Heaps' and Zipf's laws, we introduce a simplified yet representative hierarchical data-generation framework. Under the Bayesian setting, we show that prediction and compression within this model naturally lead to diverse learning and scaling behaviors.
- Score: 5.685201910521295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across numerous tasks, yet principled explanations for their underlying mechanisms and several phenomena, such as scaling laws, hallucinations, and related behaviors, remain elusive. In this work, we revisit the classical relationship between compression and prediction, grounded in Kolmogorov complexity and Shannon information theory, to provide deeper insights into LLM behaviors. By leveraging the Kolmogorov Structure Function and interpreting LLM compression as a two-part coding process, we offer a detailed view of how LLMs acquire and store information across increasing model and data scales -- from pervasive syntactic patterns to progressively rarer knowledge elements. Motivated by this theoretical perspective and natural assumptions inspired by Heaps' and Zipf's laws, we introduce a simplified yet representative hierarchical data-generation framework called the Syntax-Knowledge model. Under the Bayesian setting, we show that prediction and compression within this model naturally lead to diverse learning and scaling behaviors of LLMs. In particular, our theoretical analysis offers intuitive and principled explanations for data and model scaling laws, the dynamics of knowledge acquisition during training and fine-tuning, and factual knowledge hallucinations in LLMs. The experimental results validate our theoretical predictions.
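To make the data-generation idea concrete, below is a minimal, hypothetical sketch rather than the paper's actual Syntax-Knowledge model: each toy document mixes a small pool of frequent syntactic templates with knowledge elements drawn from a Zipf distribution, so common facts are seen early and rare facts are only covered at larger data scales, echoing Heaps'-law vocabulary growth. All names and parameters here (NUM_TEMPLATES, NUM_FACTS, ZIPF_A, sample_document) are illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_TEMPLATES = 20    # hypothetical pool of pervasive syntactic patterns
NUM_FACTS = 100_000   # hypothetical pool of knowledge elements
ZIPF_A = 1.1          # hypothetical Zipf exponent (numpy requires a > 1)

def sample_document(num_sentences: int = 16) -> list[tuple[int, int]]:
    """One toy 'document': each sentence pairs a syntactic template id
    with a knowledge-element id drawn from a heavy-tailed Zipf distribution."""
    templates = rng.integers(0, NUM_TEMPLATES, size=num_sentences)
    facts = np.minimum(rng.zipf(ZIPF_A, size=num_sentences), NUM_FACTS) - 1
    return list(zip(templates.tolist(), facts.tolist()))

# Heaps'-law-style effect: the number of distinct knowledge elements seen
# grows sublinearly with corpus size, so rare facts require more data.
seen_facts: set[int] = set()
total_docs = 0
for batch in (10, 90, 900, 9_000):
    for _ in range(batch):
        seen_facts.update(fact for _, fact in sample_document())
    total_docs += batch
    print(f"{total_docs:>6} docs -> {len(seen_facts):>6} distinct facts seen")
```

Under these assumptions, the sketch reproduces the qualitative picture from the abstract: the common syntactic templates are compressed quickly from a small sample, while the heavy tail of rare knowledge elements keeps contributing new information as the corpus grows.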
Related papers
- I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [79.01538178959726]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z)
- Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data [0.9284740716447338]
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. Recent research has shown promising results in leveraging knowledge graphs (KGs) to enhance LLM performance. We have developed different techniques that tightly integrate KG structures and semantics into LLM representations.
arXiv Detail & Related papers (2024-12-14T02:51:47Z)
- Large Language Models as Markov Chains [7.078696932669912]
We draw an equivalence between autoregressive transformer-based language models and Markov chains defined on a finite state space.
We relate the obtained results to the pathological behavior observed with LLMs.
Experiments with the most recent Llama and Gemma herds of models show that our theory correctly captures their behavior in practice.
arXiv Detail & Related papers (2024-10-03T17:45:31Z)
- Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph [15.129079475322637]
This work unveils the factual information a Large Language Model represents internally for sentence-level claim verification.
We propose an end-to-end framework to decode factual knowledge embedded in token representations from a vector space to a set of ground predicates.
Our framework employs activation patching, a vector-level technique that alters a token representation during inference, to extract encoded knowledge.
arXiv Detail & Related papers (2024-04-04T17:45:59Z)
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs).
We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
- Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z)
- From Understanding to Utilization: A Survey on Explainability for Large Language Models [27.295767173801426]
This survey underscores the imperative for increased explainability in Large Language Models (LLMs).
Our focus is primarily on pre-trained Transformer-based LLMs, which pose distinctive interpretability challenges due to their scale and complexity.
When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement.
arXiv Detail & Related papers (2024-01-23T16:09:53Z)
- Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning [50.00090601424348]
Large language models (LLMs) have shown remarkable capabilities in various natural language understanding tasks.
We propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs.
arXiv Detail & Related papers (2023-11-13T06:13:38Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning [70.48605869773814]
Catastrophic forgetting (CF) is a phenomenon that occurs in machine learning when a model forgets previously learned information.
This study empirically evaluates the forgetting phenomenon in large language models during continual instruction tuning.
arXiv Detail & Related papers (2023-08-17T02:53:23Z)