I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
- URL: http://arxiv.org/abs/2503.08980v2
- Date: Mon, 14 Apr 2025 11:00:31 GMT
- Title: I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
- Authors: Yuhang Liu, Dong Gong, Erdun Gao, Zhen Zhang, Biwei Huang, Mingming Gong, Anton van den Hengel, Javen Qinfeng Shi
- Abstract summary: Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human interpretable concepts represented as latent discrete variables.
- Score: 79.01538178959726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The remarkable achievements of large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. This stands in contrast to explanations that attribute their capabilities to relatively simple manipulations of vast volumes of data. To illuminate the distinction between these explanations, we introduce a novel generative model that generates tokens on the basis of human interpretable concepts represented as latent discrete variables. Under mild conditions, even when the mapping from the latent space to the observed space is non-invertible, we establish an identifiability result: the representations learned by LLMs through next-token prediction can be approximately modeled as the logarithm of the posterior probabilities of these latent discrete concepts, up to an invertible linear transformation. This theoretical finding not only provides evidence that LLMs capture underlying generative factors, but also strongly reinforces the linear representation hypothesis, which posits that LLMs learn linear representations of human-interpretable concepts. Empirically, we validate our theoretical results through evaluations on both simulation data and the Pythia, Llama, and DeepSeek model families.
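The core claim lends itself to a quick simulation. The sketch below is our own illustration, not the authors' code (all names and sizes are made up): it builds a toy discrete-concept generative model, computes the exact log-posterior log p(z|x) by Bayes' rule, fabricates "LLM representations" as a noisy invertible linear transform of those log-posteriors, and checks that a linear probe recovers them with R² near 1, which is the kind of relationship the identifiability result asserts.

```python
# Toy check of the identifiability claim: if learned representations equal an
# invertible linear transform of log-posteriors over latent discrete concepts,
# a linear probe should recover those log-posteriors almost exactly.
# Illustrative simulation only -- not the paper's actual experiment.
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 5, 50, 2000          # latent concepts, vocabulary size, samples

prior = rng.dirichlet(np.ones(K))               # p(z)
likelihood = rng.dirichlet(np.ones(V), size=K)  # p(x | z), one row per concept

z = rng.choice(K, size=N, p=prior)                          # sample concepts
x = np.array([rng.choice(V, p=likelihood[k]) for k in z])   # sample tokens

# Exact log-posterior log p(z | x) via Bayes' rule, normalized stably.
joint = np.log(prior)[None, :] + np.log(likelihood[:, x]).T    # (N, K)
log_post = joint - joint.max(axis=1, keepdims=True)
log_post -= np.log(np.exp(log_post).sum(axis=1, keepdims=True))

# Pretend the "LLM representation" is an invertible linear map of log p(z|x)
# plus small noise -- the form the theorem predicts.
A = rng.normal(size=(K, K)) + 3 * np.eye(K)    # well-conditioned, invertible
reps = log_post @ A.T + 0.01 * rng.normal(size=(N, K))

# A linear probe (least squares) then recovers log p(z|x) with high R^2.
W, *_ = np.linalg.lstsq(reps, log_post, rcond=None)
pred = reps @ W
r2 = 1 - ((log_post - pred) ** 2).sum() / ((log_post - log_post.mean(0)) ** 2).sum()
print(f"linear-probe R^2: {r2:.4f}")           # close to 1.0
```

In the paper's experiments the representations come from trained models (Pythia, Llama, DeepSeek) rather than being synthesized; the sketch only illustrates the statistical relationship the theorem predicts.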
Related papers
- Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws [5.685201910521295]
We offer a detailed view of how Large Language Models acquire and store information across increasing model and data scales.
Motivated by this theoretical perspective and natural assumptions inspired by Heaps' and Zipf's laws, we introduce a simplified yet representative hierarchical data-generation framework.
Under the Bayesian setting, we show that prediction and compression within this model naturally lead to diverse learning and scaling behaviors.
arXiv Detail & Related papers (2025-04-13T14:31:52Z)
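As a side note on the Heaps'/Zipf's-law assumptions that summary invokes, here is a minimal self-contained sketch (ours, not from the paper) showing how a Zipfian token distribution produces the sublinear vocabulary growth that Heaps' law describes:

```python
# Sampling tokens from a Zipfian distribution yields sublinear growth in the
# number of distinct types seen -- the Zipf/Heaps interplay referenced above.
import numpy as np

rng = np.random.default_rng(0)
V = 50_000                                  # potential vocabulary size
ranks = np.arange(1, V + 1)
p = 1.0 / ranks                             # Zipf's law with exponent 1
p /= p.sum()

stream = rng.choice(V, size=200_000, p=p)   # simulated token stream
checkpoints = {1_000, 10_000, 100_000, 200_000}
seen = set()
for i, tok in enumerate(stream, 1):
    seen.add(int(tok))
    if i in checkpoints:
        print(f"tokens seen: {i:>7}  distinct types: {len(seen):>6}")
# Distinct types grow roughly like n^beta with beta < 1 (Heaps' law).
```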
- Cross-Entropy Is All You Need To Invert the Data Generating Process [29.94396019742267]
Empirical phenomena suggest that supervised models can learn interpretable factors of variation in a linear fashion.
Recent advances in self-supervised learning have shown that these methods can recover latent structures by inverting the data generating process.
We prove that even in standard classification tasks, models learn representations of ground-truth factors of variation up to a linear transformation.
arXiv Detail & Related papers (2024-10-29T09:03:57Z)
- Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds potential in building human-aligned and interpretable machine learning models.
We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model.
We substantiate our theoretical claims with synthetic data experiments.
arXiv Detail & Related papers (2024-06-01T18:01:03Z)
- On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of the next token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z)
- The Information of Large Language Model Geometry [3.4003124816653143]
We conduct simulations to analyze the representation entropy and discover a power law relationship with model sizes.
We propose a theory based on (conditional) entropy to elucidate the scaling law phenomenon.
arXiv Detail & Related papers (2024-02-01T12:50:43Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
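To make the concept-bottleneck idea concrete, here is a schematic numpy sketch (our simplification with made-up concept names, not the paper's method): inputs are mapped to named concept scores, and the label is predicted from those scores alone, so each prediction is attributable to human-readable concepts.

```python
# Schematic concept-bottleneck pipeline: embeddings -> named concepts -> label.
# Everything here is synthetic and hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 20
concepts = ["sentiment+", "negation", "question"]   # hypothetical concept names
C = len(concepts)

X = rng.normal(size=(N, D))                 # stand-in for text embeddings
W_true = rng.normal(size=(D, C))
C_true = (X @ W_true > 0).astype(float)     # ground-truth concept annotations
y = (C_true @ np.array([2.0, -1.5, 0.5]) > 0).astype(float)  # label from concepts

# Stage 1: linear probes from embeddings to each concept (least squares).
W_c, *_ = np.linalg.lstsq(X, C_true, rcond=None)
C_hat = X @ W_c

# Stage 2: predict the label from concept scores only (the bottleneck).
w_y, *_ = np.linalg.lstsq(C_hat, y, rcond=None)
y_hat = (C_hat @ w_y > 0.5).astype(float)
print("label accuracy via bottleneck:", (y_hat == y).mean())

# Interpretability: each label weight attaches to a named concept.
for name, w in zip(concepts, w_y):
    print(f"{name:>10}: weight {w:+.2f}")
```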
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
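The alignment idea is easy to illustrate with Python's standard ast module. This is a minimal sketch of generic position-to-AST-node alignment, not ASTxplainer's implementation:

```python
# Align a position in source code with the AST nodes that enclose it,
# using only the standard library (Python 3.8+ for end_lineno/end_col_offset).
import ast

source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)

def nodes_at(tree, line, col):
    """Return AST node types whose span encloses (line, col), outermost first."""
    hits = []
    for node in ast.walk(tree):
        if not hasattr(node, "lineno"):
            continue                         # skip nodes without positions
        start = (node.lineno, node.col_offset)
        end = (node.end_lineno, node.end_col_offset)
        if start <= (line, col) < end:
            hits.append(type(node).__name__)
    return hits

# Align the position of the '+' operator (line 2, column 13) with its context.
print(nodes_at(tree, 2, 13))   # -> ['FunctionDef', 'Return', 'BinOp']
```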
- Provable concept learning for interpretable predictions using variational inference [7.0349768355860895]
In safety critical applications, practitioners are reluctant to trust neural networks when no interpretable explanations are available.
We propose a probabilistic modeling framework to derive (C)oncept (L)earning and (P)rediction (CLAP).
We prove that our method is able to identify these concepts while attaining optimal classification accuracy.
arXiv Detail & Related papers (2022-04-01T14:51:38Z)
- Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control.
We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.