Related papers: Beyond the Black Box: Theory and Mechanism of Large Language Models

URL: http://arxiv.org/abs/2601.02907v1
Date: Tue, 06 Jan 2026 10:45:53 GMT
Title: Beyond the Black Box: Theory and Mechanism of Large Language Models
Authors: Zeyu Gan, Ruifeng Ren, Wei Yao, Xiaolin Hu, Gengze Xu, Chen Qian, Huayi Tang, Zixuan Gong, Xinhao Yao, Pengwei Tang, Zhenxing Dou, Yong Liu
Abstract summary: The rapid emergence of Large Language Models (LLMs) has precipitated a profound paradigm shift in Artificial Intelligence. This survey proposes a unified lifecycle-based taxonomy that organizes the research landscape into six distinct stages: Data Preparation, Model Preparation, Training, Alignment, Inference, and Evaluation.
Score: 39.10631426330405
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rapid emergence of Large Language Models (LLMs) has precipitated a profound paradigm shift in Artificial Intelligence, delivering monumental engineering successes that increasingly impact modern society. However, a critical paradox persists within the current field: despite the empirical efficacy, our theoretical understanding of LLMs remains disproportionately nascent, forcing these systems to be treated largely as "black boxes". To address this theoretical fragmentation, this survey proposes a unified lifecycle-based taxonomy that organizes the research landscape into six distinct stages: Data Preparation, Model Preparation, Training, Alignment, Inference, and Evaluation. Within this framework, we provide a systematic review of the foundational theories and internal mechanisms driving LLM performance. Specifically, we analyze core theoretical issues such as the mathematical justification for data mixtures, the representational limits of various architectures, and the optimization dynamics of alignment algorithms. Moving beyond current best practices, we identify critical frontier challenges, including the theoretical limits of synthetic data self-improvement, the mathematical bounds of safety guarantees, and the mechanistic origins of emergent intelligence. By connecting empirical observations with rigorous scientific inquiry, this work provides a structured roadmap for transitioning LLM development from engineering heuristics toward a principled scientific discipline.

Related papers

Towards a Mechanistic Understanding of Large Reasoning Models: A Survey of Training, Inference, and Failures [72.27391760972445]
Large Reasoning Models (LRMs) have pushed reasoning capabilities to new heights. This paper organizes recent findings into three core dimensions: 1) training dynamics, 2) reasoning mechanisms, and 3) unintended behaviors.
arXiv Detail & Related papers (2026-01-11T08:48:46Z)
How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns [51.02752099869218]
Large Language Models (LLMs) display strikingly different generalization behaviors. We introduce a novel benchmark that decomposes reasoning into atomic core skills. We show that RL-tuned models maintain more stable behavioral profiles and resist collapse in reasoning skills, whereas SFT models exhibit sharper drift and overfit to surface patterns.
arXiv Detail & Related papers (2025-12-30T08:16:20Z)
On the Theoretical Foundation of Sparse Dictionary Learning in Mechanistic Interpretability [5.009082958329585]
We develop the first unified theoretical framework considering sparse dictionary learning (SDL) as one unified optimization problem. We provide the first theoretical explanations for some empirically observed phenomena, including feature absorption, dead neurons, and the neuron resampling technique.
arXiv Detail & Related papers (2025-12-05T08:47:19Z)
Deep Unfolding: Recent Developments, Theory, and Design Guidelines [99.63555420898554]
This article provides a tutorial-style overview of deep unfolding, a framework that transforms optimization algorithms into structured, trainable ML architectures. We review the foundations of optimization for inference and for learning, introduce four representative design paradigms for deep unfolding, and discuss the distinctive training schemes that arise from their iterative nature.
arXiv Detail & Related papers (2025-12-03T13:16:35Z)
CoT-Space: A Theoretical Framework for Internal Slow-Thinking via Reinforcement Learning [14.337056020596465]
CoT-Space is a novel theoretical framework that recasts Large Language Models (LLMs) reasoning from a discrete token-prediction task to an optimization process within a continuous, reasoning-level semantic space. We show that the convergence to an optimal CoT length is a natural consequence of the fundamental trade-off between underfitting and overfitting.
arXiv Detail & Related papers (2025-09-04T09:02:16Z)
Model Reprogramming Demystified: A Neural Tangent Kernel Perspective [49.42322600160337]
We present a comprehensive theoretical analysis of Model Reprogramming (MR) through the lens of the Neural Tangent Kernel (NTK) framework. We demonstrate that the success of MR is governed by the eigenvalue spectrum of the NTK matrix on the target dataset. Our contributions include a novel theoretical framework for MR, insights into the relationship between source and target models, and extensive experiments validating our findings.
arXiv Detail & Related papers (2025-05-31T16:15:04Z)
Large Language Models as Computable Approximations to Solomonoff Induction [11.811838796672369]
We establish the first formal connection between large language models (LLMs) and Algorithmic Information Theory (AIT). We leverage AIT to provide a unified theoretical explanation for in-context learning, few-shot learning, and scaling laws. Our framework bridges the gap between theoretical foundations and practical LLM behaviors, providing both explanatory power and actionable insights for future model development.
arXiv Detail & Related papers (2025-05-21T17:35:08Z)
Information Science Principles of Machine Learning: A Causal Chain Meta-Framework Based on Formalized Information Mapping [7.299890614172539]
This study addresses key challenges in machine learning, namely the absence of a unified formal theoretical framework and the lack of foundational theories for model interpretability and ethical safety. We first construct a formal information model, explicitly defining the ontological states and carrier mappings of typical machine learning stages. By introducing learnable and processable predicates, as well as learning and processing functions, we analyze the causal chain logic and constraint laws governing machine learning processes.
arXiv Detail & Related papers (2025-05-19T14:39:41Z)
A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings, such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Integration and Adaptation, which
arXiv Detail & Related papers (2025-03-08T05:41:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.