Related papers: Large Language Models as Computable Approximations to Solomonoff Induction

URL: http://arxiv.org/abs/2505.15784v1
Date: Wed, 21 May 2025 17:35:08 GMT
Title: Large Language Models as Computable Approximations to Solomonoff Induction
Authors: Jun Wan, Lingrui Mei
Abstract summary: We establish the first formal connection between large language models (LLMs) and Algorithmic Information Theory (AIT). We leverage AIT to provide a unified theoretical explanation for in-context learning, few-shot learning, and scaling laws. Our framework bridges the gap between theoretical foundations and practical LLM behaviors, providing both explanatory power and actionable insights for future model development.
Score: 11.811838796672369
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid advancement of large language models (LLMs) calls for a rigorous theoretical framework to explain their empirical success. While significant progress has been made in understanding LLM behaviors, existing theoretical frameworks remain fragmented in explaining emergent phenomena through a unified mathematical lens. We establish the first formal connection between LLM architectures and Algorithmic Information Theory (AIT) by proving two fundamental results: (1) the training process computationally approximates Solomonoff prior through loss minimization interpreted as program length optimization, and (2) next-token prediction implements approximate Solomonoff induction. We leverage AIT to provide a unified theoretical explanation for in-context learning, few-shot learning, and scaling laws. Furthermore, our theoretical insights lead to a principled method for few-shot example selection that prioritizes samples where models exhibit lower predictive confidence. We demonstrate through experiments on diverse text classification benchmarks that this strategy yields significant performance improvements, particularly for smaller model architectures, when compared to selecting high-confidence examples. Our framework bridges the gap between theoretical foundations and practical LLM behaviors, providing both explanatory power and actionable insights for future model development.
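
The selection strategy described in the abstract (preferring few-shot examples on which the model is less confident) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how confidence is measured, so the sketch assumes it is the maximum predicted class probability. The names `select_low_confidence`, `toy_predict_proba`, and the toy probabilities are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, gold label)


def select_low_confidence(
    pool: Sequence[Example],
    predict_proba: Callable[[str], Sequence[float]],
    k: int,
) -> List[Example]:
    """Return the k pool examples with the lowest model confidence.

    Assumption: confidence is the maximum class probability the model
    assigns to the input. The paper may define it differently.
    """
    # Pair each confidence with the example's index so that ties are
    # broken by original pool order.
    scored = [(max(predict_proba(text)), i) for i, (text, _) in enumerate(pool)]
    scored.sort()  # ascending: least confident first
    return [pool[i] for _, i in scored[:k]]


# Hypothetical stand-in for a real classifier: fixed class probabilities.
TOY_PROBS = {
    "great movie": [0.95, 0.05],
    "it was fine, I guess": [0.55, 0.45],
    "not bad at all": [0.60, 0.40],
    "terrible": [0.02, 0.98],
}


def toy_predict_proba(text: str) -> Sequence[float]:
    return TOY_PROBS[text]


pool = [
    ("great movie", "pos"),
    ("it was fine, I guess", "pos"),
    ("not bad at all", "pos"),
    ("terrible", "neg"),
]

# Pick the two examples the toy model is least sure about.
selected = select_low_confidence(pool, toy_predict_proba, k=2)
print(selected)
```

With the toy probabilities above, the two ambiguous reviews are selected and the two clear-cut ones are left out. In practice `predict_proba` would wrap a zero-shot call to the language model being prompted.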

Related papers

On the Theoretical Foundation of Sparse Dictionary Learning in Mechanistic Interpretability [5.009082958329585]
We develop the first unified theoretical framework considering sparse dictionary learning (SDL) as one unified optimization problem. We provide the first theoretical explanations for some empirically observed phenomena, including feature absorption, dead neurons, and the neuron resampling technique.
arXiv Detail & Related papers (2025-12-05T08:47:19Z)
Deep Unfolding: Recent Developments, Theory, and Design Guidelines [99.63555420898554]
This article provides a tutorial-style overview of deep unfolding, a framework that transforms optimization algorithms into structured, trainable ML architectures. We review the foundations of optimization for inference and for learning, introduce four representative design paradigms for deep unfolding, and discuss the distinctive training schemes that arise from their iterative nature.
arXiv Detail & Related papers (2025-12-03T13:16:35Z)
Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws [52.10468229008941]
This paper formalizes an emerging learning paradigm that uses a trained model as a reference to guide and enhance the training of a target model through strategic data selection or weighting. We provide theoretical insights into why this approach improves generalization and data efficiency compared to training without a reference model. Building on these insights, we introduce a novel method for Contrastive Language-Image Pretraining with a reference model, termed DRRho-CLIP.
arXiv Detail & Related papers (2025-05-10T16:55:03Z)
Investigating the Zone of Proximal Development of Language Models for In-Context Learning [59.91708683601029]
We introduce a learning analytics framework to analyze the in-context learning (ICL) behavior of large language models (LLMs). We adapt the Zone of Proximal Development (ZPD) theory to ICL, measuring the ZPD of LLMs based on model performance on individual examples. Our findings reveal a series of intricate and multifaceted behaviors of ICL, providing new insights into understanding and leveraging this technique.
arXiv Detail & Related papers (2025-02-10T19:36:21Z)
Large Language Models as Markov Chains [7.078696932669912]
We draw an equivalence between autoregressive transformer-based language models and Markov chains defined on a finite state space. We relate the obtained results to the pathological behavior observed with LLMs. Experiments with the most recent Llama and Gemma herds of models show that our theory correctly captures their behavior in practice.
arXiv Detail & Related papers (2024-10-03T17:45:31Z)
Beyond the Black Box: A Statistical Model for LLM Reasoning and Inference [0.9898607871253774]
This paper introduces a novel Bayesian learning model to explain the behavior of Large Language Models (LLMs). We develop a theoretical framework based on an ideal generative text model represented by a multinomial transition probability matrix with a prior, and examine how LLMs approximate this matrix.
arXiv Detail & Related papers (2024-02-05T16:42:10Z)
Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning [50.00090601424348]
Large language models (LLMs) have shown remarkable capabilities in various natural language understanding tasks. We propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs.
arXiv Detail & Related papers (2023-11-13T06:13:38Z)
Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals [67.64770842323966]
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust. Existing methods often fall short of explaining model predictions effectively or efficiently. We propose two approaches for counterfactual (CF) approximation.
arXiv Detail & Related papers (2023-10-01T07:31:04Z)
Hierarchical Optimization-Derived Learning [58.69200830655009]
We establish a new framework, named Hierarchical ODL (HODL), to simultaneously investigate the intrinsic behaviors of optimization-derived model construction and its corresponding learning process. This is the first theoretical guarantee for these two coupled ODL components: optimization and learning.
arXiv Detail & Related papers (2023-02-11T03:35:13Z)
Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations. Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
Improving Few-Shot Learning through Multi-task Representation Learning Theory [14.8429503385929]
We consider the framework of multi-task representation (MTR) learning where the goal is to use source tasks to learn a representation that reduces the sample complexity of solving a target task. We show that recent advances in MTR theory can provide novel insights for popular meta-learning algorithms when analyzed within this framework. This is the first contribution that puts the most recent learning bounds of MTR theory into practice for the task of few-shot classification.
arXiv Detail & Related papers (2020-10-05T13:24:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.