Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving
- URL: http://arxiv.org/abs/2602.10512v1
- Date: Wed, 11 Feb 2026 04:24:09 GMT
- Title: Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving
- Authors: Sho Sonoda, Shunta Akiyama, Yuya Uezato,
- Abstract summary: We develop a theoretical analysis of LLM-guided formal theorem proving in interactive proof assistants (e.g., Lean)<n>To capture modern representation learning, we treat the state and action spaces as general compact metric spaces and assume Lipschitz policies.<n>Our main separation result shows that when cut elimination expands a DAG of depth $D$ into a cut-free tree of size $(D)$ while the cut-aware hierarchical process has size $O(D)$ with $ll$, a flat learner provably requires exponentially more data than a cut-aware hierarchical learner
- Score: 8.948475969696075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop a theoretical analysis of LLM-guided formal theorem proving in interactive proof assistants (e.g., Lean) by modeling tactic proposal as a stochastic policy in a finite-horizon deterministic MDP. To capture modern representation learning, we treat the state and action spaces as general compact metric spaces and assume Lipschitz policies. To explain the gap between worst-case hardness and empirical success, we introduce problem distributions generated by a reference policy $q$, including a latent-variable model in which proofs exhibit reusable cut/lemma/sketch structure represented by a proof DAG. Under a top-$k$ search protocol and Tsybakov-type margin conditions, we derive lower bounds on finite-horizon success probability that decompose into search and learning terms, with learning controlled by sequential Rademacher/covering complexity. Our main separation result shows that when cut elimination expands a DAG of depth $D$ into a cut-free tree of size $Ω(Λ^D)$ while the cut-aware hierarchical process has size $O(λ^D)$ with $λ\llΛ$, a flat (cut-free) learner provably requires exponentially more data than a cut-aware hierarchical learner. This provides a principled justification for subgoal decomposition in recent agentic theorem provers.
Related papers
- On Multi-Step Theorem Prediction via Non-Parametric Structural Priors [50.16583672681106]
In this work, we explore training-free theorem prediction through the lens of in-context learning (ICL)<n>We propose Theorem Precedence Graphs, which encode temporal dependencies from historical solution traces as directed graphs, and impose explicit topological constraints that effectively prune the search space during inference.<n>Experiments on the FormalGeo7k benchmark show that our method achieves 89.29% accuracy, substantially outperforming ICL baselines and matching state-of-the-art supervised models.
arXiv Detail & Related papers (2026-03-05T06:08:50Z) - Near-Optimal Sample Complexity Bounds for Constrained Average-Reward MDPs [6.237808815887583]
We study the sample complexity of learning an $epsilon$-optimal policy in constrained average-reward MDPs under a generative model.<n>Our results close the theoretical gap in understanding the complexity of constrained average-reward MDPs.
arXiv Detail & Related papers (2025-09-20T09:19:42Z) - Reinforcement Learning in MDPs with Information-Ordered Policies [7.881781003954483]
We propose an epoch-based reinforcement learning algorithm for infinite-horizon average-cost Markov decision processes.<n>We show that our algorithm achieves a regret bound of $O(sqrtw log(|Theta|) T)$, where $w$ is the width of the partial order.<n>We illustrate the applicability of these partial orders in many domains in operations research, including inventory control and queuing systems.
arXiv Detail & Related papers (2025-08-05T20:43:23Z) - Score-based generative models are provably robust: an uncertainty quantification perspective [4.396860522241307]
We show that score-based generative models (SGMs) are provably robust to the multiple sources of error in practical implementation.
Our primary tool is the Wasserstein uncertainty propagation (WUP) theorem.
We show how errors due to (a) finite sample approximation, (b) early stopping, (c) score-matching objective choice, (d) score function parametrization, and (e) reference distribution choice, impact the quality of the generative model.
arXiv Detail & Related papers (2024-05-24T17:50:17Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Demonstration-Regularized RL [39.96273388393764]
Using expert demonstrations, we identify an optimal policy at a sample complexity of order $widetildeO(mathrmPoly(S,A,H)/(varepsilon2 NmathrmE)$ in finite and $widetildeO(mathrmPoly(d,H)/(varepsilon2 NmathrmE)$ in linear Markov decision processes.
We establish that demonstration-regularized methods are provably efficient for reinforcement learning from human feedback.
arXiv Detail & Related papers (2023-10-26T10:54:47Z) - Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.<n>We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent
Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Settling the Sample Complexity of Model-Based Offline Reinforcement
Learning [50.5790774201146]
offline reinforcement learning (RL) learns using pre-collected data without further exploration.
Prior algorithms or analyses either suffer from suboptimal sample complexities or incur high burn-in cost to reach sample optimality.
We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without burn-in cost.
arXiv Detail & Related papers (2022-04-11T17:26:19Z) - Learning 1-Dimensional Submanifolds for Subsequent Inference on Random
Dot Product Graphs [7.066109885246681]
manifold learning can be used to learn the unknown submanifold well enough to realize benefit from restricted inference.
We propose test statistics that deploy the Isomap procedure for manifold learning.
We also apply our methods to an inference problem that arises in studying the connectome of the Drosophila larval mushroom body.
arXiv Detail & Related papers (2020-04-15T21:20:10Z) - Kernel-Based Reinforcement Learning: A Finite-Time Analysis [53.47210316424326]
We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards.
We empirically validate our approach in continuous MDPs with sparse rewards.
arXiv Detail & Related papers (2020-04-12T12:23:46Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodically constrained Markov decision processes (CMDPs)
We propose a new emphupper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.