Related papers: BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving

BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving

URL: http://arxiv.org/abs/2502.03438v2
Date: Mon, 24 Feb 2025 10:56:02 GMT
Title: BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving
Authors: Ran Xin, Chenguang Xi, Jie Yang, Feng Chen, Hang Wu, Xia Xiao, Yifan Sun, Shen Zheng, Kai Shen,
Abstract summary: We present BFS-Prover, a scalable expert framework, featuring three key innovations.<n>BFS-Prover achieves a state-of-the-art score of $72.95%$ on the MiniF2F test set.
Score: 19.55767322738915
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in large language models (LLMs) have spurred growing interest in automatic theorem proving using Lean4, where effective tree search methods are crucial for navigating the underlying large proof search spaces. While the existing approaches primarily rely on value functions and/or Monte Carlo Tree Search (MCTS), the potential of simpler methods like Best-First Tree Search (BFS) remains underexplored. In this paper, we investigate whether BFS can achieve competitive performance in large-scale theorem proving tasks. We present BFS-Prover, a scalable expert iteration framework, featuring three key innovations. First, we implement strategic data filtering at each expert iteration round, excluding problems solvable via beam search node expansion to focus on harder cases. Second, we improve the sample efficiency of BFS through Direct Preference Optimization (DPO) applied to state-tactic pairs automatically annotated with compiler error feedback, refining the LLM's policy to prioritize productive expansions. Third, we employ length normalization in BFS to encourage exploration of deeper proof paths. BFS-Prover achieves a state-of-the-art score of $72.95\%$ on the MiniF2F test set and therefore challenges the perceived necessity of complex tree search methods, demonstrating that BFS can achieve competitive performance when properly scaled. To facilitate further research and development in this area, we have open-sourced our model at https://huggingface.co/bytedance-research/BFS-Prover.

Related papers

LLM-First Search: Self-Guided Exploration of the Solution Space [29.780554400938335]
Large Language Models (LLMs) have demonstrated remarkable improvements in reasoning and planning through increased test-time compute.<n>We propose textbfLLM-First Search (LFS), a novel textitLLM Self-Guided Search method.
arXiv Detail & Related papers (2025-06-05T16:27:49Z)
LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation [11.045086599038338]
We introduce a novel proof-state exploration approach for training data synthesis, designed to produce diverse tactics across a wide range of intermediate proof states.<n>We also propose an adaptive beam size strategy, which effectively takes advantage of our data synthesis method and achieves a trade-off between exploration and exploitation during tree search.
arXiv Detail & Related papers (2025-05-17T14:47:36Z)
Dynamic Parallel Tree Search for Efficient LLM Reasoning [102.16694475391665]
Tree of Thoughts (ToT) enhances Large Language Model (LLM) reasoning by structuring problem-solving as a spanning tree. We propose Dynamic Parallel Tree Search (DPTS), a novel parallelism framework that aims to dynamically optimize the reasoning path in inference. Experiments on Qwen-2.5 and Llama-3 with Math500 and GSM8K datasets show that DPTS significantly improves efficiency by 2-4x on average.
arXiv Detail & Related papers (2025-02-22T14:13:37Z)
Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis [50.020850767257095]
We propose ProofAug, a novel theorem proving method that enjoys superior sample efficiency.<n>Our method is validated on the miniF2F-test benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant.
arXiv Detail & Related papers (2025-01-30T12:37:06Z)
Enhancing LLM Reasoning with Reward-guided Tree Search [95.06503095273395]
o1-like reasoning approach is challenging, and researchers have been making various attempts to advance this open area of research. We present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
arXiv Detail & Related papers (2024-11-18T16:15:17Z)
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path. The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z)
IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization [59.06663981902496]
Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization.<n>We investigate two indispensable characteristics that the LLMs-based QFS models should be harnessed, Lengthy Document Summarization and Efficiently Fine-grained Query-LLM Alignment.<n>These innovations pave the way for broader application and accessibility in the field of QFS technology.
arXiv Detail & Related papers (2024-07-15T07:14:56Z)
Uncertainty-Guided Optimization on Large Language Model Search Trees [42.71167208999792]
Tree search algorithms such as greedy and beam search are the standard when it comes to finding sequences of maximum likelihood in the decoding processes of large language models (LLMs) We define prior beliefs over LLMs' transition probabilities and obtain posterior beliefs over the most promising paths in each iteration. Unlike expensive simulation-based non-myopic methods like the Monte Carlo tree search, our method only requires samples from the beliefs.
arXiv Detail & Related papers (2024-07-04T14:08:50Z)
AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability [29.1826948551409]
AQA-Bench is a novel benchmark to assess the sequential reasoning capabilities of large language models. We build AQA-Bench with three different algorithms, namely binary search, depth-first search, and breadth-first search. Our investigations reveal several interesting findings.
arXiv Detail & Related papers (2024-02-14T18:59:33Z)
Autonomous Tree-search Ability of Large Language Models [58.68735916408101]
Large Language Models have excelled in remarkable reasoning capabilities with advanced prompting techniques. Recent works propose to utilize external programs to define search logic, such that LLMs can perform passive tree search to solve more challenging reasoning tasks. We propose a new concept called autonomous tree-search ability of LLM, which can automatically generate a response containing search trajectories for the correct answer.
arXiv Detail & Related papers (2023-10-14T14:14:38Z)
UNSAT Solver Synthesis via Monte Carlo Forest Search [10.754275929551593]
We introduce Monte Carlo Forest Search (MCFS), a class of reinforcement learning (RL) algorithms for learning policies in tree MDPs Examples of such problems include proving unsatisfiability of a SAT formula; counting the number of solutions of a satisfiable SAT formula. We dub Knuth Synthesis, an MCFS algorithm that learns DPLL branching policies for solving the satisfiability (SAT) problem.
arXiv Detail & Related papers (2022-11-22T20:52:50Z)
Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient scalable and general framework that can directly search on the tasks of interest. Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree. We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent- form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories [72.15369769265398]
Machine learning has emerged as a promising paradigm for branching. We propose retro branching; a simple yet effective approach to RL for branching. We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
arXiv Detail & Related papers (2022-05-28T06:08:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.