LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation
- URL: http://arxiv.org/abs/2505.12031v1
- Date: Sat, 17 May 2025 14:47:36 GMT
- Title: LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation
- Authors: Junyu Lai, Jiakun Zhang, Shuo Xu, Taolue Chen, Zihang Wang, Yao Yang, Jiarui Zhang, Chun Cao, Jingwei Xu
- Abstract summary: We introduce a novel proof-state exploration approach for training data synthesis, designed to produce diverse tactics across a wide range of intermediate proof states. We also propose an adaptive beam size strategy, which effectively takes advantage of our data synthesis method and achieves a trade-off between exploration and exploitation during tree search.
- Score: 11.045086599038338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in large language models (LLMs) have sparked considerable interest in automated theorem proving, and a prominent line of research integrates stepwise LLM-based provers into tree search. In this paper, we introduce a novel proof-state exploration approach for training data synthesis, designed to produce diverse tactics across a wide range of intermediate proof states, thereby facilitating effective one-shot fine-tuning of an LLM as the policy model. We also propose an adaptive beam size strategy, which effectively takes advantage of our data synthesis method and achieves a trade-off between exploration and exploitation during tree search. Evaluations on the MiniF2F and ProofNet benchmarks demonstrate that our method outperforms strong baselines under the stringent Pass@1 metric, attaining an average pass rate of $60.74\%$ on MiniF2F and $21.18\%$ on ProofNet. These results underscore the impact of large-scale synthetic data in advancing automated theorem proving.
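As a rough illustration of the exploration-exploitation trade-off that an adaptive beam size buys during tree search, here is a minimal best-first proof-search sketch. `policy.generate_tactics`, `apply_tactic`, and the depth-based widening rule are hypothetical stand-ins for illustration, not the paper's actual interfaces or schedule:

```python
import heapq
import itertools

def best_first_prove(root, policy, apply_tactic,
                     max_expansions=600, beam_min=2, beam_max=16):
    """Best-first proof search whose per-node beam size adapts with depth.

    Assumed interfaces: `policy.generate_tactics(state, k)` returns k
    (tactic, log_prob) pairs from the LLM policy; `apply_tactic(state,
    tactic)` returns the child proof state or None if the tactic fails.
    """
    tie = itertools.count()  # tie-breaker so states are never compared
    frontier = [(0.0, next(tie), 0, root)]  # (-score, tie, depth, state)
    for _ in range(max_expansions):
        if not frontier:
            return None
        neg_score, _, depth, state = heapq.heappop(frontier)
        if state.is_proved():
            return state.proof()
        # Illustrative widening rule: narrow beams near the root (cheap
        # exploitation), wider beams deeper in the tree (more exploration).
        beam = min(beam_max, beam_min + depth // 2)
        for tactic, logp in policy.generate_tactics(state, k=beam):
            child = apply_tactic(state, tactic)
            if child is not None:
                heapq.heappush(frontier,
                               (neg_score - logp, next(tie), depth + 1, child))
    return None  # search budget exhausted
```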
Related papers
- HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement [7.702809989052384]
HybridProver is a dual-model proof framework that combines tactic-based generation and whole-proof synthesis. We implement HybridProver for the Isabelle theorem prover and fine-tune LLMs on our optimized datasets.
arXiv Detail & Related papers (2025-05-21T16:45:43Z)
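The dual-model idea lends itself to a simple control loop: draft whole proofs first, then fall back to stepwise tactic refinement on failures. The sketch below is one plausible reading under assumed interfaces (`draft_llm`, `tactic_llm`, `checker`), not HybridProver's actual implementation:

```python
def hybrid_prove(theorem, draft_llm, tactic_llm, checker, n_drafts=8):
    # Stage 1: whole-proof synthesis -- accept the first draft that the
    # proof checker certifies. All objects here are hypothetical stand-ins.
    drafts = draft_llm.sample_proofs(theorem, n=n_drafts)
    for proof in drafts:
        if checker.check(theorem, proof):
            return proof
    if not drafts:
        return None
    # Stage 2: tactic-based refinement -- keep the longest checkable
    # prefix of a failing draft and repair it step by step.
    partial = checker.longest_valid_prefix(theorem, drafts[-1])
    while not partial.is_complete():
        tactic = tactic_llm.next_tactic(partial)
        if tactic is None:
            return None  # refinement stuck
        partial = partial.extend(tactic)
    return partial.proof()
```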
- Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis [50.020850767257095]
We propose ProofAug, a novel theorem proving method that enjoys superior sample efficiency. Our method is validated on the miniF2F-test benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant.
arXiv Detail & Related papers (2025-01-30T12:37:06Z)
- Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning [61.99353167168545]
We show that fine-tuning with LLM-generated data improves target-task performance and reduces non-target-task degradation. This is the first work to provide an empirical explanation, based on token-perplexity reduction, of how catastrophic forgetting in LLMs can be mitigated after fine-tuning.
arXiv Detail & Related papers (2025-01-24T08:18:56Z)
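Token perplexity, the quantity this explanation rests on, is straightforward to measure. A minimal sketch using Hugging Face transformers follows; the model name is an arbitrary placeholder, not the paper's setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def per_token_perplexity(text, model_name="gpt2"):
    """Perplexity of each token given its prefix under a causal LM."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Position t predicts token t+1, hence the one-position shift.
    logp = torch.log_softmax(logits[:, :-1], dim=-1)
    tok_logp = logp.gather(2, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return torch.exp(-tok_logp)  # low values = tokens the model finds easy
```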
- Understanding Chain-of-Thought in LLMs through Information Theory [16.78730663293352]
We formalize Chain-of-Thought (CoT) reasoning in Large Language Models (LLMs) through an information-theoretic lens. Specifically, our framework quantifies the information gain at each reasoning step, enabling the identification of failure modes. We demonstrate the efficacy of our approach through extensive experiments on toy arithmetic, GSM8K, and PRM800K datasets.
arXiv Detail & Related papers (2024-11-18T19:14:36Z)
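One toy way to render "information gain at each reasoning step": score how much each additional CoT step raises the model's log-probability of the final answer. This is a caricature of the paper's framework rather than its actual estimator, and `answer_logprob` is a hypothetical scoring function:

```python
def stepwise_information_gain(steps, answer_logprob):
    """Gain of step t = log p(ans | steps[:t]) - log p(ans | steps[:t-1]).

    `answer_logprob(prefix)` is assumed to return the log-probability a
    model assigns to the ground-truth answer given the question plus the
    reasoning prefix. Steps with near-zero or negative gain are natural
    candidates for failure modes.
    """
    gains, prev = [], answer_logprob([])
    for t in range(1, len(steps) + 1):
        cur = answer_logprob(steps[:t])
        gains.append(cur - prev)
        prev = cur
    return gains
```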
- LLM-based MOFs Synthesis Condition Extraction using Few-Shot Demonstrations [31.35595673239483]
Large language models (LLMs) provide a disruptively new solution to this long-standing problem. We introduce in this work the few-shot LLM in-context learning paradigm. The proposed few-shot LLMs significantly outperform zero-shot LLMs and baseline methods on synthesis extraction, structure inference, and material design.
arXiv Detail & Related papers (2024-08-06T14:53:25Z)
- ITERTL: An Iterative Framework for Fine-tuning LLMs for RTL Code Generation [9.409062607311528]
Large language models (LLMs) have demonstrated excellent performance, inspiring researchers to explore their use in automating register transfer level (RTL) code generation. Existing approaches to fine-tuning LLMs for RTL generation are typically conducted on fixed datasets. We introduce an iterative training paradigm named ITERTL to mitigate these issues. Our model outperforms GPT-4 and state-of-the-art (SOTA) open-source models, achieving a remarkable 53.8% pass@1 rate on the VerilogEval-human benchmark.
arXiv Detail & Related papers (2024-06-28T01:44:57Z)
- Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning method that learns the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model. We achieve this by learning an underlying Bernoulli distribution from which binary pruning masks are sampled. Experiments conducted on LLaMA, LLaMA-2, LLaMA-3, Vicuna, and Mistral models demonstrate the promising efficiency and effectiveness of our method.
arXiv Detail & Related papers (2024-06-15T09:31:03Z)
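A back-propagation-free update for Bernoulli mask probabilities can be written as a few lines of REINFORCE. The sketch below is one plausible reading of such a recipe, with `eval_pruned_loss` standing in as a hypothetical forward pass of the masked LLM and all sizes illustrative:

```python
import torch

num_units, steps, lr = 4096, 1000, 1e-2             # illustrative sizes
theta = torch.zeros(num_units, requires_grad=True)  # logits of keep-probabilities
opt = torch.optim.Adam([theta], lr=lr)

for _ in range(steps):
    probs = torch.sigmoid(theta)
    mask = torch.bernoulli(probs)      # sample a binary pruning mask
    loss = eval_pruned_loss(mask)      # hypothetical: scalar LM loss, no grad needed
    # REINFORCE: grad E[loss] = E[loss * grad log p(mask)], so we never
    # back-propagate through the pruned model itself.
    log_p = (mask * torch.log(probs + 1e-8)
             + (1 - mask) * torch.log(1 - probs + 1e-8)).sum()
    (float(loss) * log_p).backward()
    opt.step(); opt.zero_grad()
```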
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
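Once MCTS statistics have ranked sibling steps into (chosen, rejected) pairs, the DPO update itself is standard. A minimal rendering of the loss, with log-probabilities assumed summed over each step's tokens:

```python
import torch
import torch.nn.functional as F

def step_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss over (chosen step w, rejected step l) pairs.

    Inputs are tensors of sequence log-probabilities under the current
    policy and the frozen reference model; in this paper's setting the
    pairs would come from MCTS visit counts / values at each tree depth.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()
```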
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN).
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
- Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method [1.8130068086063336]
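Schematically, one SPIN iteration treats the previous checkpoint as the opponent whose generations become the rejected responses, with human data supplying the chosen ones. All helper names below are illustrative, not SPIN's actual API:

```python
def spin_iteration(model, opponent, human_data, preference_update):
    """One self-play round: no expert opponent, only the model's own
    previous iterate. `preference_update` is any pairwise objective
    (e.g., a DPO-style loss) that pushes `model` toward the human
    response and away from the opponent's imitation of it."""
    pairs = []
    for prompt, human_response in human_data:
        synthetic = opponent.generate(prompt)  # the opponent's "move"
        pairs.append((prompt, human_response, synthetic))  # (chosen, rejected)
    preference_update(model, pairs)
    return model  # this checkpoint becomes the next round's opponent
```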
We introduce DS-Prover, a novel dynamic sampling method for theorem proving.
We augment the training dataset by decomposing simplification and rewrite tactics with multiple premises into tactics with single premises.
We achieve state-of-the-art Pass@1 performance of 14.2% on the ProofNet dataset and 29.8% on MiniF2F.
arXiv Detail & Related papers (2023-12-20T09:55:21Z)
- Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function [106.69643619725652]
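The decomposition step of that augmentation is mechanical. A toy string-level sketch for Lean-style `simp`/`rw` calls follows; a real implementation would parse the tactic syntax tree rather than split strings:

```python
def decompose(tactic: str) -> list[str]:
    """'rw [mul_comm, add_assoc]' -> ['rw [mul_comm]', 'rw [add_assoc]'].

    Each single-premise variant yields an extra (proof state, tactic)
    training pair. Naive string handling, for illustration only.
    """
    head, _, rest = tactic.partition(" ")
    if head in ("simp", "rw") and rest.startswith("[") and rest.endswith("]"):
        premises = [p.strip() for p in rest[1:-1].split(",")]
        return [f"{head} [{p}]" for p in premises]
    return [tactic]
```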
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)
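For reference, the kind of "simple BiLSTM model trained with cross-entropy" this paper revisits fits in a dozen lines of PyTorch; the sizes and mean-pooling choice here are illustrative, not the paper's exact architecture:

```python
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, n_classes, emb_dim=300, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):        # token_ids: (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))
        return self.head(h.mean(dim=1))  # mean-pool over time steps

# Trained with plain cross-entropy:
# loss = nn.CrossEntropyLoss()(model(batch_ids), labels)
```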