Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
- URL: http://arxiv.org/abs/2504.11354v1
- Date: Tue, 15 Apr 2025 16:23:44 GMT
- Title: Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
- Authors: Haiming Wang, Mert Unsal, Xiaohan Lin, Mantas Baksys, Junqi Liu, Marco Dos Santos, Flood Sung, Marina Vinyes, Zhenzhe Ying, Zekai Zhu, Jianqiao Lu, Hugues de Saxcé, Bolton Bailey, Chendong Song, Chenjun Xiao, Dehao Zhang, Ebony Zhang, Frederick Pu, Han Zhu, Jiawei Liu, Jonas Bayer, Julien Michel, Longhui Yu, Léo Dreyfus-Schmidt, Lewis Tunstall, Luigi Pagani, Moreira Machado, Pauline Bourigault, Ran Wang, Stanislas Polu, Thibaut Barroyer, Wen-Ding Li, Yazhe Niu, Yann Fleureau, Yangyang Hu, Zhouliang Yu, Zihan Wang, Zhilin Yang, Zhengying Liu, Jia Li,
- Abstract summary: Kimina-Prover is a large language model that pioneers a novel reasoning-driven exploration paradigm for formal theorem proving.<n>Trained with a large-scale reinforcement learning pipeline from Qwen2.5-72B, Kimina-Prover demonstrates strong performance in Lean 4 proof generation.<n>Kimina-Prover sets a new state-of-the-art on the miniF2F benchmark, reaching 80.7% with pass@8192.
- Score: 38.4246156415702
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce Kimina-Prover Preview, a large language model that pioneers a novel reasoning-driven exploration paradigm for formal theorem proving, as showcased in this preview release. Trained with a large-scale reinforcement learning pipeline from Qwen2.5-72B, Kimina-Prover demonstrates strong performance in Lean 4 proof generation by employing a structured reasoning pattern we term \textit{formal reasoning pattern}. This approach allows the model to emulate human problem-solving strategies in Lean, iteratively generating and refining proof steps. Kimina-Prover sets a new state-of-the-art on the miniF2F benchmark, reaching 80.7% with pass@8192. Beyond improved benchmark performance, our work yields several key insights: (1) Kimina-Prover exhibits high sample efficiency, delivering strong results even with minimal sampling (pass@1) and scaling effectively with computational budget, stemming from its unique reasoning pattern and RL training; (2) we demonstrate clear performance scaling with model size, a trend previously unobserved for neural theorem provers in formal mathematics; (3) the learned reasoning style, distinct from traditional search algorithms, shows potential to bridge the gap between formal verification and informal mathematical intuition. We open source distilled versions with 1.5B and 7B parameters of Kimina-Prover
Related papers
- Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation [40.861314212279474]
We study inference-time compute by viewing chain-of-thought (CoT) generation as a metastable Markov process.
We prove that implementing a search protocol that rewards sparse edges improves CoT by decreasing the expected number of steps to reach different clusters.
We also show that the information gained by search can be utilized to obtain a better reasoning model.
arXiv Detail & Related papers (2025-02-02T18:19:14Z) - BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.
We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model.
We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z) - Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis [50.020850767257095]
We propose ProofAug, a novel theorem proving method that enjoys superior sample efficiency.
Our method is validated on the miniF2F-test benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant.
arXiv Detail & Related papers (2025-01-30T12:37:06Z) - Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models [80.65242356955231]
We propose a novel inference-aware fine-tuning paradigm, in which the model is fine-tuned in a manner that directly optimize the performance of the inference-time strategy.
We devise the first imitation learning and reinforcement learning(RL) methods for BoN-aware fine-tuning, overcoming the challenging, non-differentiable argmax operator within BoN.
Our experiments demonstrate the effectiveness of BoN-aware fine-tuning in terms of improved performance and inference-time compute.
arXiv Detail & Related papers (2024-12-18T20:43:47Z) - Proof Flow: Preliminary Study on Generative Flow Network Language Model Tuning for Formal Reasoning [11.268313729426627]
We present a proof of concept in the domain of formal reasoning, specifically in the Neural Theorem Proving setting.
Unlike classical reward-maximization reinforcement learning, GFlowNets have emerged as a promising approach for sampling compositional objects.
Our early results demonstrate GFlowNet fine-tuning's potential for enhancing model performance in a search setting.
arXiv Detail & Related papers (2024-10-17T05:10:12Z) - Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.<n>Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z) - Faithful Explanations of Black-box NLP Models Using LLM-generated
Counterfactuals [67.64770842323966]
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust.
Existing methods often fall short of explaining model predictions effectively or efficiently.
We propose two approaches for counterfactual (CF) approximation.
arXiv Detail & Related papers (2023-10-01T07:31:04Z) - HyperTree Proof Search for Neural Theorem Proving [14.677400513932852]
We propose an online training procedure for a transformer-based automated theorem prover.
Our model learns from previous proof searches through online training, allowing it to generalize to domains far from the training distribution.
We show that with HTPS alone, a model trained on annotated proofs manages to prove 65.4% of a held-out set of Metamath theorems, significantly outperforming the previous state of the art of 56.5% by GPT-f.
arXiv Detail & Related papers (2022-05-23T17:49:55Z) - Provable Benefits of Overparameterization in Model Compression: From
Double Descent to Pruning Neural Networks [38.153825455980645]
Recent empirical evidence indicates that the practice of overization not only benefits training large models, but also assists - perhaps counterintuitively - building lightweight models.
This paper sheds light on these empirical findings by theoretically characterizing the high-dimensional toolsets of model pruning.
We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning.
arXiv Detail & Related papers (2020-12-16T05:13:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.