ProofCompass: Enhancing Specialized Provers with LLM Guidance
- URL: http://arxiv.org/abs/2507.14335v1
- Date: Fri, 18 Jul 2025 19:28:01 GMT
- Title: ProofCompass: Enhancing Specialized Provers with LLM Guidance
- Authors: Nicolas Wischermann, Claudio Mayrink Verdun, Gabriel Poesia, Francesco Noseda
- Abstract summary: This paper introduces ProofCompass, a novel hybrid methodology that achieves remarkable computational efficiency. It strategically guides existing specialized prover methods, such as DeepSeek-Prover-v1.5-RL (DSP-v1.5), with a Large Language Model (LLM), without requiring additional model training. On the miniF2F benchmark, ProofCompass demonstrates substantial resource efficiency: it outperforms DSP-v1.5 ($54.9\% \rightarrow 55.3\%$) while using 25x fewer attempts ($3200 \rightarrow 128$).
- Score: 6.757964026033364
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Language models have become increasingly powerful tools for formal mathematical reasoning. However, most existing approaches rely exclusively on either large general-purpose models or smaller specialized models, each with distinct limitations, while training specialized large models still requires significant computational resources. This paper introduces ProofCompass, a novel hybrid methodology that achieves remarkable computational efficiency by strategically guiding existing specialized prover methods, such as DeepSeek-Prover-v1.5-RL (DSP-v1.5), with a Large Language Model (LLM) without requiring additional model training. The LLM provides natural language proof strategies and analyzes failed attempts to select intermediate lemmas, enabling effective problem decomposition. On the miniF2F benchmark, ProofCompass demonstrates substantial resource efficiency: it outperforms DSP-v1.5 ($54.9\% \rightarrow 55.3\%$) while using 25x fewer attempts ($3200 \rightarrow 128$). Our synergistic approach paves the way for simultaneously improving computational efficiency and accuracy in formal theorem proving.
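As a rough illustration of the method described in the abstract, the loop below sketches how an LLM's strategy hints and failure analysis could wrap a specialized prover. All names (`llm_strategy`, `llm_select_lemma`, `prover_attempt`) and the budget split are hypothetical stand-ins, not the paper's actual interface:

```python
from typing import Callable, List, Optional, Tuple

def prove_with_guidance(
    theorem: str,
    llm_strategy: Callable[[str], str],                           # plan in natural language
    llm_select_lemma: Callable[[str, List[str]], Optional[str]],  # analyze failed attempts
    prover_attempt: Callable[[str, str], Tuple[bool, str]],       # one specialized-prover sample
    budget: int = 128,
) -> Optional[str]:
    """Hypothetical LLM-guided proving loop; not the authors' implementation."""
    if budget < 2:
        return None
    strategy = llm_strategy(theorem)
    failures: List[str] = []
    for _ in range(budget // 2):
        ok, transcript = prover_attempt(theorem, strategy)   # e.g. a DSP-v1.5 sample
        if ok:
            return transcript
        failures.append(transcript)
    # Direct attempts failed: ask the LLM for an intermediate lemma that
    # decomposes the problem into two easier subgoals.
    lemma = llm_select_lemma(theorem, failures)
    if lemma is None:
        return None
    args = (llm_strategy, llm_select_lemma, prover_attempt, budget // 4)
    part1 = prove_with_guidance(lemma, *args)                        # prove the lemma
    part2 = prove_with_guidance(f"({lemma}) -> ({theorem})", *args)  # lemma entails goal
    return f"{part1}\n{part2}" if part1 and part2 else None
```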
Related papers
- SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL). We propose SPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
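A minimal sketch of the self-paced idea, assuming a per-problem `solve_rate` estimated from recent rollouts (SPaRFT's actual curriculum is more involved):

```python
import random

def self_paced_batch(problems, solve_rate, batch_size=32, band=(0.2, 0.8)):
    """Sample training problems matched to current model capability:
    prefer those solved sometimes but not always (hypothetical sketch)."""
    lo, hi = band
    candidates = [p for p in problems if lo <= solve_rate[p] <= hi]
    pool = candidates or problems    # fall back if the band is empty
    return random.sample(pool, min(batch_size, len(pool)))
```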
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
- Solving Formal Math Problems by Decomposition and Iterative Reflection [30.54275542622631]
Delta Prover orchestrates the interaction between a general-purpose LLM and the Lean 4 proof environment. Delta Prover achieves a state-of-the-art 95.9% success rate on the miniF2F-test benchmark.
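The orchestration can be pictured as a reflection loop; `llm_propose` and `lean_check` below are assumed interfaces to the LLM and the Lean 4 checker, not Delta Prover's actual API:

```python
def reflect_and_retry(goal, llm_propose, lean_check, max_rounds=8):
    """Hypothetical decomposition-and-reflection loop: draft a Lean 4 proof,
    verify it, and feed the checker's error back into the next draft."""
    feedback = None
    for _ in range(max_rounds):
        candidate = llm_propose(goal, feedback)  # LLM drafts a proof script
        ok, error = lean_check(goal, candidate)  # Lean 4 verifies it
        if ok:
            return candidate
        feedback = error                         # reflect on the failure
    return None
```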
arXiv Detail & Related papers (2025-07-21T03:56:35Z)
- LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models [76.8317443926908]
Masked Diffusion Models (MDMs) present a promising paradigm for language modeling. The challenge arises from the high variance in Evidence Lower Bound (ELBO)-based likelihood estimates required for preference optimization. We propose Variance-Reduced Preference Optimization (VRPO), a framework that formally analyzes the variance of ELBO estimators and derives bounds on both the bias and variance of preference optimization gradients.
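One standard variance-reduction device consistent with this description is to average several Monte Carlo ELBO draws and couple the sampling noise between the chosen and rejected responses so that it cancels in their difference. The sketch below assumes a hypothetical `elbo_fn(response, seed=...)` estimator; VRPO's actual sample allocation and analysis are in the paper:

```python
import torch

def coupled_preference_loss(elbo_fn, chosen, rejected, beta=0.1, n_samples=8):
    """Hypothetical variance-reduced preference loss: same masking seed for
    both responses (coupled noise), averaged over n Monte Carlo draws."""
    seeds = torch.randint(0, 2**31 - 1, (n_samples,))
    diffs = [
        elbo_fn(chosen, seed=int(s)) - elbo_fn(rejected, seed=int(s))
        for s in seeds
    ]
    margin = torch.stack(diffs).mean()           # low-variance reward margin
    return -torch.nn.functional.logsigmoid(beta * margin)
```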
arXiv Detail & Related papers (2025-05-25T16:36:20Z)
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling [69.57918638435491]
Test-Time Scaling is an important method for improving the performance of Large Language Models. This paper focuses on a core question: what is the optimal approach to scale test-time computation across different policy models, process reward models (PRMs), and problem difficulty levels? We show that with our compute-optimal TTS strategy, extremely small policy models can outperform larger models.
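The simplest instance of such test-time scaling is best-of-N sampling scored by a process reward model; a toy sketch with assumed `policy_sample` and `prm_score` callables:

```python
def best_of_n(prompt, policy_sample, prm_score, n=64):
    """Draw n candidate solutions from a (possibly small) policy model and
    keep the one the PRM scores highest -- a toy TTS baseline, not the
    paper's compute-optimal strategy."""
    candidates = [policy_sample(prompt) for _ in range(n)]
    return max(candidates, key=prm_score)
```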
arXiv Detail & Related papers (2025-02-10T17:30:23Z)
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs [76.43407125275202]
o1-like models can emulate human-like long-time thinking during inference. This paper presents the first comprehensive study on the prevalent issue of overthinking in these models. We propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy.
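One conceivable mitigation in this spirit (the paper's own strategies may differ) is to cut generation once the intermediate answer stops changing:

```python
def stop_when_answer_stable(reasoning_steps, extract_answer, patience=2):
    """Hypothetical early-exit rule: end an o1-style reasoning stream once the
    extracted answer has been identical for `patience` consecutive steps."""
    last, stable = None, 0
    for step in reasoning_steps:
        answer = extract_answer(step)
        stable = stable + 1 if answer is not None and answer == last else 0
        last = answer
        if stable >= patience:
            return answer
    return last
```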
arXiv Detail & Related papers (2024-12-30T18:55:12Z)
- A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models [24.185245582500876]
We introduce FISTAPruner, the first post-training pruner based on convex optimization models and algorithms.
FISTAPruner incorporates an intra-layer cumulative error correction mechanism and supports parallel pruning.
We evaluate FISTAPruner on models such as OPT, LLaMA, LLaMA-2, and LLaMA-3 with 125M to 70B parameters under unstructured and 2:4 semi-structured sparsity.
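For reference, the 2:4 semi-structured pattern mentioned above keeps the two largest-magnitude weights in every group of four; the sketch below shows only that masking step, not FISTAPruner's convex error-correction formulation:

```python
import torch

def prune_2_4(weight: torch.Tensor) -> torch.Tensor:
    """Zero the 2 smallest-magnitude entries in each contiguous group of 4
    along the last dimension (assumes weight.numel() is divisible by 4)."""
    groups = weight.reshape(-1, 4)
    drop = groups.abs().topk(2, dim=1, largest=False).indices
    mask = torch.ones_like(groups).scatter_(1, drop, 0.0)
    return (groups * mask).reshape(weight.shape)
```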
arXiv Detail & Related papers (2024-08-07T12:33:46Z) - Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models [46.959380978972206]
We study inference scaling laws (aka test-time scaling laws) and compute-optimal inference. As a first step towards understanding and designing compute-optimal inference methods, we study cost-performance trade-offs for inference strategies. Our findings suggest that scaling inference compute with inference strategies can be more computationally efficient than scaling model parameters.
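A back-of-the-envelope model shows why inference scaling can pay off: if one independent sample solves a problem with probability p, then n samples (n times the inference compute) succeed with probability 1 - (1 - p)^n:

```python
def pass_at_n(p_single: float, n: int) -> float:
    """Toy coverage model for repeated sampling; independence is assumed."""
    return 1.0 - (1.0 - p_single) ** n

# A small model at p = 0.05 per sample reaches ~92% coverage with n = 50,
# which can be cheaper than one sample from a much larger model.
print(pass_at_n(0.05, 50))   # ~0.923
```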
arXiv Detail & Related papers (2024-08-01T17:16:04Z) - Pruning Large Language Models with Semi-Structural Adaptive Sparse Training [17.381160429641316]
Adaptive Sparse Trainer (AST) is a novel and efficient retraining framework tailored for semi-structured sparse models. AST reduces the perplexity and zero-shot accuracy gap between dense and 2:4 semi-structured sparse models to 0.6 and 1.16%, respectively.
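A common ingredient of such sparse retraining (an assumption here; AST's exact losses and schedule differ) is a straight-through 2:4 masked forward pass, so that gradients keep updating the dense weights:

```python
import torch

class Masked24STE(torch.autograd.Function):
    """Straight-through 2:4 sparse forward pass (illustrative sketch)."""

    @staticmethod
    def forward(ctx, w):
        groups = w.reshape(-1, 4)                       # needs numel % 4 == 0
        drop = groups.abs().topk(2, dim=1, largest=False).indices
        mask = torch.ones_like(groups).scatter_(1, drop, 0.0)
        return (groups * mask).reshape(w.shape)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out   # straight-through: gradient flows to dense weights
```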
arXiv Detail & Related papers (2024-07-30T06:33:44Z) - Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
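In its simplest reading, the fusion step is a preference-weighted average of expert parameters; the sketch below is that simplification only (the paper learns an MoE routing module rather than a fixed average):

```python
import torch

def fuse_experts(expert_state_dicts, preference):
    """Hypothetical weight-space fusion: average single-task experts'
    parameters under a softmax-normalized preference over objectives."""
    alphas = torch.softmax(torch.as_tensor(preference, dtype=torch.float32), dim=0)
    return {
        name: sum(a * sd[name] for a, sd in zip(alphas, expert_state_dicts))
        for name in expert_state_dicts[0]
    }
```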
arXiv Detail & Related papers (2024-06-14T07:16:18Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language
Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
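PETAL's mode approximation is a tensor-decomposition technique not reproduced here; as a generic point of comparison, a low-rank adapter shows how tuning well under 1% of a model's parameters is possible:

```python
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Generic parameter-efficient adapter (illustrative; not PETAL's mode
    approximation): freeze the base layer, learn a small low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # base weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)           # start as a zero update

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))
```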
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.