Related papers: Gödel's Poetry

Gödel's Poetry

URL: http://arxiv.org/abs/2512.14252v1
Date: Tue, 16 Dec 2025 10:00:01 GMT
Title: Gödel's Poetry
Authors: Kelly J. Davis,
Abstract summary: We introduce a new approach to computer theorem proving, one that employs specialized language models for Lean4 proof generation.<n>We achieve a 90.4% pass rate on miniF2F without decomposition.<n>A key technical contribution lies in our extension of the Kimina Lean Server with abstract syntax tree (AST) parsing capabilities.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Formal, automated theorem proving has long been viewed as a challenge to artificial intelligence. We introduce here a new approach to computer theorem proving, one that employs specialized language models for Lean4 proof generation combined with recursive decomposition of difficult theorems into simpler entailing propositions. These models are coordinated through a multi-agent architecture that orchestrates autoformalization (if required), proof generation, decomposition of difficult theorems into simpler entailing propositions, and recursive proof (and/or decomposition) of these propositions. Without decomposition, we achieve a 90.4% pass rate on miniF2F. With decomposition, this is significantly improved. A key technical contribution lies in our extension of the Kimina Lean Server with abstract syntax tree (AST) parsing capabilities to facilitate automated, recursive proof decomposition. The system is made available on PyPI as goedels-poetry (at https://pypi.org/project/goedels-poetry ), and the open-source implementation KellyJDavis/goedels-poetry (at https://github.com/KellyJDavis/goedels-poetry ) facilitates both adaptation to alternative language models and extension with custom functionality.

Related papers

Unifying Tree Search Algorithm and Reward Design for LLM Reasoning: A Survey [92.71325249013535]
Deliberative tree search is a cornerstone of Large Language Model (LLM) research.<n>This paper introduces a unified framework that deconstructs search algorithms into three core components.
arXiv Detail & Related papers (2025-10-11T03:29:18Z)
Neuro-Symbolic Query Compiler [57.78201019000895]
This paper presents QCompiler, a neuro-symbolic framework inspired by linguistic grammar rules and compiler design, to bridge this gap.<n>It theoretically designs a minimal yet sufficient Backus-Naur Form (BNF) grammar $G[q]$ to formalize complex queries.<n>The atomicity of the sub-queries in the leaf ensures more precise document retrieval and response generation, significantly improving the RAG system's ability to address complex queries.
arXiv Detail & Related papers (2025-05-17T09:36:03Z)
Hierarchical Attention Generates Better Proofs [8.676187819105298]
We introduce textbfHierarchical Attention, a regularization method that aligns attention mechanisms with mathematical reasoning structures.<n>Our approach establishes a five-level hierarchy from foundational elements to high-level concepts, ensuring structured information flow in proof generation.
arXiv Detail & Related papers (2025-04-27T10:35:05Z)
Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs [6.65877320351217]
We developed LeanNavigator, a novel method for generating a large-scale dataset of Lean theorems and proofs.<n>We generated 4.7 million theorems totaling 1 billion tokens, surpassing previous datasets by more than an order of magnitude.<n>Using this extensive dataset, we trained an AI model that outperforms the state-of-the-art ReProver model in theorem-proving tasks.
arXiv Detail & Related papers (2025-02-16T06:20:39Z)
ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis [50.020850767257095]
We propose ProofAug, a procedure that equips LLMs with automation methods at various granularities.<n>Our method is validated on the miniF2F benchmark using the open-source deep-math-7b-base model and the Isabelle proof assistant.<n>We also implement a Lean 4 version of ProofAug that can improve the pass@1 performance of Kimina-Prover-seek-Distill-1.5B from 44.3% to 50.4%.
arXiv Detail & Related papers (2025-01-30T12:37:06Z)
Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically [29.908878832382523]
This paper focuses on improving LLMs' ability to write proofs in formal languages that permit automated proof verification/evaluation. We work in a more natural setup where the lemmas that are directly relevant to the theorem are not given to the theorem prover at test time. We design an RL-based training algorithm that encourages the model to decompose a theorem into lemmas, prove the lemmas, and then prove the theorem by using the lemmas.
arXiv Detail & Related papers (2024-11-04T05:57:40Z)
Proving Theorems Recursively [80.42431358105482]
We propose POETRY, which proves theorems in a level-by-level manner. Unlike previous step-by-step methods, POETRY searches for a sketch of the proof at each level. We observe a substantial increase in the maximum proof length found by POETRY, from 10 to 26.
arXiv Detail & Related papers (2024-05-23T10:35:08Z)
PE: A Poincare Explanation Method for Fast Text Hierarchy Generation [16.059322030238384]
We introduce a novel method, namely Poincare Explanation (PE), for modeling feature interactions with hyperbolic spaces. Specifically, we take building text hierarchies as finding spanning trees in hyperbolic spaces.
arXiv Detail & Related papers (2024-03-25T09:04:14Z)
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models [68.65075559137608]
We propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas. We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system. We develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability.
arXiv Detail & Related papers (2023-10-16T08:42:39Z)
Proof Artifact Co-training for Theorem Proving with Language Models [4.934817254755007]
PACT (bf Proof bf Artifact bf Co-bf Training) is a general methodology for extracting self-supervised data from kernel-level proof terms for co-training. We instrument Lean with a neural theorem prover driven by a Transformer language model and show that PACT improves theorem proving success rate on a held-out suite of test theorems from 32% to 48%.
arXiv Detail & Related papers (2021-02-11T18:59:24Z)
Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models. We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z)
Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models [61.480085460269514]
We propose a framework for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models. We use this framework to build ModularQA, a system that can answer multi-hop reasoning questions by decomposing them into sub-questions answerable by a neural factoid single-span QA model and a symbolic calculator.
arXiv Detail & Related papers (2020-09-01T23:45:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.