NaturalProver: Grounded Mathematical Proof Generation with Language Models
- URL: http://arxiv.org/abs/2205.12910v1
- Date: Wed, 25 May 2022 17:01:18 GMT
- Title: NaturalProver: Grounded Mathematical Proof Generation with Language Models
- Authors: Sean Welleck, Jiacheng Liu, Ximing Lu, Hannaneh Hajishirzi, Yejin Choi
- Abstract summary: Theorem proving in natural mathematical language plays a central role in mathematical advances and education.
We develop NaturalProver, a language model that generates proofs by conditioning on background references.
NaturalProver is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Theorem proving in natural mathematical language - the mixture of symbolic
and natural language used by humans - plays a central role in mathematical
advances and education, and tests aspects of reasoning that are core to
intelligence. Yet it has remained underexplored with modern generative models.
We study large-scale language models on two new generation tasks: suggesting
the next step in a mathematical proof, and full proof generation. Naively
applying language models to these problems yields proofs riddled with
hallucinations and logical incoherence. We develop NaturalProver, a language
model that generates proofs by conditioning on background references (e.g.
theorems and definitions that are either retrieved or human-provided), and
optionally enforces their presence with constrained decoding. On theorems from
the NaturalProofs benchmark, NaturalProver improves the quality of next-step
suggestions and generated proofs over fine-tuned GPT-3, according to human
evaluations from university-level mathematics students. NaturalProver is
capable of proving some theorems that require short (2-6 step) proofs, and
providing next-step suggestions that are rated as correct and useful over 40%
of the time, which is to our knowledge the first demonstration of these
capabilities using neural language models.
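To make the grounding idea concrete, here is a minimal, self-contained sketch of reference-grounded next-step generation. This is not NaturalProver's implementation: the paper enforces reference constraints inside the decoding procedure itself, whereas the sketch below uses the simplest sample-and-filter approximation, and `sample_step` is a hypothetical stand-in for the fine-tuned language model.

```python
# Minimal illustrative sketch (NOT NaturalProver's implementation):
# reference-grounded next-step generation, where required background
# references should appear in the generated step. NaturalProver enforces
# this during decoding; here a simple sample-and-filter loop stands in.
import random

def sample_step(theorem: str, references: list[str]) -> str:
    """Hypothetical stand-in for a fine-tuned language model's sampler."""
    ref = random.choice(references + ["(no reference)"])
    return f"By {ref}, the claim for {theorem} follows."

def grounded_next_step(theorem: str, references: list[str],
                       num_candidates: int = 8) -> str:
    """Sample several candidate steps; keep one that cites a reference."""
    candidates = [sample_step(theorem, references)
                  for _ in range(num_candidates)]
    grounded = [c for c in candidates
                if any(ref in c for ref in references)]
    # Fall back to an unconstrained candidate if no sample is grounded.
    return grounded[0] if grounded else candidates[0]

if __name__ == "__main__":
    refs = ["the Triangle Inequality", "the definition of a metric"]
    print(grounded_next_step("d(x, z) <= d(x, y) + d(y, z)", refs))
```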
Related papers
- Lean-STaR: Learning to Interleave Thinking and Proving
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.
Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z)
- Language Models as Inductive Reasoners
We propose a new paradigm (task) for inductive reasoning, which is to induce natural language rules from natural language facts.
We create a dataset termed DEER containing 1.2k rule-fact pairs for the task, where rules and facts are written in natural language.
We provide the first comprehensive analysis of how well pretrained language models can induce natural language rules from natural language facts.
arXiv Detail & Related papers (2022-12-21T11:12:14Z)
- Towards a Mathematics Formalisation Assistant using Large Language Models
We explore the abilities of a large language model (Codex) to help with formalisation in the Lean theorem prover.
Codex is able to formalise short undergraduate-level mathematical statements with nearly 75% accuracy on 120 theorem statements.
We show that, with a new prompting strategy, Codex can formalise these natural-language proofs, with at least one of twelve Codex completions being easy to repair into a complete proof.
arXiv Detail & Related papers (2022-11-14T16:52:32Z)
- Generating Natural Language Proofs with Verifier-Guided Search
We present NLProofS (Natural Language Proof Search), a novel stepwise method.
NLProofS learns to generate relevant proof steps conditioned on the hypothesis.
It achieves state-of-the-art performance on EntailmentBank and RuleTaker; a sketch of this style of verifier-guided search appears after this list.
arXiv Detail & Related papers (2022-05-25T02:22:30Z)
- NaturalProofs: Mathematical Theorem Proving in Natural Language
We develop NaturalProofs, a multi-domain corpus of mathematical statements and their proofs.
NaturalProofs unifies broad coverage, deep coverage, and low-resource mathematical sources.
We benchmark strong neural methods on mathematical reference retrieval and generation tasks.
arXiv Detail & Related papers (2021-03-24T03:14:48Z)
- ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language
Transformers have been shown to emulate logical deduction over natural language theories.
We show that a generative model, called ProofWriter, can reliably generate both implications of a theory and the natural language proof(s) that support them.
arXiv Detail & Related papers (2020-12-24T00:55:46Z)
- Generative Language Modeling for Automated Theorem Proving
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models.
We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z)
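As a rough illustration of the verifier-guided stepwise search described in the NLProofS entry above (a sketch, not the paper's system): a step generator proposes candidate proof steps and a verifier scores partial proofs, steering a best-first search. `propose_steps`, `verifier_score`, and `is_complete` below are hypothetical stubs standing in for learned models.

```python
# Toy sketch of verifier-guided stepwise proof search in the spirit of
# NLProofS: a generator proposes steps, a verifier scores partial proofs,
# and a best-first search expands the highest-scoring partial proof.
import heapq
import itertools

def propose_steps(hypothesis: str, proof: tuple) -> list[str]:
    """Hypothetical stand-in for a stepwise proof generator."""
    n = len(proof)
    return [f"step {n + 1} toward '{hypothesis}' (variant {k})" for k in range(2)]

def verifier_score(proof: tuple) -> float:
    """Hypothetical stand-in for a learned verifier; prefers short proofs."""
    return 1.0 / (1 + len(proof))

def is_complete(proof: tuple, max_steps: int = 3) -> bool:
    """Toy stopping rule; a real system would check the hypothesis is proved."""
    return len(proof) >= max_steps

def verifier_guided_search(hypothesis: str, frontier_cap: int = 50):
    ids = itertools.count()             # tie-breaker so proofs never compare
    frontier = [(-1.0, next(ids), ())]  # (negated score, id, partial proof)
    while frontier:
        _, _, proof = heapq.heappop(frontier)
        if is_complete(proof):
            return proof
        for step in propose_steps(hypothesis, proof):
            extended = proof + (step,)
            heapq.heappush(frontier,
                           (-verifier_score(extended), next(ids), extended))
        del frontier[frontier_cap:]     # crude pruning to cap the frontier
    return None

if __name__ == "__main__":
    for line in verifier_guided_search("A and B entail C"):
        print(line)
```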