Generating Natural Language Proofs with Verifier-Guided Search
- URL: http://arxiv.org/abs/2205.12443v1
- Date: Wed, 25 May 2022 02:22:30 GMT
- Title: Generating Natural Language Proofs with Verifier-Guided Search
- Authors: Kaiyu Yang and Jia Deng and Danqi Chen
- Abstract summary: We present NLProofS (Natural Language Proof Search), a novel stepwise method that learns to generate relevant proof steps conditioned on the hypothesis.
It achieves state-of-the-art performance on EntailmentBank and RuleTaker.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deductive reasoning (drawing conclusions from assumptions) is a challenging
problem in NLP. In this work, we focus on proof generation: given a hypothesis
and a set of supporting facts in natural language, the model generates a proof
tree indicating how to deduce the hypothesis from supporting facts. Instead of
generating the entire proof in one shot, prior work has demonstrated the
promise of stepwise generation but achieved limited success on real-world data.
Existing stepwise methods struggle to generate proof steps that are both valid
and relevant. In this paper, we present a novel stepwise method NLProofS
(Natural Language Proof Search), which learns to generate relevant steps
conditioning on the hypothesis. At the core of our approach, we train an
independent verifier to check the validity of proof steps. Instead of
generating steps greedily, we search for proofs maximizing a global proof score
judged by the verifier. NLProofS achieves state-of-the-art performance on
EntailmentBank and RuleTaker. For example, it improves the percentage of
correctly predicted proofs from 20.9% to 33.3% in the distractor setting of
EntailmentBank. This is the first time stepwise methods have led to better
generation of challenging human-authored proofs.
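To make the search procedure concrete, below is a minimal sketch of verifier-guided stepwise proof search in the spirit of the abstract, not the authors' exact algorithm. A prover model proposes candidate steps conditioned on the hypothesis, an independent verifier scores each step's validity, and a best-first search expands the highest-scoring partial proof. The `prover.propose` and `verifier.score` interfaces, the min-aggregation of step scores into a proof score, and the budget parameters are all illustrative assumptions.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class SearchState:
    neg_score: float                     # heapq pops the smallest, so negate
    facts: tuple = field(compare=False)  # supporting facts + derived conclusions
    steps: tuple = field(compare=False)  # (premises, conclusion) pairs so far

def prove(hypothesis, supporting_facts, prover, verifier,
          beam_size=10, max_expansions=100):
    """Best-first search for a proof maximizing an aggregate verifier score."""
    frontier = [SearchState(-1.0, tuple(supporting_facts), ())]
    while frontier and max_expansions > 0:
        max_expansions -= 1
        state = heapq.heappop(frontier)        # highest-scoring partial proof
        score = -state.neg_score
        # The prover proposes candidate steps conditioned on the hypothesis,
        # which keeps the generated steps relevant to the goal.
        for premises, conclusion in prover.propose(state.facts, hypothesis,
                                                   k=beam_size):
            step_score = verifier.score(premises, conclusion)
            new_score = min(score, step_score)  # assumed score aggregation
            new_steps = state.steps + ((premises, conclusion),)
            if conclusion == hypothesis:
                # Sketch only: return the first derivation of the hypothesis;
                # a fuller search would keep looking for higher-scoring proofs.
                return new_steps, new_score
            heapq.heappush(frontier, SearchState(-new_score,
                                                 state.facts + (conclusion,),
                                                 new_steps))
    return None, 0.0                            # no proof within the budget
```

Because the verifier is trained independently of the prover, it can veto steps the prover finds fluent but that do not actually follow from their premises; the global search then trades off step validity against reaching the hypothesis.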
Related papers
- Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation
We argue that the optimal order for a training sample places the intermediate supervision relevant to each proof step to the left of (i.e., before) that step.
We demonstrate that training is most effective when the proof is in this intuitively sequential order (a sketch of such premises-first serialization appears after this list).
arXiv Detail & Related papers (2024-10-30T18:00:04Z)
- Lean-STaR: Learning to Interleave Thinking and Proving
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.
Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z)
- Proving Theorems Recursively
We propose POETRY, which proves theorems in a level-by-level manner.
Unlike previous step-by-step methods, POETRY searches for a sketch of the proof at each level.
We observe a substantial increase in the maximum proof length found by POETRY, from 10 to 26.
arXiv Detail & Related papers (2024-05-23T10:35:08Z)
- Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving
We demonstrate the benefit of training models that additionally learn from failed search paths.
Because existing open-source theorem-proving datasets lack such trial-and-error data, we curate a dataset of intuitionistic propositional logic theorems.
We compare our model trained on relatively short trial-and-error information (TrialMaster) with models trained only on correct paths, and find that the former solves more unseen theorems with fewer search trials.
arXiv Detail & Related papers (2024-04-10T23:01:45Z)
- Baldur: Whole-Proof Generation and Repair with Large Language Models
We use large language models, trained on natural language text and code and fine-tuned on proofs, to generate whole proofs for theorems at once.
We combine this proof generation model with a fine-tuned repair model that fixes generated proofs, further increasing proving power (a sketch of this generate-then-repair loop appears after this list).
We implement our method in a prototype, Baldur, and evaluate it on a benchmark of 6,336 Isabelle/HOL theorems and their proofs.
arXiv Detail & Related papers (2023-03-08T22:00:15Z)
- Natural Language Deduction with Incomplete Information
We propose a new system that can handle the underspecified setting where not all premises are stated at the outset.
By using a natural language generation model to abductively infer a premise given another premise and a conclusion, we can impute missing pieces of evidence needed for the conclusion to be true.
arXiv Detail & Related papers (2022-11-01T17:27:55Z)
- NaturalProver: Grounded Mathematical Proof Generation with Language Models
Theorem proving in natural mathematical language plays a central role in mathematical advances and education.
We develop NaturalProver, a language model that generates proofs by conditioning on background references.
NaturalProver is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time.
arXiv Detail & Related papers (2022-05-25T17:01:18Z)
- multiPRover: Generating Multiple Proofs for Improved Interpretability in Rule Reasoning
We focus on a type of linguistic formal reasoning where the goal is to reason over explicit knowledge in the form of natural language facts and rules.
A recent work, named PRover, performs such reasoning by answering a question and also generating a proof graph that explains the answer.
In our work, we address a new and challenging problem of generating multiple proof graphs for reasoning over natural language rule-bases.
arXiv Detail & Related papers (2021-06-02T17:58:35Z)
- ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language
Transformers have been shown to emulate logical deduction over natural language theories.
We show that a generative model, called ProofWriter, can reliably generate both implications of a theory and the natural language proof(s) that support them.
arXiv Detail & Related papers (2020-12-24T00:55:46Z)
- PRover: Proof Generation for Interpretable Reasoning over Rules
We propose a transformer-based model that answers binary questions over rule-bases and generates the corresponding proofs.
Our model learns to predict nodes and edges corresponding to proof graphs in an efficient constrained training paradigm.
We conduct experiments on synthetic, hand-authored, and human-paraphrased rule-bases to show promising results for QA and proof generation.
arXiv Detail & Related papers (2020-10-06T15:47:53Z)
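For the data-ordering entry above, here is a minimal sketch of the premises-first serialization it argues for: every intermediate conclusion a step depends on is emitted before the step that uses it, so each step's supervision sits to its left in the training sequence. The tree encoding (dicts with `conclusion` and `premises` keys) is a hypothetical representation for illustration, not that paper's data format.

```python
def linearize_proof(step, lines=None):
    """Serialize a proof tree so each step follows all steps it depends on.

    `step` is a dict {"conclusion": str, "premises": [...]} whose premises
    are either plain-string facts or nested step dicts.
    """
    if lines is None:
        lines = []
    for premise in step["premises"]:
        if isinstance(premise, dict):       # derived premise: emit it first
            linearize_proof(premise, lines)
    names = [p["conclusion"] if isinstance(p, dict) else p
             for p in step["premises"]]
    lines.append(" & ".join(names) + " -> " + step["conclusion"])
    return lines

# Example: the derivation of the intermediate conclusion i1 is emitted
# before the step that uses it, matching the "sequential" training order.
proof = {"conclusion": "hypothesis",
         "premises": [{"conclusion": "i1", "premises": ["f1", "f2"]}, "f3"]}
print(linearize_proof(proof))
# ['f1 & f2 -> i1', 'i1 & f3 -> hypothesis']
```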
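And for the Baldur entry, a minimal sketch of the whole-proof generate-then-repair loop it describes. `generation_model`, `repair_model`, and `proof_checker` are hypothetical interfaces standing in for the fine-tuned LLMs and the Isabelle/HOL checker; the repair budget is an illustrative parameter, not Baldur's actual configuration.

```python
def prove_with_repair(theorem, generation_model, repair_model,
                      proof_checker, max_repairs=3):
    """Draft a whole proof at once, then repair it using checker feedback."""
    proof = generation_model.generate(theorem)
    for _ in range(max_repairs):
        ok, error_message = proof_checker.check(theorem, proof)
        if ok:
            return proof  # the proof assistant accepted the proof
        # Feed the failed proof and the checker's error back to the
        # repair model to produce a corrected candidate.
        proof = repair_model.repair(theorem, proof, error_message)
    return None  # no accepted proof within the repair budget
```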