Lyra: Orchestrating Dual Correction in Automated Theorem Proving
- URL: http://arxiv.org/abs/2309.15806v3
- Date: Sat, 7 Oct 2023 07:54:19 GMT
- Title: Lyra: Orchestrating Dual Correction in Automated Theorem Proving
- Authors: Chuanyang Zheng, Haiming Wang, Enze Xie, Zhengying Liu, Jiankai Sun,
Huajian Xin, Jianhao Shen, Zhenguo Li, Yu Li
- Abstract summary: Lyra is a new framework that employs two distinct correction mechanisms: Tool Correction and Conjecture Correction.
Tool Correction contributes to mitigating hallucinations, thereby improving the overall accuracy of the proof.
Conjecture Correction refines generation with instruction but does not collect paired (generation, error & refinement) prompts.
- Score: 65.48893396979807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) present an intriguing avenue for exploration in
the field of formal theorem proving. Nevertheless, their full potential,
particularly concerning the mitigation of hallucinations and refinement through
prover error messages, remains an area that has yet to be thoroughly
investigated. To enhance the effectiveness of LLMs in the field, we introduce
the Lyra, a new framework that employs two distinct correction mechanisms: Tool
Correction (TC) and Conjecture Correction (CC). To implement Tool Correction in
the post-processing of formal proofs, we leverage prior knowledge to utilize
predefined prover tools (e.g., Sledgehammer) for guiding the replacement of
incorrect tools. Tool Correction significantly contributes to mitigating
hallucinations, thereby improving the overall accuracy of the proof. In
addition, we introduce Conjecture Correction, an error feedback mechanism
designed to interact with prover to refine formal proof conjectures with prover
error messages. Compared to the previous refinement framework, the proposed
Conjecture Correction refines generation with instruction but does not collect
paired (generation, error & refinement) prompts. Our method has achieved
state-of-the-art (SOTA) performance on both miniF2F validation (48.0% -> 55.3%)
and test (45.5% -> 51.2%). We also present 3 IMO problems solved by Lyra. We
believe Tool Correction (post-process for hallucination mitigation) and
Conjecture Correction (subgoal adjustment from interaction with environment)
could provide a promising avenue for future research in this field.
Related papers
- Small Language Models Need Strong Verifiers to Self-Correct Reasoning [69.94251699982388]
Self-correction has emerged as a promising solution to boost the reasoning performance of large language models (LLMs)
This work explores whether small (= 13B) language models (LMs) have the ability of self-correction on reasoning tasks with minimal inputs from stronger LMs.
arXiv Detail & Related papers (2024-04-26T03:41:28Z) - GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence [64.95492752484171]
We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks.
We train models to execute these tasks, and design an interactive interface to present suggested edits and evidence to users.
To ensure that most errors are flagged by the system, we propose a method that can increase the error recall while minimizing impact on precision.
arXiv Detail & Related papers (2024-02-19T21:45:55Z) - Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal
Approach [51.012396632595554]
Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments.
Recent theoretical results verified that some causal features recovered by IRLs merely pretend domain-invariantly in the training environments but fail in unseen domains.
We develop an approach based on conditional mutual information with respect to RS-SCM, then rigorously rectify the spurious and fake invariant effects.
arXiv Detail & Related papers (2023-12-15T12:58:05Z) - Towards Calibrated Robust Fine-Tuning of Vision-Language Models [97.19901765814431]
This work proposes a robust fine-tuning method that improves both OOD accuracy and calibration error in Vision Language Models (VLMs)
Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value.
arXiv Detail & Related papers (2023-11-03T05:41:25Z) - Converge to the Truth: Factual Error Correction via Iterative
Constrained Editing [30.740281040892086]
We propose VENCE, a novel method for factual error correction (FEC) with minimal edits.
VENCE formulates the FEC problem as iterative sampling editing actions with respect to a target density function.
Experiments on a public dataset show that VENCE improves the well-adopted SARI metric by 5.3 (or a relative improvement of 11.8%) over the previous best distantly-supervised methods.
arXiv Detail & Related papers (2022-11-22T10:03:13Z) - Factual Error Correction for Abstractive Summaries Using Entity
Retrieval [57.01193722520597]
We propose an efficient factual error correction system RFEC based on entities retrieval post-editing process.
RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary.
Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
arXiv Detail & Related papers (2022-04-18T11:35:02Z) - Evidence-based Factual Error Correction [18.52583883901634]
This paper introduces the task of factual error correction.
It provides a mechanism to correct written texts that are refuted or only partially supported by evidence.
arXiv Detail & Related papers (2021-06-02T11:00:17Z) - Factual Error Correction of Claims [18.52583883901634]
This paper introduces the task of factual error correction.
It provides a mechanism to correct written texts that contain misinformation.
It acts as an inherent explanation for claims already partially supported by evidence.
arXiv Detail & Related papers (2020-12-31T18:11:26Z) - Error correction and extraction in request dialogs [12.137183622356197]
Component gets the last two utterances of a user and can detect whether the last utterance is an error correction of the second last utterance.
It corrects the second last utterance according to the error correction in the last utterance and outputs the extracted pairs of reparandum and repair entity.
One error correction detection and one error correction approach can be combined to a pipeline or the error correction approaches can be trained and used end-to-end to avoid two components.
arXiv Detail & Related papers (2020-04-08T20:49:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.