CoqPilot, a plugin for LLM-based generation of proofs
- URL: http://arxiv.org/abs/2410.19605v1
- Date: Fri, 25 Oct 2024 14:57:29 GMT
- Title: CoqPilot, a plugin for LLM-based generation of proofs
- Authors: Andrei Kozyrev, Gleb Solovev, Nikita Khramov, Anton Podkopaev
- Abstract summary: CoqPilot is a VS Code extension designed to help automate the writing of Coq proofs.
The plugin collects the parts of proofs marked with the admit tactic in a Coq file.
It combines LLMs with non-machine-learning methods to generate proof candidates for these holes.
- Abstract: We present CoqPilot, a VS Code extension designed to help automate the writing of Coq proofs. The plugin collects the parts of proofs marked with the admit tactic in a Coq file, i.e., proof holes, and combines LLMs with non-machine-learning methods to generate proof candidates for the holes. Then, CoqPilot checks whether each proof candidate solves the given subgoal and, if successful, replaces the hole with it. The focus of CoqPilot is twofold. Firstly, we want to allow users to seamlessly combine multiple Coq generation approaches and to provide a zero-setup experience for our tool. Secondly, we want to deliver a platform for LLM-based experiments on Coq proof generation. We developed a benchmarking system for Coq generation methods, available in the plugin, and conducted an experiment using it, showcasing the framework's possibilities. A demo of CoqPilot is available at: https://youtu.be/oB1Lx-So9Lo. Code at: https://github.com/JetBrains-Research/coqpilot
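The abstract specifies a concrete generate-and-check loop. The sketch below restates it in Python to make the control flow explicit; all names here are hypothetical (CoqPilot itself is a TypeScript VS Code extension), so this illustrates the described workflow, not the plugin's API.

```python
"""Minimal sketch of the workflow described in the abstract: collect proof
holes, ask each generation method for candidates, keep the first candidate
that Coq accepts. Hypothetical names; not CoqPilot's actual code."""
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class ProofHole:
    file: str
    offset: int  # position of the `admit.` tactic in the source
    goal: str    # the subgoal Coq reports at that point

def fill_holes(
    holes: Iterable[ProofHole],
    generators: list[Callable[[str], list[str]]],  # LLM-based and non-ML methods
    check: Callable[[ProofHole, str], bool],       # asks Coq whether a candidate closes the goal
) -> dict[ProofHole, str]:
    """Return a proof for every hole that some generator managed to close."""
    solutions: dict[ProofHole, str] = {}
    for hole in holes:
        for generate in generators:
            for candidate in generate(hole.goal):
                if check(hole, candidate):  # re-checking keeps unsound candidates out
                    solutions[hole] = candidate
                    break
            if hole in solutions:
                break
    return solutions
```

Because every candidate is re-checked by Coq before a hole is replaced, a weak generator can only fail to help; it cannot make the resulting proof wrong.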
Related papers
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph
Uncertainty quantification (UQ) is a critical component of machine learning (ML) applications.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines.
We conduct a large-scale empirical investigation of UQ and normalization techniques across nine tasks, and identify the most promising approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
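To make the subject of this benchmark concrete, here is one textbook UQ baseline for LLMs, mean token log-likelihood, as a self-contained Python sketch; it is a generic illustration, not LM-Polygraph's API.

```python
"""Mean token log-likelihood: a classic confidence baseline for LLM output.
Generic illustration only, not the LM-Polygraph implementation."""
import math

def mean_log_likelihood(token_logprobs: list[float]) -> float:
    """Average log-probability of the generated tokens; higher means the
    model was more confident in its own output."""
    return sum(token_logprobs) / len(token_logprobs)

# Example: a 4-token answer with per-token log-probabilities from the model.
logprobs = [-0.1, -0.3, -2.2, -0.4]
confidence = math.exp(mean_log_likelihood(logprobs))  # geometric-mean token probability
print(f"confidence ≈ {confidence:.2f}")
```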
- CoqPyt: Proof Navigation in Python in the Era of LLMs
This paper presents CoqPyt, a Python tool for interacting with the Coq proof assistant.
CoqPyt improves on other Coq-related tools by providing novel features, such as the extraction of rich premise data.
arXiv Detail & Related papers (2024-05-07T12:50:28Z)
- Towards Large Language Models as Copilots for Theorem Proving in Lean
We introduce Lean Copilot, a framework for running large language model inference in Lean.
We build tools for suggesting proof steps, completing intermediate proof goals, and selecting relevant premises.
Experimental results demonstrate the effectiveness of our method in assisting humans and automating the theorem proving process.
arXiv Detail & Related papers (2024-04-18T22:54:08Z)
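Lean Copilot is driven from inside a Lean proof. A minimal illustration of the "suggesting proof steps" tool follows; the import and tactic name are recalled from the project's README and should be treated as assumptions rather than verified API.

```lean
import LeanCopilot

-- Ask for candidate next tactics at the current goal; Lean Copilot shows
-- suggestions in the infoview, and the user picks one to continue the proof.
-- (`suggest_tactics` as recalled from the project README; verify before use.)
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  suggest_tactics
```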
- Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code
The Coq proof assistant stands out for its rigorous approach to verifying mathematical assertions and software correctness.
Despite the advances in artificial intelligence and machine learning, the specialized nature of Coq syntax and semantics poses unique challenges for Large Language Models (LLMs).
The dataset presented here, derived from a collection of over 10,000 Coq source files, encompasses a wide array of propositions, proofs, and definitions.
arXiv Detail & Related papers (2024-03-19T10:53:40Z)
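As a rough picture of how such a dataset can be mined, the sketch below cuts (statement, proof) pairs out of Coq sources with a regular expression. This simplification is mine, not the paper's pipeline; robust extraction would go through a real Coq frontend such as SerAPI or coq-lsp.

```python
"""Toy extraction of (statement, proof) pairs from Coq source text.
Illustrative only; real Coq parsing should not be done with regexes."""
import re

# A Coq proof typically spans `Theorem/Lemma ... Proof. ... Qed.`
PATTERN = re.compile(
    r"(?P<statement>(?:Theorem|Lemma)\s+\w+[^.]*\.)\s*"
    r"Proof\.(?P<proof>.*?)(?:Qed|Defined)\.",
    re.DOTALL,
)

def extract_pairs(source: str) -> list[tuple[str, str]]:
    """Return (statement, proof body) pairs found in one .v file."""
    return [(m["statement"].strip(), m["proof"].strip()) for m in PATTERN.finditer(source)]

example = """
Lemma add_0_r : forall n : nat, n + 0 = n.
Proof.
  intros n. induction n; simpl; auto.
Qed.
"""
print(extract_pairs(example))
```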
- Towards Automatic Transformations of Coq Proof Scripts
We present a framework to carry out a posteriori script transformations.
These transformations are meant to be applied as an automated post-processing step, once the proof has been completed.
We apply our tool to various Coq proof scripts, including some from the GeoCoq library.
arXiv Detail & Related papers (2024-01-22T12:48:34Z)
- Merging Generated and Retrieved Knowledge for Open-Domain QA
COMBO, a Compatibility-Oriented knowledge Merging for Better Open-domain QA framework, merges generated and retrieved knowledge for open-domain question answering.
We show that COMBO outperforms competitive baselines on three out of four tested open-domain QA benchmarks.
arXiv Detail & Related papers (2023-10-22T19:37:06Z)
- A Practical Toolkit for Multilingual Question and Answer Generation
We introduce AutoQG, an online service for multilingual QAG, along with lmqg, an all-in-one Python package for model fine-tuning, generation, and evaluation.
We also release QAG models in eight languages fine-tuned on a few variants of pre-trained encoder-decoder language models, which can be used online via AutoQG or locally via lmqg.
arXiv Detail & Related papers (2023-05-27T08:42:37Z)
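For the local route, usage looks roughly like the following; the class and method names are reconstructed from memory of the lmqg README, so treat them as assumptions and verify against the package documentation before relying on them.

```python
"""Sketch of local question-answer generation with lmqg.
API names are assumptions recalled from the project README."""
from lmqg import TransformersQG  # pip install lmqg

# Load a fine-tuned English question-and-answer generation model.
model = TransformersQG(language="en")

context = (
    "William Turner was an English painter who specialised in watercolour "
    "landscapes. He is often known as William Turner of Oxford."
)

# generate_qa returns (question, answer) pairs extracted from the context.
for question, answer in model.generate_qa(context):
    print(question, "->", answer)
```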
- Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge
We propose a novel open-domain question answering (ODQA) framework for answering single/multi-hop questions across heterogeneous knowledge sources.
Unlike previous methods that solely rely on the retriever for gathering all evidence in isolation, our intermediary performs a chain of reasoning over the retrieved set.
Our system achieves competitive performance on two ODQA datasets, OTT-QA and NQ, reasoning over tables and passages from Wikipedia.
arXiv Detail & Related papers (2022-10-22T03:21:32Z)
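The contrast the summary draws, reasoning over a linked chain of evidence rather than scoring retrieved items in isolation, can be sketched schematically as below. Every name here is hypothetical; this is a reading of the summary, not the paper's system.

```python
"""Schematic multi-hop evidence chaining. Hypothetical interfaces only."""
from typing import Callable

def chain_of_reasoning(
    question: str,
    retrieve: Callable[[str], list[str]],      # first-hop retriever
    expand: Callable[[str], list[str]],        # follow links from an item (e.g. table cell -> passage)
    score: Callable[[str, list[str]], float],  # reader's confidence in an evidence chain
    hops: int = 2,
) -> list[str]:
    """Grow evidence chains hop by hop, then return the best-scoring chain."""
    chains = [[item] for item in retrieve(question)]
    for _ in range(hops - 1):
        extended = [chain + [nxt] for chain in chains for nxt in expand(chain[-1])]
        chains = extended or chains  # keep old chains if nothing links further
    return max(chains, key=lambda chain: score(question, chain)) if chains else []
```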
- Multi-hop Question Generation with Graph Convolutional Network
Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple pieces of scattered evidence from different paragraphs.
We propose the Multi-Hop Encoding Fusion Network for Question Generation (MulQG), which performs context encoding in multiple hops.
Our proposed model is able to generate fluent questions with high completeness and outperforms the strongest baseline by 20.8% in the multi-hop evaluation.
arXiv Detail & Related papers (2020-10-19T06:15:36Z)
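The graph-convolution step the title refers to has a standard form, H' = sigma(D^{-1/2}(A+I)D^{-1/2} H W). A generic NumPy sketch of one such layer follows, for illustration only, not the MulQG implementation.

```python
"""One standard graph-convolution (GCN) layer, Kipf & Welling style."""
import numpy as np

def gcn_layer(adjacency: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One propagation step: each node averages its neighbours' features
    (with self-loops and symmetric degree normalisation), then projects."""
    a_hat = adjacency + np.eye(adjacency.shape[0])          # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))  # D^{-1/2}
    propagated = d_inv_sqrt @ a_hat @ d_inv_sqrt @ features
    return np.maximum(propagated @ weights, 0.0)            # ReLU

# Toy graph: 3 evidence nodes in a chain, 4-dim features, 2-dim output.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.default_rng(0).normal(size=(3, 4))
W = np.random.default_rng(1).normal(size=(4, 2))
print(gcn_layer(A, H, W))
```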
- Learning to Format Coq Code Using Language Models
Coq code tends to be written in distinct manners by different people and teams.
In particular, even inexperienced users can distinguish vernacular using the standard library and plain Ltac from vernacular using the Mathematical Components library and SSReflect.
Rule-based formatters, such as Coq's beautifier, have limited flexibility and only capture small fractions of desired conventions.
arXiv Detail & Related papers (2020-06-18T14:46:15Z)
- Prolog Technology Reinforcement Learning Prover
The core of the toolkit is a compact and easy-to-extend Prolog-based automated theorem prover called plCoP.
plCoP builds on the leanCoP Prolog implementation and adds learning-guided Monte-Carlo Tree Search as done in the rlCoP system.
Other components include a Python interface to plCoP and machine learners, and an external proof checker that verifies the validity of plCoP proofs.
arXiv Detail & Related papers (2020-04-15T10:52:04Z)
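The learning-guided Monte-Carlo Tree Search that plCoP adds rests on the textbook UCT selection rule; the sketch below shows that rule generically in Python, not plCoP's Prolog implementation.

```python
"""UCT: the standard action-selection rule used in Monte-Carlo Tree Search.
Generic illustration of the formula, not plCoP's code."""
import math

def uct_choice(children: list[dict], exploration: float = 1.4) -> dict:
    """Pick the child maximising Q/N + c * sqrt(ln(N_parent) / N).
    Each child dict holds 'visits' (N) and 'value' (summed rewards Q)."""
    parent_visits = sum(child["visits"] for child in children)
    def uct(child: dict) -> float:
        if child["visits"] == 0:
            return float("inf")  # always try unvisited actions first
        exploit = child["value"] / child["visits"]
        explore = exploration * math.sqrt(math.log(parent_visits) / child["visits"])
        return exploit + explore
    return max(children, key=uct)

# Toy example: three candidate inference steps at one proof-search node.
steps = [
    {"name": "extend clause 1", "visits": 10, "value": 6.0},
    {"name": "extend clause 2", "visits": 3, "value": 2.5},
    {"name": "reduction", "visits": 0, "value": 0.0},
]
print(uct_choice(steps)["name"])  # the unvisited step is tried first
```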