Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code
- URL: http://arxiv.org/abs/2403.12627v2
- Date: Tue, 2 Apr 2024 13:54:47 GMT
- Title: Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code
- Authors: Andreas Florath
- Abstract summary: The Coq proof assistant stands out for its rigorous approach to verifying mathematical assertions and software correctness.
Despite the advances in artificial intelligence and machine learning, the specialized nature of Coq syntax and semantics poses unique challenges for Large Language Models (LLMs).
This dataset, derived from a collection of over 10,000 Coq source files, encompasses a wide array of propositions, proofs, and definitions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of formal theorem proving, the Coq proof assistant stands out for its rigorous approach to verifying mathematical assertions and software correctness. Despite the advances in artificial intelligence and machine learning, the specialized nature of Coq syntax and semantics poses unique challenges for Large Language Models (LLMs). Addressing this gap, we present a comprehensive dataset specifically designed to enhance LLMs' proficiency in interpreting and generating Coq code. This dataset, derived from a collection of over 10,000 Coq source files, encompasses a wide array of propositions, proofs, and definitions, enriched with metadata including source references and licensing information. Our primary aim is to facilitate the development of LLMs capable of generating syntactically correct and semantically meaningful Coq constructs, thereby advancing the frontier of automated theorem proving. Initial experiments with this dataset have showcased its significant potential; models trained on this data exhibited enhanced accuracy in Coq code generation. Notably, a particular experiment revealed that a fine-tuned LLM was capable of generating 141 valid proofs for a basic lemma, highlighting the dataset's utility in facilitating the discovery of diverse and valid proof strategies. This paper discusses the dataset's composition, the methodology behind its creation, and the implications of our findings for the future of machine learning in formal verification. The dataset is accessible for further research and exploration: https://huggingface.co/datasets/florath/coq-facts-props-proofs-gen0-v1
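The dataset is published on the Hugging Face Hub, so it can be pulled directly with the `datasets` library. Below is a minimal sketch; the split name and the column names used for filtering are assumptions for illustration, not the confirmed schema, so inspect the columns first.

```python
# Minimal sketch: load and inspect the Coq dataset from Hugging Face.
# Assumes the `datasets` library is installed; the "train" split and
# the "license" column are illustrative guesses, not confirmed by the
# paper -- check ds.column_names for the real schema.
from datasets import load_dataset

ds = load_dataset("florath/coq-facts-props-proofs-gen0-v1", split="train")

print(ds.column_names)  # discover the actual fields
print(ds[0])            # look at one record (proposition/proof/metadata)

# Example filter: keep only permissively licensed entries, assuming a
# hypothetical "license" metadata column exists.
permissive = ds.filter(lambda r: "MIT" in str(r.get("license", "")))
print(len(permissive))
```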
Related papers
- Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation [24.081573908824353]
First-order logic (FOL) reasoning is pivotal for intelligent systems.
Existing benchmarks often rely on extensive human annotation or handcrafted templates.
We propose a novel framework called ProverGen that synergizes the generative strengths of Large Language Models with the rigor and precision of symbolic provers.
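As an illustration of the LLM-plus-symbolic-prover pattern this summary describes (not ProverGen's actual code), an LLM-proposed entailment can be checked with the Z3 solver by testing whether the premises together with the negated conclusion are unsatisfiable; the toy instance below is propositional for brevity.

```python
# Illustrative sketch: validate a candidate entailment with Z3 by
# checking that (premises AND NOT conclusion) is unsatisfiable.
from z3 import Solver, Bool, Implies, Not, And, unsat

# Toy instance an LLM might emit: p, p -> q, therefore q.
p, q = Bool("p"), Bool("q")
premises = [p, Implies(p, q)]
conclusion = q

s = Solver()
s.add(And(*premises))
s.add(Not(conclusion))

# unsat means the entailment holds: no model makes all premises true
# while the conclusion is false.
print("valid" if s.check() == unsat else "not entailed")
```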
arXiv Detail & Related papers (2025-02-10T15:31:54Z)
- Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning [85.635988711588]
We argue that enhancing the capabilities of large language models requires a paradigm shift in the design of mathematical datasets.
We advocate for mathematical dataset developers to consider the concept of "motivated proof", introduced by G. Pólya in 1949, which can serve as a blueprint for datasets that offer a better proof learning signal.
We provide a questionnaire designed specifically for math datasets that we urge creators to include with their datasets.
arXiv Detail & Related papers (2024-12-19T18:55:17Z)
- Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification [11.115201117143929]
We present Rango, a fully automated proof synthesis tool for Coq.
Rango identifies relevant premises and also similar proofs from the current project and uses them during synthesis.
Our evaluation shows that Rango adding relevant proofs to its context leads to a 47% increase in the number of theorems proven.
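A hedged sketch of the retrieval step such a tool performs (illustrative only, not Rango's implementation): rank the current project's proofs by TF-IDF similarity to the goal and hand the best match to the synthesizer as extra context. Requires scikit-learn; the proof strings are made up.

```python
# Illustrative sketch of similar-proof retrieval via TF-IDF.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

project_proofs = [
    "intros n. induction n; simpl; auto.",
    "intros. apply plus_comm.",
    "unfold not. intros H. inversion H.",
]
goal = "forall n : nat, n + 0 = n"

vec = TfidfVectorizer()
matrix = vec.fit_transform(project_proofs + [goal])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

# Keep the proof most similar to the goal as retrieval-augmented context.
best = max(range(len(project_proofs)), key=scores.__getitem__)
print(project_proofs[best], scores[best])
```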
arXiv Detail & Related papers (2024-12-18T17:08:42Z)
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
Multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks.
Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales.
We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
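A hypothetical record layout for such a rationale-rich instruction-tuning example (field names are illustrative, not the paper's schema):

```python
# Hypothetical example record: keeps the intermediate rationale rather
# than only a phrase-level answer. All field names are assumptions.
import json

record = {
    "image": "chart_0421.png",  # placeholder path
    "instruction": "Which year has the tallest bar in the chart?",
    "rationale": "The bar for 2023 reaches 38 on the y-axis, higher "
                 "than every other bar, so 2023 is the tallest.",
    "answer": "2023",
}
print(json.dumps(record, indent=2))
```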
arXiv Detail & Related papers (2024-12-06T18:14:24Z)
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.
While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
- MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity.
We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points.
We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
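A minimal sketch of the validate-and-keep idea behind such synthesis pipelines (not MUSTARD's own implementation): accept a generated theorem/proof pair only if a proof checker compiles it. Here the checker is `coqc`, invoked on a temporary file.

```python
# Illustrative sketch: keep generated proof data only if it checks.
import pathlib
import subprocess
import tempfile

def is_valid_coq(source: str) -> bool:
    """Return True if coqc accepts the given Coq source."""
    with tempfile.TemporaryDirectory() as tmp:
        f = pathlib.Path(tmp) / "Candidate.v"
        f.write_text(source)
        result = subprocess.run(["coqc", str(f)], capture_output=True, cwd=tmp)
        return result.returncode == 0

candidate = "Lemma l : forall n : nat, n = n.\nProof. intros n. reflexivity. Qed.\n"
print(is_valid_coq(candidate))  # True, provided coqc is installed
```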
arXiv Detail & Related papers (2024-02-14T05:57:58Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
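A minimal PyTorch sketch of per-dimension latent quantization, as one way to realize such an inductive bias (an illustration of the idea, not the paper's exact method): each latent dimension is snapped to the nearest entry of a small scalar codebook, with a straight-through estimator so gradients still reach the encoder.

```python
# Illustrative sketch of per-dimension latent quantization.
import torch

class LatentQuantizer(torch.nn.Module):
    def __init__(self, latent_dim: int, codes_per_dim: int = 16):
        super().__init__()
        # One scalar codebook per latent dimension.
        self.codebooks = torch.nn.Parameter(torch.randn(latent_dim, codes_per_dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, latent_dim). Squared distance of each scalar
        # to every entry of its dimension's codebook.
        dist = (z.unsqueeze(-1) - self.codebooks.unsqueeze(0)) ** 2
        idx = dist.argmin(dim=-1)  # (batch, latent_dim)
        zq = torch.gather(
            self.codebooks.expand(z.size(0), -1, -1), 2, idx.unsqueeze(-1)
        ).squeeze(-1)
        # Straight-through estimator: forward uses zq, backward uses z.
        return z + (zq - z).detach()

q = LatentQuantizer(latent_dim=8)
print(q(torch.randn(4, 8)).shape)  # torch.Size([4, 8])
```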
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Chain-of-Knowledge: Grounding Large Language Models via Dynamic
Knowledge Adapting over Heterogeneous Sources [87.26486246513063]
Chain-of-knowledge (CoK) is a framework that augments large language models.
CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation.
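A skeleton of that three-stage pipeline (the stage names come from the abstract; the function bodies and prompts are placeholder assumptions, not the paper's implementation):

```python
# Skeleton of a CoK-style pipeline; `llm` and `sources` are assumed
# callables standing in for a language model and knowledge lookups.
from typing import Callable, List

def reasoning_preparation(question: str, llm: Callable[[str], str]) -> str:
    # Draft an initial rationale, flagging facts the model is unsure of.
    return llm(f"Answer step by step, marking uncertain facts: {question}")

def dynamic_knowledge_adapting(rationale: str,
                               sources: List[Callable[[str], str]]) -> str:
    # Progressively correct the rationale against heterogeneous sources
    # (e.g. a wiki lookup, a table QA tool), one source at a time.
    for lookup in sources:
        rationale = lookup(rationale)
    return rationale

def answer_consolidation(rationale: str, llm: Callable[[str], str]) -> str:
    # Produce the final answer from the corrected rationale.
    return llm(f"Given this corrected reasoning, state the final answer: {rationale}")
```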
arXiv Detail & Related papers (2023-05-22T17:34:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.