Experimental results from applying GPT-4 to an unpublished formal
language
- URL: http://arxiv.org/abs/2305.12196v1
- Date: Sat, 20 May 2023 14:00:08 GMT
- Title: Experimental results from applying GPT-4 to an unpublished formal
language
- Authors: Gregor vom Scheidt
- Abstract summary: A state-of-the-art system, GPT-4, was provided with a concise natural language specification for a previously unpublished formal system.
The system completed all tasks successfully, showed extensive domain knowledge, invented helpful new syntax and semantics, and exhibited generalization and inference abilities.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Can large language models be used to complete mathematical tasks that are
traditionally performed either manually or with the aid of theorem provers? To
answer this question, a state-of-the-art system, GPT-4, was provided with a
concise natural language specification for a previously unpublished formal
system and asked to complete a number of tasks, from stating function and type
definitions to proving simple theorems and verifying user-supplied proofs. The
system completed all tasks successfully, showed extensive domain knowledge,
invented helpful new syntax and semantics, and exhibited generalization and
inference abilities. So the answer seems to be: yes.
Related papers
- Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.
Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z) - Autoformalizing Euclidean Geometry [74.72212706513318]
We introduce a neuro-symbolic framework for autoformalizing Euclidean geometry.
One challenge is that informal proofs rely on diagrams, leaving gaps in texts that are hard to formalize.
We provide automatic semantic evaluation for autoformalized theorem statements.
arXiv Detail & Related papers (2024-05-27T14:35:10Z) - Large Language Models' Understanding of Math: Source Criticism and
Extrapolation [0.0]
We evaluate the mathematical understanding of the GPT-4 model.
It is hard to find scientific evidence suggesting GPT-4 has acquired an understanding of even basic mathematical concepts.
arXiv Detail & Related papers (2023-11-12T07:52:32Z) - A New Approach Towards Autoformalization [7.275550401145199]
Autoformalization is the task of translating natural language mathematics into a formal language that can be verified by a program.
Research paper mathematics requires large amounts of background and context.
We propose an avenue towards tackling autoformalization for research-level mathematics, by breaking the task into easier and more approachable subtasks.
arXiv Detail & Related papers (2023-10-12T00:50:24Z) - FIMO: A Challenge Formal Dataset for Automated Theorem Proving [31.695624833932577]
FIMO is designed to facilitate advanced automated theorem proving at the IMO level.
It comprises 149 formal problem statements, accompanied by both informal problem descriptions and their corresponding-based informal proofs.
arXiv Detail & Related papers (2023-09-08T12:34:28Z) - Lila: A Unified Benchmark for Mathematical Reasoning [59.97570380432861]
LILA is a unified mathematical reasoning benchmark consisting of 23 diverse tasks along four dimensions.
We construct our benchmark by extending 20 datasets benchmark by collecting task instructions and solutions in the form of Python programs.
We introduce BHASKARA, a general-purpose mathematical reasoning model trained on LILA.
arXiv Detail & Related papers (2022-10-31T17:41:26Z) - NaturalProver: Grounded Mathematical Proof Generation with Language
Models [84.2064569475095]
Theorem proving in natural mathematical language plays a central role in mathematical advances and education.
We develop NaturalProver, a language model that generates proofs by conditioning on background references.
NaturalProver is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time.
arXiv Detail & Related papers (2022-05-25T17:01:18Z) - Prove-It: A Proof Assistant for Organizing and Verifying General
Mathematical Knowledge [0.0]
We introduce Prove-It, a Python-based general-purpose interactive theorem-proving assistant.
Prove-It uses a highly-flexible Jupyter notebook-based user interface that documents interactions and proof steps.
Current development and future work includes promising applications to quantum circuit manipulation and quantum algorithm verification.
arXiv Detail & Related papers (2020-12-20T18:15:12Z) - Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models.
We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z) - Learning to Prove Theorems by Learning to Generate Theorems [71.46963489866596]
We learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover.
Experiments on real-world tasks demonstrate that synthetic data from our approach improves the theorem prover.
arXiv Detail & Related papers (2020-02-17T16:06:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.