math-PVS: A Large Language Model Framework to Map Scientific
Publications to PVS Theories
- URL: http://arxiv.org/abs/2310.17064v1
- Date: Wed, 25 Oct 2023 23:54:04 GMT
- Title: math-PVS: A Large Language Model Framework to Map Scientific
Publications to PVS Theories
- Authors: Hassen Saidi, Susmit Jha, Tuhin Sahai
- Abstract summary: This work investigates the applicability of large language models (LLMs) in formalizing advanced mathematical concepts.
We envision an automated process, called math-PVS, to extract and formalize mathematical theorems from research papers.
- Score: 10.416375584563728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As artificial intelligence (AI) gains greater adoption in a wide variety of
applications, it has immense potential to contribute to mathematical discovery,
by guiding conjecture generation, constructing counterexamples, assisting in
formalizing mathematics, and discovering connections between different
mathematical areas, to name a few.
While prior work has leveraged computers for exhaustive mathematical proof
search, recent efforts based on large language models (LLMs) aspire to position
computing platforms as co-contributors in the mathematical research process.
Despite their current limitations in logic and mathematical tasks, there is
growing interest in melding theorem proving systems with foundation models.
This work investigates the applicability of LLMs in formalizing advanced
mathematical concepts and proposes a framework that can critically review and
check mathematical reasoning in research papers. Given the noted reasoning
shortcomings of LLMs, our approach synergizes the capabilities of proof
assistants, specifically PVS, with LLMs, enabling a bridge between textual
descriptions in academic papers and formal specifications in PVS. By harnessing
the PVS environment, coupled with data ingestion and conversion mechanisms, we
envision an automated process, called math-PVS, to extract and formalize
mathematical theorems from research papers, offering an innovative tool for
academic review and discovery.
Related papers
- LeanAgent: Lifelong Learning for Formal Theorem Proving [85.39415834798385]
We present LeanAgent, a novel lifelong learning framework for formal theorem proving.
LeanAgent continuously generalizes to and improves on ever-expanding mathematical knowledge.
It successfully proves 155 theorems previously unproved formally by humans across 23 diverse Lean repositories.
arXiv Detail & Related papers (2024-10-08T17:11:24Z)
- Mathematical Formalized Problem Solving and Theorem Proving in Different Fields in Lean 4 [0.0]
This paper explores the use of Large Language Models (LLMs) to generate formal proof steps and complete formalized proofs.
The goal is to determine how AI can be leveraged to assist the mathematical formalization process and improve its performance.
arXiv Detail & Related papers (2024-09-09T18:21:28Z)
- MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark [82.64129627675123]
MathBench is a new benchmark that rigorously assesses the mathematical capabilities of large language models.
MathBench spans a wide range of mathematical disciplines, offering a detailed evaluation of both theoretical understanding and practical problem-solving skills.
arXiv Detail & Related papers (2024-05-20T17:52:29Z)
- Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering [53.56653281752486]
This study explores Large Language Models' mathematical reasoning on four financial question-answering datasets.
We focus on sensitivity to table complexity and performance variations with an increasing number of arithmetic reasoning steps.
We introduce a novel prompting technique tailored to semi-structured documents, matching or outperforming other baselines in performance.
arXiv Detail & Related papers (2024-02-17T05:10:18Z)
- A New Approach Towards Autoformalization [7.275550401145199]
Autoformalization is the task of translating natural language mathematics into a formal language that can be verified by a program.
Research paper mathematics requires large amounts of background and context.
We propose an avenue towards tackling autoformalization for research-level mathematics, by breaking the task into easier and more approachable subtasks.
arXiv Detail & Related papers (2023-10-12T00:50:24Z)
- ChatGPT for Computational Topology [10.770019251470583]
ChatGPT represents a significant milestone in the field of artificial intelligence.
This work endeavors to bridge the gap between theoretical topological concepts and their practical implementation in computational topology.
arXiv Detail & Related papers (2023-10-11T15:10:07Z)
- Evaluating Language Models for Mathematics through Interactions [116.67206980096513]
We introduce CheckMate, a prototype platform for humans to interact with and evaluate large language models (LLMs).
We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics.
We derive a taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness.
arXiv Detail & Related papers (2023-06-02T17:12:25Z)
- A Survey of Deep Learning for Mathematical Reasoning [71.88150173381153]
We review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade.
Recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning.
arXiv Detail & Related papers (2022-12-20T18:46:16Z)
- Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models.
We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.