Related papers: Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions

Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions

URL: http://arxiv.org/abs/2502.12065v1
Date: Mon, 17 Feb 2025 17:34:48 GMT
Title: Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions
Authors: Lan Zhang, Marco Valentino, Andre Freitas,
Abstract summary: We introduce two novel resources for autoformalization, collecting definitions from Wikipedia (Def_Wiki) and arXiv papers (Def_ArXiv)<n>We evaluate a range of LLMs, analyzing their ability to formalize definitions into Isabelle/HOL.<n>Our findings reveal that definitions present a greater challenge compared to existing benchmarks, such as miniF2F.
Score: 8.135142928659546
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Thanks to their linguistic capabilities, LLMs offer an opportunity to bridge the gap between informal mathematics and formal languages through autoformalization. However, it is still unclear how well LLMs generalize to sophisticated and naturally occurring mathematical statements. To address this gap, we investigate the task of autoformalizing real-world mathematical definitions -- a critical component of mathematical discourse. Specifically, we introduce two novel resources for autoformalisation, collecting definitions from Wikipedia (Def_Wiki) and arXiv papers (Def_ArXiv). We then systematically evaluate a range of LLMs, analyzing their ability to formalize definitions into Isabelle/HOL. Furthermore, we investigate strategies to enhance LLMs' performance including refinement through external feedback from Proof Assistants, and formal definition grounding, where we guide LLMs through relevant contextual elements from formal mathematical libraries. Our findings reveal that definitions present a greater challenge compared to existing benchmarks, such as miniF2F. In particular, we found that LLMs still struggle with self-correction, and aligning with relevant mathematical libraries. At the same time, structured refinement methods and definition grounding strategies yield notable improvements of up to 16% on self-correction capabilities and 43% on the reduction of undefined errors, highlighting promising directions for enhancing LLM-based autoformalization in real-world scenarios.

Related papers

Sound and Complete Neurosymbolic Reasoning with LLM-Grounded Interpretations [7.81820080453498]
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation.<n>We present a method for directly integrating an LLM into the interpretation function of the formal semantics for a paraconsistent logic.
arXiv Detail & Related papers (2025-07-13T19:05:43Z)
Evaluating Intermediate Reasoning of Code-Assisted Large Language Models for Mathematics [15.695635219034328]
We conduct an in-depth analysis of code-assisted LLMs generated programs in response to math reasoning tasks.<n>Our findings show that the capabilities of models significantly impact the logic implemented to solve the problem.
arXiv Detail & Related papers (2025-04-24T15:34:24Z)
The quasi-semantic competence of LLMs: a case study on the part-whole relation [53.37191762146552]
We investigate knowledge of the emphpart-whole relation, a.k.a. emphmeronymy.<n>We show that emphquasi-semantic'' models have just a emphquasi-semantic'' competence and still fall short of capturing deep inferential properties.
arXiv Detail & Related papers (2025-04-03T08:41:26Z)
Computation Mechanism Behind LLM Position Generalization [59.013857707250814]
Large language models (LLMs) exhibit flexibility in handling textual positions. They can understand texts with position perturbations and generalize to longer texts. This work connects the linguistic phenomenon with LLMs' computational mechanisms.
arXiv Detail & Related papers (2025-03-17T15:47:37Z)
Can LLMs Formally Reason as Abstract Interpreters for Program Analysis? [2.520295252080748]
LLMs have impressive capabilities in code generation and comprehension, but their potential in program analysis remains under-explored. We systematically investigate whether LLMs can reason about programs using a program analysis framework called abstract interpretation. We validate our approach using state-of-the-art LLMs on 22 challenging benchmark programs from the Software Verification Competition (SV-COMP) 2019 dataset. Our results show that our strategies are able to elicit abstract interpretation-based reasoning in the tested models, but LLMs are susceptible to logical errors.
arXiv Detail & Related papers (2025-03-16T23:05:52Z)
Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning [27.562284768743694]
Large language models (LLMs) can prove mathematical theorems formally by generating proof steps within a proof system. We introduce a neuro-symbolic tactic generator that synergizes the mathematical intuition learned by LLMs with domain-specific insights encoded by symbolic methods. We evaluate our framework on 161 challenging inequalities from multiple mathematics competitions.
arXiv Detail & Related papers (2025-02-19T15:54:21Z)
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs [57.48325300739872]
Leveraging mathematical Large Language Models for proof generation is a fundamental topic in LLMs research.<n>We argue that the ability of current LLMs to prove statements largely depends on whether they have encountered the relevant proof process during training.<n>Inspired by the pedagogical method of "proof by counterexamples" commonly used in human mathematics education, our work aims to enhance LLMs' ability to conduct mathematical reasoning and proof through counterexamples.
arXiv Detail & Related papers (2025-02-12T02:01:10Z)
Enriching Ontologies with Disjointness Axioms using Large Language Models [5.355177558868206]
Large Models (LLMs) offer consistency by identifying and asserting class disjointness axioms.<n>Our approach aims at leveraging the implicit knowledge embedded in LLMs to elicit knowledge for classifying ontological disjointness.<n>Our findings suggest that LLMs, when guided by effective prompt strategies, can reliably identify disjointness relationships.
arXiv Detail & Related papers (2024-10-04T09:00:06Z)
S$^3$c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners [23.713779973116733]
Self-correction is a method that can stimulate the potential reasoning abilities of large language models (LLMs) We propose S$3$c-Math, which are able to perform Spontaneous Step-level Self-correction for Mathematical reasoning.
arXiv Detail & Related papers (2024-09-03T01:40:21Z)
Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL [78.80673954827773]
Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias. We propose using Semantic Role Labeling (SRL) as a fundamental task to explore LLMs' ability to extract structured semantics. We find interesting potential: LLMs can indeed capture semantic structures, and scaling-up doesn't always mirror potential. We are surprised to discover that significant overlap in the errors is made by both LLMs and untrained humans, accounting for almost 30% of all errors.
arXiv Detail & Related papers (2024-05-10T11:44:05Z)
FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks. We present FAC$2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs) We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering [53.56653281752486]
This study explores Large Language Models' mathematical reasoning on four financial question-answering datasets. We focus on sensitivity to table complexity and performance variations with an increasing number of arithmetic reasoning steps. We introduce a novel prompting technique tailored to semi-structured documents, matching or outperforming other baselines in performance.
arXiv Detail & Related papers (2024-02-17T05:10:18Z)
How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering [52.86931192259096]
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases. Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
arXiv Detail & Related papers (2024-01-11T09:27:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.