Herald: A Natural Language Annotated Lean 4 Dataset
- URL: http://arxiv.org/abs/2410.10878v1
- Date: Wed, 09 Oct 2024 10:11:24 GMT
- Title: Herald: A Natural Language Annotated Lean 4 Dataset
- Authors: Guoxiong Gao, Yutong Wang, Jiedong Jiang, Qi Gao, Zihan Qin, Tianyi Xu, Bin Dong
- Abstract summary: This paper introduces a novel framework for translating the Mathlib4 corpus (a unified library of mathematics in the formal language Lean 4) into natural language.
We present the results of this pipeline on Mathlib4 as Herald (Hierarchy and Retrieval-based Translated Lean Dataset).
We also propose the Herald Translator, which is fine-tuned on Herald.
- Score: 15.42247133378869
- Abstract: Verifiable formal languages like Lean have profoundly impacted mathematical reasoning, particularly through the use of large language models (LLMs) for automated reasoning. A significant challenge in training LLMs for these formal languages is the lack of parallel datasets that align natural language with formal language proofs. To address this challenge, this paper introduces a novel framework for translating the Mathlib4 corpus (a unified library of mathematics in the formal language Lean 4) into natural language. Building upon this, we employ a dual augmentation strategy that combines tactic-based and informal-based approaches, leveraging the Lean-jixia system, a Lean 4 analyzer. We present the results of this pipeline on Mathlib4 as Herald (Hierarchy and Retrieval-based Translated Lean Dataset). We also propose the Herald Translator, which is fine-tuned on Herald. The Herald Translator achieves 93.2% accuracy (Pass@128) on formalizing statements in the miniF2F-test and 22.5% accuracy on our internal graduate-level textbook dataset, outperforming InternLM2-Math-Plus-7B (74.0% and 7.5%) and TheoremLlama (50.1% and 4.0%). Furthermore, we propose a section-level translation framework for real-world applications. As a direct application of the Herald Translator, we have successfully translated a template section in the Stacks project, marking notable progress in the automatic formalization of graduate-level mathematical literature. Our model, along with the datasets, will be open-sourced to the public soon.
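To make the notion of a parallel natural-language/formal pair concrete, here is a minimal illustrative example (not drawn from Herald itself); the docstring plays the role of the natural-language annotation attached to the Lean 4 statement:

```lean
/-- Natural-language annotation: for all natural numbers `a` and `b`,
    the sum `a + b` equals `b + a`. -/
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```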
Related papers
- LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover [56.34998574538897]
We propose LEAN-GitHub, a dataset consisting of large-scale formal data extracted from Lean 4 repositories on GitHub.
Our model achieved accuracies of 48.8% with a single pass and 54.5% with 64 passes on the Lean 4 miniF2F test, surpassing the state-of-the-art method at 52% (a sketch of the pass@k estimator follows this entry).
arXiv Detail & Related papers (2024-07-24T12:28:03Z)
- TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts [26.98890165420689]
TheoremLlama is an end-to-end framework that trains a general-purpose Lean4 expert.
Our framework achieves cumulative accuracies of 36.48% and 33.61% on MiniF2F-Valid and Test datasets respectively.
arXiv Detail & Related papers (2024-07-03T15:36:18Z) - Lean Workbook: A large-scale Lean problem set formalized from natural language math problems [50.22847430754973]
Large language models are not good at math theorem proving using formal languages like Lean.
A significant challenge in this area is the scarcity of training data available in these formal languages.
We propose a novel pipeline that iteratively generates and filters synthetic data to translate natural language mathematical problems into Lean 4 statements (a minimal sketch of such a generate-and-filter loop follows this entry).
arXiv Detail & Related papers (2024-06-06T08:25:43Z)
- Process-Driven Autoformalization in Lean 4 [30.056591518828554]
We develop a benchmark to evaluate the autoformalization capabilities of large language models.
We also introduce a model that leverages the precise feedback from Lean 4 compilers to enhance autoformalization.
Our experiments demonstrate that the PSV method improves autoformalization, enabling higher accuracy using less filtered training data.
arXiv Detail & Related papers (2024-06-04T03:48:08Z) - DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data [65.5290035371111]
We introduce an approach to generate extensive Lean 4 proof data derived from high-school and undergraduate-level mathematical competition problems.
We fine-tune the DeepSeekMath 7B model on this synthetic dataset, which comprises 8 million formal statements with proofs.
Our model successfully proved 5 out of 148 problems in the Lean 4 Formalized International Mathematical Olympiad (FIMO) benchmark, while GPT-4 failed to prove any.
arXiv Detail & Related papers (2024-05-23T09:03:42Z) - Zero-Shot Cross-Lingual Reranking with Large Language Models for
Low-Resource Languages [51.301942056881146]
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba).
We examine cross-lingual reranking with queries in English and passages in the African languages.
arXiv Detail & Related papers (2023-12-26T18:38:54Z) - Cross-Lingual NER for Financial Transaction Data in Low-Resource
Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
arXiv Detail & Related papers (2023-07-16T00:45:42Z) - Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT).
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z) - A Commonsense-Infused Language-Agnostic Learning Framework for Enhancing
Prediction of Political Polarity in Multilingual News Headlines [0.0]
We use translation and retrieval to acquire the inferential knowledge in the target language.
We then employ an attention mechanism to emphasise important inferences (a minimal sketch of such an attention step follows this entry).
We present a dataset of over 62.6K multilingual news headlines in five European languages annotated with their respective political polarities.
arXiv Detail & Related papers (2022-12-01T06:07:01Z)