LeanAgent: Lifelong Learning for Formal Theorem Proving
- URL: http://arxiv.org/abs/2410.06209v7
- Date: Mon, 25 Nov 2024 02:39:03 GMT
- Title: LeanAgent: Lifelong Learning for Formal Theorem Proving
- Authors: Adarsh Kumarappan, Mo Tiwari, Peiyang Song, Robert Joseph George, Chaowei Xiao, Anima Anandkumar
- Abstract summary: We present LeanAgent, a novel lifelong learning framework for formal theorem proving.
LeanAgent continuously generalizes to and improves on ever-expanding mathematical knowledge.
It successfully proves 155 theorems previously unproved formally by humans across 23 diverse Lean repositories.
- Abstract: Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches involve training or fine-tuning an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics. These methods struggle with generalizability to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for formal theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully proves 155 theorems previously unproved formally by humans across 23 diverse Lean repositories, many from advanced mathematics. It performs significantly better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's superior performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. This emphasizes LeanAgent's continuous generalizability and improvement, explaining its superior theorem-proving performance.
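To make the abstract's machinery concrete, here is a minimal sketch of the lifelong-learning loop it describes: repositories are ordered by a difficulty score (curriculum learning), the prover trains on each in turn, and backward transfer is read off the resulting performance matrix. This is an illustration under stated assumptions, not LeanAgent's implementation: the `difficulty` field and the evaluation numbers are hypothetical, and the backward-transfer formula is the standard one from the lifelong-learning literature (the paper may use a variant).

```python
import numpy as np

def curriculum_order(repos):
    """Curriculum learning: visit repositories easiest-first.

    `difficulty` is a hypothetical per-repository score; the paper
    orders its curriculum by mathematical difficulty.
    """
    return sorted(repos, key=lambda r: r["difficulty"])

def backward_transfer(R):
    """Standard BWT metric: R[t][i] is performance on task i after
    training through task t. Positive BWT means learning later tasks
    improved earlier ones (no forgetting)."""
    T = len(R)
    return float(np.mean([R[T - 1][i] - R[i][i] for i in range(T - 1)]))

# Hypothetical run over three repositories, easiest to hardest.
repos = curriculum_order([
    {"name": "advanced_topology", "difficulty": 3.0},
    {"name": "undergrad_algebra", "difficulty": 1.0},
    {"name": "grad_algebra", "difficulty": 2.0},
])
print([r["name"] for r in repos])

# Toy performance matrix after training sequentially on the three repos.
R = np.array([
    [0.50, 0.00, 0.00],
    [0.55, 0.40, 0.00],   # repo 0 improved after learning repo 1
    [0.60, 0.45, 0.35],
])
print(backward_transfer(R))  # 0.075 > 0: positive backward transfer
```

On this reading, the "exceptional scores in stability and backward transfer" claimed above correspond to earlier columns of R staying high as training proceeds and BWT coming out positive.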
Related papers
- Mathematical Formalized Problem Solving and Theorem Proving in Different Fields in Lean 4 [0.0]
This paper explores the use of Large Language Models (LLMs) to generate formal proof steps and complete formalized proofs.
The goal is to determine how AI can be leveraged to assist the mathematical formalization process and improve its performance.
arXiv Detail & Related papers (2024-09-09T18:21:28Z)
- MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems [0.936726079405677]
We propose an agent framework for learning to solve mathematical problems based on inductive reasoning.
By emulating the human process of generalizing learned information, the framework performs well on mathematical reasoning.
Our model can be used as a personalised learning aid, thus reducing the inequality of educational resources.
arXiv Detail & Related papers (2024-08-03T13:28:19Z)
- Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs [58.09253149867228]
This paper assesses the domain knowledge of LLMs through their understanding of the different mathematical skills required to solve problems.
Motivated by the use of LLMs as a general scientific assistant, we propose NTKEval to assess changes in an LLM's probability distribution.
Our systematic analysis finds evidence of domain understanding during in-context learning.
Certain instruction tuning, however, leads to similar performance changes irrespective of the training data, suggesting a lack of domain understanding across different skills.
arXiv Detail & Related papers (2024-05-24T12:04:54Z)
- MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark [82.64129627675123]
MathBench is a new benchmark that rigorously assesses the mathematical capabilities of large language models.
MathBench spans a wide range of mathematical disciplines, offering a detailed evaluation of both theoretical understanding and practical problem-solving skills.
arXiv Detail & Related papers (2024-05-20T17:52:29Z)
- AlphaMath Almost Zero: Process Supervision without Process [6.318873143509028]
We propose an innovative framework, AlphaMath, that bypasses the need for process annotations by leveraging Monte Carlo Tree Search (MCTS).
This framework focuses on unleashing the potential of a well-pretrained LLM to autonomously enhance its mathematical reasoning.
Experimental results on both in-domain and out-of-domain datasets demonstrate that, even without GPT-4 or human-annotated process supervision, AlphaMath achieves results comparable or superior to previous state-of-the-art methods (a generic MCTS sketch appears after this list).
arXiv Detail & Related papers (2024-05-06T15:20:30Z)
- InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning [98.53491178426492]
We open-source our math reasoning LLMs, InternLM-Math, which are continually pre-trained from InternLM2.
We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter use in a single seq2seq format.
Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning.
arXiv Detail & Related papers (2024-02-09T11:22:08Z)
- math-PVS: A Large Language Model Framework to Map Scientific Publications to PVS Theories [10.416375584563728]
This work investigates the applicability of large language models (LLMs) in formalizing advanced mathematical concepts.
We envision an automated process, called math-PVS, to extract and formalize mathematical theorems from research papers.
arXiv Detail & Related papers (2023-10-25T23:54:04Z)
- A Survey of Deep Learning for Mathematical Reasoning [71.88150173381153]
We review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade.
Recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning.
arXiv Detail & Related papers (2022-12-20T18:46:16Z)
- JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM).
Unlike texts in standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols, and formulas in the problem statement.
We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z)
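As promised in the AlphaMath entry above, here is a generic Monte Carlo Tree Search sketch over chains of reasoning steps. It is not AlphaMath's implementation: `propose_steps` (a stand-in for an LLM's step generator) and `score_state` (a stand-in for an answer checker or value model) are hypothetical, and the exploration constant is the textbook UCT default.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # partial chain of reasoning steps
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0            # accumulated reward

def ucb(node, c=1.4):
    # Upper-confidence bound: trade off exploitation vs. exploration.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def mcts(root_state, propose_steps, score_state, is_terminal, iters=100):
    root = Node(root_state)
    for _ in range(iters):
        # 1. Selection: descend via UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=ucb)
        # 2. Expansion: add candidate next steps (e.g. LLM samples).
        if not is_terminal(node.state):
            node.children = [Node(node.state + [s], node)
                             for s in propose_steps(node.state)]
            node = random.choice(node.children)
        # 3. Evaluation: score the (possibly partial) solution.
        reward = score_state(node.state)
        # 4. Backpropagation: update statistics up to the root.
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited first extension of the root state.
    return max(root.children, key=lambda n: n.visits).state
```

Roughly, the per-node visit and value statistics serve as step-level supervision signals gathered without human annotation, which is the "process supervision without process" idea in that entry's title.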