Formal Mathematics Statement Curriculum Learning
- URL: http://arxiv.org/abs/2202.01344v1
- Date: Thu, 3 Feb 2022 00:17:00 GMT
- Title: Formal Mathematics Statement Curriculum Learning
- Authors: Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor
Babuschkin, Ilya Sutskever
- Abstract summary: We show that, at the same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search alone.
We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems.
- Score: 64.45821687940946
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore the use of expert iteration in the context of language modeling
applied to formal mathematics. We show that, at the same compute budget, expert
iteration, by which we mean proof search interleaved with learning,
dramatically outperforms proof search alone. We also observe that when applied
to a collection of formal statements of sufficiently varied difficulty, expert
iteration is capable of finding and solving a curriculum of increasingly
difficult problems, without the need for associated ground-truth proofs.
Finally, by applying this expert iteration to a manually curated set of problem
statements, we achieve state-of-the-art on the miniF2F benchmark, automatically
solving multiple challenging problems drawn from high school olympiads.
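The training loop the abstract describes, proof search interleaved with learning, can be summarized roughly as follows. This is a minimal sketch under stated assumptions: `proof_search` and `fine_tune` are hypothetical callables standing in for the model-guided search and the fine-tuning step, and are not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Statement = str
Proof = str

def expert_iteration(
    model: object,
    statements: List[Statement],
    proof_search: Callable[[object, Statement], Optional[Proof]],
    fine_tune: Callable[[object, List[Tuple[Statement, Proof]]], object],
    rounds: int = 8,
) -> Tuple[object, Dict[Statement, Proof]]:
    """Interleave model-guided proof search with learning on the proofs found."""
    solved: Dict[Statement, Proof] = {}
    for _ in range(rounds):
        new_proofs: List[Tuple[Statement, Proof]] = []
        for stmt in statements:
            proof = proof_search(model, stmt)  # e.g. best-first search guided by the model
            if proof is not None:
                new_proofs.append((stmt, proof))
                solved[stmt] = proof
        if new_proofs:
            # Learning step: fine-tune the model on the proofs it just found.
            model = fine_tune(model, new_proofs)
        # As the model improves, harder statements in the pool become provable,
        # so a sufficiently varied statement set acts as a curriculum.
    return model, solved
```

Only statements the searcher actually solves contribute training data, which is why the curriculum can emerge without ground-truth proofs.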
Related papers
- MathConstruct: Challenging LLM Reasoning with Constructive Proofs [0.9320657506524149]
MathConstruct (MC) is a new benchmark of 126 challenging problems sourced from various math competitions.
MC is well suited to evaluating Large Language Models, as solution correctness can be easily verified.
arXiv Detail & Related papers (2025-02-14T14:44:22Z) - MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs [80.96119560172224]
MathGAP generates problem statements and chain-of-thought reasoning traces according to specifications about their arithmetic proof structure.
Using MathGAP, we find that LLMs show a significant decrease in performance as proofs get deeper and wider.
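As a toy illustration of generating a problem according to a proof-structure specification, the sketch below builds an arithmetic word problem whose solution requires a chain of `depth` additions, together with its reasoning trace. The template and the function name are assumptions for illustration, not MathGAP's actual generator.

```python
import random

def generate_chain_problem(depth: int, seed: int = 0):
    """Toy problem generator with a chain-shaped proof of the given depth."""
    rng = random.Random(seed)
    names = ["Ada", "Ben", "Cleo", "Dev", "Eli"]
    start = rng.randint(1, 10)
    sentences = [f"{names[0]} has {start} apples."]
    trace = [f"{names[0]} = {start}"]
    total = start
    for i in range(1, depth + 1):
        delta = rng.randint(1, 10)
        prev, cur = names[(i - 1) % len(names)], names[i % len(names)]
        sentences.append(f"{cur} has {delta} more apples than {prev}.")
        total += delta
        trace.append(f"{cur} = {prev} + {delta} = {total}")
    sentences.append(f"How many apples does {names[depth % len(names)]} have?")
    return " ".join(sentences), trace, total

problem, proof_trace, answer = generate_chain_problem(depth=4)
```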
arXiv Detail & Related papers (2024-10-17T12:48:14Z) - Artifical intelligence and inherent mathematical difficulty [0.0]
We first present an updated version of a traditional argument that limitative results from computability and complexity theory show that proof discovery is an inherently difficult problem.
We then illustrate how several recent applications of artificial intelligence-inspired methods do indeed raise novel questions about the nature of mathematical proof.
arXiv Detail & Related papers (2024-08-01T20:08:31Z) - One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts [110.94724216491753]
Large Language Models (LLMs) exhibit strong generalization capabilities when prompted with language instructions and in-context demos.
Various methods have been explored to automate instruction design, but they restrict the searched prompt to a single instruction.
We adopt the Mixture-of-Expert paradigm and divide the problem space into a set of sub-regions.
A two-phase process is developed to construct the specialized expert for each region.
A region-based joint search then finds an instruction for each expert that complements the demos assigned to it, yielding a synergistic effect.
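A rough sketch of the routing idea: scikit-learn clustering stands in for the sub-region construction and hard-coded strings stand in for the searched instructions; every name below is an illustrative assumption rather than the paper's method.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Demonstrations and per-region instructions are placeholders.
demos = [
    "Q: factor x^2 - 9. A: (x - 3)(x + 3)",
    "Q: factor x^2 + 5x + 6. A: (x + 2)(x + 3)",
    "Q: convert 3/4 to a decimal. A: 0.75",
    "Q: convert 7/8 to a decimal. A: 0.875",
]
region_instructions = {0: "Instruction searched for region 0",
                       1: "Instruction searched for region 1"}

vec = TfidfVectorizer().fit(demos)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vec.transform(demos))

def build_expert_prompt(query: str) -> str:
    """Route the query to its sub-region and use that expert's instruction and demos."""
    region = int(kmeans.predict(vec.transform([query]))[0])
    region_demos = [d for d, r in zip(demos, kmeans.labels_) if r == region]
    return region_instructions[region] + "\n\n" + "\n".join(region_demos) + "\n\nQ: " + query
```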
arXiv Detail & Related papers (2024-06-28T23:05:08Z) - Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever [48.5585921817745]
Large Language Models (LLMs) are used to automate the knowledge tagging task.
We show strong zero- and few-shot performance on math question knowledge tagging tasks.
By proposing a reinforcement learning-based demonstration retriever, we exploit the potential of LLMs of different sizes.
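A minimal sketch of the few-shot setup: retrieve the most similar tagged questions as demonstrations and assemble a tagging prompt for an LLM. A plain cosine-similarity retriever stands in for the reinforcement-learning-based retriever, and the example questions and tags are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical bank of already-tagged questions.
tagged_bank = [
    ("Solve x^2 - 4 = 0.", "quadratic equations"),
    ("Find the area of a circle with radius 3.", "area of a circle"),
    ("Compute 3/4 + 1/8.", "fraction addition"),
]

vectorizer = TfidfVectorizer().fit([q for q, _ in tagged_bank])
bank_matrix = vectorizer.transform([q for q, _ in tagged_bank])

def build_tagging_prompt(question: str, k: int = 2) -> str:
    """Pick the k most similar tagged questions as demonstrations."""
    sims = cosine_similarity(vectorizer.transform([question]), bank_matrix)[0]
    top = sims.argsort()[::-1][:k]
    demos = "\n\n".join(
        f"Question: {tagged_bank[i][0]}\nKnowledge tag: {tagged_bank[i][1]}" for i in top
    )
    return f"{demos}\n\nQuestion: {question}\nKnowledge tag:"
```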
arXiv Detail & Related papers (2024-06-19T23:30:01Z) - VC Search: Bridging the Gap Between Well-Defined and Ill-Defined Problems in Mathematical Reasoning [46.25056744404318]
We develop a benchmark called Problems with Missing and Contradictory conditions (PMC) containing over 5,000 validated ill-defined mathematical problems.
VCSEARCH improves the accuracy of identifying unsolvable problems by at least 12% across different large language models.
arXiv Detail & Related papers (2024-06-07T16:24:12Z) - Context Matters: Data-Efficient Augmentation of Large Language Models
for Scientific Applications [15.893290942177112]
We explore the challenges inherent to Large Language Models (LLMs) like GPT-4.
The capacity of LLMs to present erroneous answers in a coherent and semantically rigorous manner complicates the detection of factual inaccuracies.
Our work aims to enhance the understanding and mitigation of such errors, thereby contributing to the improvement of LLM accuracy and reliability.
arXiv Detail & Related papers (2023-12-12T08:43:20Z) - Fusing Models with Complementary Expertise [42.099743709292866]
We consider the Fusion of Experts (FoE) problem of fusing outputs of expert models with complementary knowledge of the data distribution.
Our method is applicable to both discriminative and generative tasks.
We extend our method to the "frugal" setting where it is desired to reduce the number of expert model evaluations at test time.
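A minimal sketch of output fusion with an early-stopping rule standing in for the frugal setting; the averaging and the confidence threshold are illustrative assumptions, not the paper's learned fuser.

```python
import numpy as np

def fuse_experts(experts, x, weights=None, frugal_threshold=None):
    """Average class-probability outputs of expert models, optionally stopping early.

    experts: list of callables mapping an input to a probability vector.
    frugal_threshold: if set, stop querying experts once the running fusion
    is already confident enough (the "frugal" setting).
    """
    outputs = []
    for expert in experts:
        outputs.append(np.asarray(expert(x), dtype=float))
        if frugal_threshold is not None and np.mean(outputs, axis=0).max() >= frugal_threshold:
            break
    w = np.ones(len(outputs)) if weights is None else np.asarray(weights[:len(outputs)], dtype=float)
    fused = np.average(outputs, axis=0, weights=w)
    return fused / fused.sum(), len(outputs)  # fused distribution, experts actually queried

# Example: with a 0.65 threshold the second expert is never queried.
fused, used = fuse_experts(
    [lambda x: [0.7, 0.2, 0.1], lambda x: [0.6, 0.3, 0.1]],
    x=None,
    frugal_threshold=0.65,
)
```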
arXiv Detail & Related papers (2023-10-02T18:31:35Z) - Towards a Holistic Understanding of Mathematical Questions with
Contrastive Pre-training [65.10741459705739]
We propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo.
We first design two-level question augmentations, content-level and structure-level, which generate question pairs that differ in surface form but share the same purpose.
Then, to fully exploit the hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware ranking strategy.
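A toy sketch of the two augmentation levels: a content-level edit that perturbs surface numbers and a structure-level edit that reorders the statements before the final query. These rules are illustrative assumptions, not QuesCo's actual operators, and the contrastive objective itself is omitted.

```python
import random
import re

def content_level_aug(question: str, rng: random.Random) -> str:
    """Content-level edit: nudge each number while keeping the question's purpose."""
    return re.sub(r"\d+", lambda m: str(int(m.group()) + rng.choice([-1, 1])), question)

def structure_level_aug(question: str, rng: random.Random) -> str:
    """Structure-level edit: reorder the statements that precede the final query."""
    clauses = [c.strip() for c in question.split(".") if c.strip()]
    body, query = clauses[:-1], clauses[-1]
    rng.shuffle(body)
    return ". ".join(body + [query])

rng = random.Random(0)
q = "Ann has 12 apples. Ben has 5 fewer apples than Ann. How many apples does Ben have?"
positives = [content_level_aug(q, rng), structure_level_aug(q, rng)]  # used as positive pairs
```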
arXiv Detail & Related papers (2023-01-18T14:23:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.