Formal Mathematics Statement Curriculum Learning
- URL: http://arxiv.org/abs/2202.01344v1
- Date: Thu, 3 Feb 2022 00:17:00 GMT
- Title: Formal Mathematics Statement Curriculum Learning
- Authors: Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor
Babuschkin, Ilya Sutskever
- Abstract summary: We show that, at the same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search alone.
We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems.
- Score: 64.45821687940946
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore the use of expert iteration in the context of language modeling
applied to formal mathematics. We show that, at the same compute budget, expert
iteration, by which we mean proof search interleaved with learning,
dramatically outperforms proof search alone. We also observe that when applied
to a collection of formal statements of sufficiently varied difficulty, expert
iteration is capable of finding and solving a curriculum of increasingly
difficult problems, without the need for associated ground-truth proofs.
Finally, by applying this expert iteration to a manually curated set of problem
statements, we achieve state-of-the-art on the miniF2F benchmark, automatically
solving multiple challenging problems drawn from high school olympiads.
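The training loop the abstract describes, proof search interleaved with learning, can be summarized roughly as follows. This is a minimal sketch under stated assumptions: `proof_search` and `fine_tune` are hypothetical callables standing in for the model-guided search and the fine-tuning step, and are not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Statement = str
Proof = str

def expert_iteration(
    model: object,
    statements: List[Statement],
    proof_search: Callable[[object, Statement], Optional[Proof]],
    fine_tune: Callable[[object, List[Tuple[Statement, Proof]]], object],
    rounds: int = 8,
) -> Tuple[object, Dict[Statement, Proof]]:
    """Interleave model-guided proof search with learning on the proofs found."""
    solved: Dict[Statement, Proof] = {}
    for _ in range(rounds):
        new_proofs: List[Tuple[Statement, Proof]] = []
        for stmt in statements:
            proof = proof_search(model, stmt)  # e.g. best-first search guided by the model
            if proof is not None:
                new_proofs.append((stmt, proof))
                solved[stmt] = proof
        if new_proofs:
            # Learning step: fine-tune the model on the proofs it just found.
            model = fine_tune(model, new_proofs)
        # As the model improves, harder statements in the pool become provable,
        # so a sufficiently varied statement set acts as a curriculum.
    return model, solved
```

Only statements the searcher actually solves contribute training data, which is why the curriculum can emerge without ground-truth proofs.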
Related papers
- MathConstruct: Challenging LLM Reasoning with Constructive Proofs [0.9320657506524149]
MathConstruct (MC) is a new benchmark of 126 challenging problems sourced from various math competitions.
MC is well suited to evaluating Large Language Models, as solution correctness can be easily verified.
arXiv Detail & Related papers (2025-02-14T14:44:22Z) - MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs [80.96119560172224]
MathGAP generates problem statements and chain-of-thought reasoning traces according to specifications about their arithmetic proof structure.
Using MathGAP, we find that LLMs show a significant decrease in performance as proofs get deeper and wider.
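As a toy illustration of generating a problem according to a proof-structure specification, the sketch below builds an arithmetic word problem whose solution requires a chain of `depth` additions, together with its reasoning trace. The template and the function name are assumptions for illustration, not MathGAP's actual generator.

```python
import random

def generate_chain_problem(depth: int, seed: int = 0):
    """Toy problem generator with a chain-shaped proof of the given depth."""
    rng = random.Random(seed)
    names = ["Ada", "Ben", "Cleo", "Dev", "Eli"]
    start = rng.randint(1, 10)
    sentences = [f"{names[0]} has {start} apples."]
    trace = [f"{names[0]} = {start}"]
    total = start
    for i in range(1, depth + 1):
        delta = rng.randint(1, 10)
        prev, cur = names[(i - 1) % len(names)], names[i % len(names)]
        sentences.append(f"{cur} has {delta} more apples than {prev}.")
        total += delta
        trace.append(f"{cur} = {prev} + {delta} = {total}")
    sentences.append(f"How many apples does {names[depth % len(names)]} have?")
    return " ".join(sentences), trace, total

problem, proof_trace, answer = generate_chain_problem(depth=4)
```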
arXiv Detail & Related papers (2024-10-17T12:48:14Z) - Artifical intelligence and inherent mathematical difficulty [0.0]
We first present an updated version of a traditional argument that limitative results from computability and complexity theory show that proof discovery is an inherently difficult problem.
We then illustrate how several recent applications of artificial intelligence-inspired methods do indeed raise novel questions about the nature of mathematical proof.
arXiv Detail & Related papers (2024-08-01T20:08:31Z) - One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts [110.94724216491753]
Large Language Models (LLMs) exhibit strong generalization capabilities when prompted with language instructions and in-context demos.
Various methods have been explored to automate instruction design, but they restrict the searched prompt to a single instruction.
We adopt the Mixture-of-Expert paradigm and divide the problem space into a set of sub-regions.
A two-phase process is developed to construct the specialized expert for each region.
A region-based joint search then finds an instruction for each expert that complements the demos assigned to it, yielding a synergistic effect.
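A rough sketch of the routing idea: scikit-learn clustering stands in for the sub-region construction and hard-coded strings stand in for the searched instructions; every name below is an illustrative assumption rather than the paper's method.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Demonstrations and per-region instructions are placeholders.
demos = [
    "Q: factor x^2 - 9. A: (x - 3)(x + 3)",
    "Q: factor x^2 + 5x + 6. A: (x + 2)(x + 3)",
    "Q: convert 3/4 to a decimal. A: 0.75",
    "Q: convert 7/8 to a decimal. A: 0.875",
]
region_instructions = {0: "Instruction searched for region 0",
                       1: "Instruction searched for region 1"}

vec = TfidfVectorizer().fit(demos)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vec.transform(demos))

def build_expert_prompt(query: str) -> str:
    """Route the query to its sub-region and use that expert's instruction and demos."""
    region = int(kmeans.predict(vec.transform([query]))[0])
    region_demos = [d for d, r in zip(demos, kmeans.labels_) if r == region]
    return region_instructions[region] + "\n\n" + "\n".join(region_demos) + "\n\nQ: " + query
```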
arXiv Detail & Related papers (2024-06-28T23:05:08Z) - Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever [48.5585921817745]
Large Language Models (LLMs) are used to automate the knowledge tagging task.
We show strong zero- and few-shot performance on math question knowledge tagging tasks.
By proposing a reinforcement learning-based demonstration retriever, we exploit the potential of LLMs of different sizes.
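A minimal sketch of the few-shot setup: retrieve the most similar tagged questions as demonstrations and assemble a tagging prompt for an LLM. A plain cosine-similarity retriever stands in for the reinforcement-learning-based retriever, and the example questions and tags are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical bank of already-tagged questions.
tagged_bank = [
    ("Solve x^2 - 4 = 0.", "quadratic equations"),
    ("Find the area of a circle with radius 3.", "area of a circle"),
    ("Compute 3/4 + 1/8.", "fraction addition"),
]

vectorizer = TfidfVectorizer().fit([q for q, _ in tagged_bank])
bank_matrix = vectorizer.transform([q for q, _ in tagged_bank])

def build_tagging_prompt(question: str, k: int = 2) -> str:
    """Pick the k most similar tagged questions as demonstrations."""
    sims = cosine_similarity(vectorizer.transform([question]), bank_matrix)[0]
    top = sims.argsort()[::-1][:k]
    demos = "\n\n".join(
        f"Question: {tagged_bank[i][0]}\nKnowledge tag: {tagged_bank[i][1]}" for i in top
    )
    return f"{demos}\n\nQuestion: {question}\nKnowledge tag:"
```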
arXiv Detail & Related papers (2024-06-19T23:30:01Z) - VC Search: Bridging the Gap Between Well-Defined and Ill-Defined Problems in Mathematical Reasoning [46.25056744404318]
We develop a benchmark called Problems with Missing and Contradictory conditions (PMC) containing over 5,000 validated ill-defined mathematical problems.
VCSEARCH improves the accuracy of identifying unsolvable problems by at least 12% across different large language models.
arXiv Detail & Related papers (2024-06-07T16:24:12Z) - Context Matters: Data-Efficient Augmentation of Large Language Models
for Scientific Applications [15.893290942177112]
We explore the challenges inherent to Large Language Models (LLMs) like GPT-4.
The capacity of LLMs to present erroneous answers in a coherent and semantically rigorous manner complicates the detection of factual inaccuracies.
Our work aims to enhance the understanding and mitigation of such errors, thereby contributing to the improvement of LLM accuracy and reliability.
arXiv Detail & Related papers (2023-12-12T08:43:20Z) - Fusing Models with Complementary Expertise [42.099743709292866]
We consider the Fusion of Experts (FoE) problem of fusing outputs of expert models with complementary knowledge of the data distribution.
Our method is applicable to both discriminative and generative tasks.
We extend our method to the "frugal" setting where it is desired to reduce the number of expert model evaluations at test time.
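A minimal sketch of output fusion with an early-stopping rule standing in for the frugal setting; the averaging and the confidence threshold are illustrative assumptions, not the paper's learned fuser.

```python
import numpy as np

def fuse_experts(experts, x, weights=None, frugal_threshold=None):
    """Average class-probability outputs of expert models, optionally stopping early.

    experts: list of callables mapping an input to a probability vector.
    frugal_threshold: if set, stop querying experts once the running fusion
    is already confident enough (the "frugal" setting).
    """
    outputs = []
    for expert in experts:
        outputs.append(np.asarray(expert(x), dtype=float))
        if frugal_threshold is not None and np.mean(outputs, axis=0).max() >= frugal_threshold:
            break
    w = np.ones(len(outputs)) if weights is None else np.asarray(weights[:len(outputs)], dtype=float)
    fused = np.average(outputs, axis=0, weights=w)
    return fused / fused.sum(), len(outputs)  # fused distribution, experts actually queried

# Example: with a 0.65 threshold the second expert is never queried.
fused, used = fuse_experts(
    [lambda x: [0.7, 0.2, 0.1], lambda x: [0.6, 0.3, 0.1]],
    x=None,
    frugal_threshold=0.65,
)
```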
arXiv Detail & Related papers (2023-10-02T18:31:35Z) - Towards a Holistic Understanding of Mathematical Questions with
Contrastive Pre-training [65.10741459705739]
We propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo.
We first design two-level question augmentations, content-level and structure-level, which generate question pairs that differ in surface form but share the same purpose.
Then, to fully exploit the hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware ranking strategy.
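A toy sketch of the two augmentation levels: a content-level edit that perturbs surface numbers and a structure-level edit that reorders the statements before the final query. These rules are illustrative assumptions, not QuesCo's actual operators, and the contrastive objective itself is omitted.

```python
import random
import re

def content_level_aug(question: str, rng: random.Random) -> str:
    """Content-level edit: nudge each number while keeping the question's purpose."""
    return re.sub(r"\d+", lambda m: str(int(m.group()) + rng.choice([-1, 1])), question)

def structure_level_aug(question: str, rng: random.Random) -> str:
    """Structure-level edit: reorder the statements that precede the final query."""
    clauses = [c.strip() for c in question.split(".") if c.strip()]
    body, query = clauses[:-1], clauses[-1]
    rng.shuffle(body)
    return ". ".join(body + [query])

rng = random.Random(0)
q = "Ann has 12 apples. Ben has 5 fewer apples than Ann. How many apples does Ben have?"
positives = [content_level_aug(q, rng), structure_level_aug(q, rng)]  # used as positive pairs
```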
arXiv Detail & Related papers (2023-01-18T14:23:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.