Related papers: IsarStep: a Benchmark for High-level Mathematical Reasoning

IsarStep: a Benchmark for High-level Mathematical Reasoning

URL: http://arxiv.org/abs/2006.09265v2
Date: Wed, 24 Mar 2021 16:45:18 GMT
Title: IsarStep: a Benchmark for High-level Mathematical Reasoning
Authors: Wenda Li and Lei Yu and Yuhuai Wu and Lawrence C. Paulson
Abstract summary: We present a benchmark for high-level mathematical reasoning and study the reasoning capabilities of neural sequence-to-sequence models. We build a non-synthetic dataset from the largest repository of proofs written by human experts in a theorem prover.
Score: 20.96474618260995
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A well-defined benchmark is essential for measuring and accelerating research progress of machine learning models. In this paper, we present a benchmark for high-level mathematical reasoning and study the reasoning capabilities of neural sequence-to-sequence models. We build a non-synthetic dataset from the largest repository of proofs written by human experts in a theorem prover. The dataset has a broad coverage of undergraduate and research-level mathematical and computer science theorems. In our defined task, a model is required to fill in a missing intermediate proposition given surrounding proofs. This task provides a starting point for the long-term goal of having machines generate human-readable proofs automatically. Our experiments and analysis reveal that while the task is challenging, neural models can capture non-trivial mathematical reasoning. We further design a hierarchical transformer that outperforms the transformer baseline.

Related papers

Algebraic Machine Learning: Learning as computing an algebraic decomposition of a task [41.94295877935867]
We propose an alternative foundation based on Abstract Algebra, with mathematics that facilitates the analysis of learning. In this approach, the goal of the task and the data are encoded as axioms of an algebra, and a model is obtained where only these axioms and their logical consequences hold. We validate this new learning principle on standard datasets such as MNIST, FashionMNIST, CIFAR-10, and medical images, achieving performance comparable to optimized multilayer perceptrons.
arXiv Detail & Related papers (2025-02-27T10:13:42Z)
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task [49.355810887265925]
We introduce MathFimer, a novel framework for mathematical reasoning step expansion. We develop a specialized model, MathFimer-7B, on our carefully curated NuminaMath-FIM dataset. We then apply these models to enhance existing mathematical reasoning datasets by inserting detailed intermediate steps into their solution chains.
arXiv Detail & Related papers (2025-02-17T11:22:24Z)
Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning [85.635988711588]
We argue that enhancing the capabilities of large language models requires a paradigm shift in the design of mathematical datasets. We advocate for mathematical dataset developers to consider the concept of "motivated proof", introduced by G. P'olya in 1949, which can serve as a blueprint for datasets that offer a better proof learning signal. We provide a questionnaire designed specifically for math datasets that we urge creators to include with their datasets.
arXiv Detail & Related papers (2024-12-19T18:55:17Z)
Benchmarking Large Language Models with Integer Sequence Generation Tasks [1.3108652488669736]
This paper presents a novel benchmark where the large language model (LLM) must write code that computes integer sequences from the Online Encyclopedia of Sequences (OEIS) Our benchmark reveals that the o1 series of models outperform other frontier models from OpenAI, Anthropic, Meta, and Google in accuracy and cheating rates across both easy and hard integer sequences.
arXiv Detail & Related papers (2024-11-07T02:05:43Z)
Discovering symbolic expressions with parallelized tree search [59.92040079807524]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data. Existing algorithms have faced a critical bottleneck of accuracy and efficiency over a decade when handling problems of complexity. We introduce a parallelized tree search (PTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z)
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models [68.65075559137608]
We propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas. We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system. We develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability.
arXiv Detail & Related papers (2023-10-16T08:42:39Z)
A Brain-Inspired Sequence Learning Model based on a Logic [6.653734987585243]
In this paper, a model of sequence learning, which is interpretable through Non-Axiomatic Logic, is designed and tested. The results show that the model works well within different levels of difficulty.
arXiv Detail & Related papers (2023-08-24T01:01:41Z)
Learning Active Subspaces and Discovering Important Features with Gaussian Radial Basis Functions Neural Networks [0.0]
We show that precious information is contained in the spectrum of the precision matrix that can be extracted once the training of the model is completed. We conducted numerical experiments for regression, classification, and feature selection tasks. Our results demonstrate that the proposed model does not only yield an attractive prediction performance compared to the competitors.
arXiv Detail & Related papers (2023-07-11T09:54:30Z)
A Survey of Deep Learning for Mathematical Reasoning [71.88150173381153]
We review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. Recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning.
arXiv Detail & Related papers (2022-12-20T18:46:16Z)
ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering. Our dataset poses great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z)
SynBench: Task-Agnostic Benchmarking of Pretrained Representations using Synthetic Data [78.21197488065177]
Recent success in fine-tuning large models, that are pretrained on broad data at scale, on downstream tasks has led to a significant paradigm shift in deep learning. This paper proposes a new task-agnostic framework, textitSynBench, to measure the quality of pretrained representations using synthetic data.
arXiv Detail & Related papers (2022-10-06T15:25:00Z)
Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models. We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z)
Learning to Prove from Synthetic Theorems [41.74768503409581]
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data. We propose an approach that relies on training with synthetic theorems, generated from a set of axioms. We show that such theorems can be used to train an automated prover and that the learned prover transfers successfully to human-generated theorems.
arXiv Detail & Related papers (2020-06-19T17:48:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.