Large Language Models' Understanding of Math: Source Criticism and
Extrapolation
- URL: http://arxiv.org/abs/2311.07618v1
- Date: Sun, 12 Nov 2023 07:52:32 GMT
- Title: Large Language Models' Understanding of Math: Source Criticism and
Extrapolation
- Authors: Roozbeh Yousefzadeh and Xuenan Cao
- Abstract summary: We evaluate the mathematical understanding of the GPT-4 model.
It is hard to find scientific evidence suggesting GPT-4 has acquired an understanding of even basic mathematical concepts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: It has been suggested that large language models such as GPT-4 have acquired
some form of understanding beyond the correlations among the words in text
including some understanding of mathematics as well. Here, we perform a
critical inquiry into this claim by evaluating the mathematical understanding
of the GPT-4 model. Considering that GPT-4's training set is a secret, it is
not straightforward to evaluate whether the model's correct answers are based
on a mathematical understanding or based on replication of proofs that the
model has seen before. We specifically craft mathematical questions which their
formal proofs are not readily available on the web, proofs that are more likely
not seen by the GPT-4. We see that GPT-4 is unable to solve those problems
despite their simplicity. It is hard to find scientific evidence suggesting
that GPT-4 has acquired an understanding of even basic mathematical concepts. A
straightforward way to find failure modes of GPT-4 in theorem proving is to
craft questions where their formal proofs are not available on the web. Our
finding suggests that GPT-4's ability is to reproduce, rephrase, and polish the
mathematical proofs that it has seen before, and not in grasping mathematical
concepts. We also see that GPT-4's ability to prove mathematical theorems is
continuously expanding over time despite the claim that it is a fixed model. We
suggest that the task of proving mathematical theorems in formal language is
comparable to the methods used in search engines such as Google while
predicting the next word in a sentence may be a misguided approach, a recipe
that often leads to excessive extrapolation and eventual failures. Prompting
the GPT-4 over and over may benefit the GPT-4 and the OpenAI, but we question
whether it is valuable for machine learning or for theorem proving.
Related papers
- Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.
Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z) - Autoformalizing Euclidean Geometry [74.72212706513318]
We introduce a neuro-symbolic framework for autoformalizing Euclidean geometry.
One challenge is that informal proofs rely on diagrams, leaving gaps in texts that are hard to formalize.
We provide automatic semantic evaluation for autoformalized theorem statements.
arXiv Detail & Related papers (2024-05-27T14:35:10Z) - A New Approach Towards Autoformalization [7.275550401145199]
Autoformalization is the task of translating natural language mathematics into a formal language that can be verified by a program.
Research paper mathematics requires large amounts of background and context.
We propose an avenue towards tackling autoformalization for research-level mathematics, by breaking the task into easier and more approachable subtasks.
arXiv Detail & Related papers (2023-10-12T00:50:24Z) - TheoremQA: A Theorem-driven Question Answering dataset [100.39878559382694]
GPT-4's capabilities to solve these problems are unparalleled, achieving an accuracy of 51% with Program-of-Thoughts Prompting.
TheoremQA is curated by domain experts containing 800 high-quality questions covering 350 theorems.
arXiv Detail & Related papers (2023-05-21T17:51:35Z) - Experimental results from applying GPT-4 to an unpublished formal
language [0.0]
A state-of-the-art system, GPT-4, was provided with a concise natural language specification for a previously unpublished formal system.
The system completed all tasks successfully, showed extensive domain knowledge, invented helpful new syntax and semantics, and exhibited generalization and inference abilities.
arXiv Detail & Related papers (2023-05-20T14:00:08Z) - Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z) - Mathematical Capabilities of ChatGPT [35.71603158908465]
We release two new datasets: GHOSTS and miniGHOSTS.
These are the first natural-language datasets curated by working researchers in mathematics.
We benchmark the models on a range of fine-grained performance metrics.
arXiv Detail & Related papers (2023-01-31T18:59:03Z) - NaturalProver: Grounded Mathematical Proof Generation with Language
Models [84.2064569475095]
Theorem proving in natural mathematical language plays a central role in mathematical advances and education.
We develop NaturalProver, a language model that generates proofs by conditioning on background references.
NaturalProver is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time.
arXiv Detail & Related papers (2022-05-25T17:01:18Z) - Noisy Deductive Reasoning: How Humans Construct Math, and How Math
Constructs Universes [0.5874142059884521]
We present a computational model of mathematical reasoning according to which mathematics is a fundamentally process.
We show that this framework gives a compelling account of several aspects of mathematical practice.
arXiv Detail & Related papers (2020-10-28T19:43:14Z) - Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models.
We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.