Limits of an AI program for solving college math problems
- URL: http://arxiv.org/abs/2208.06906v1
- Date: Sun, 14 Aug 2022 20:10:14 GMT
- Title: Limits of an AI program for solving college math problems
- Authors: Ernest Davis
- Abstract summary: A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.
The system they describe is indeed impressive; however, the above description is very much overstated.
The work of solving the problems is done, not by a neural network, but by the symbolic algebra package Sympy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Drori et al. (2022) report that "A neural network solves, explains, and
generates university math problems by program synthesis and few-shot learning
at human level ... [It] automatically answers 81% of university-level
mathematics problems." The system they describe is indeed impressive; however,
the above description is very much overstated. The work of solving the problems
is done, not by a neural network, but by the symbolic algebra package Sympy.
Problems of various formats are excluded from consideration. The so-called
"explanations" are just rewordings of lines of code. Answers are marked as
correct that are not in the form specified in the problem. Most seriously, it
seems that in many cases the system uses the correct answer given in the test
corpus to guide its path to solving the problem.
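Davis's central observation is that the symbolic heavy lifting in such systems is performed by Sympy, not by the neural network: the network's role is to emit a short program whose core is a call into Sympy's solvers. The following sketch (illustrative only, not the authors' actual code) shows how little code is needed once a problem has been reduced to a Sympy call:

```python
# Illustrative sketch, not the code of Drori et al.: once a course question
# has been translated into a programming task, the actual mathematics is
# handled by Sympy's symbolic solvers.
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')

# A typical university calculus task: find the antiderivative of x * e^x.
integral = sp.integrate(x * sp.exp(x), x)
print(integral)

# A typical differential-equations task: solve f'(x) = f(x).
ode_solution = sp.dsolve(sp.Eq(f(x).diff(x), f(x)), f(x))
print(ode_solution)
```

In this framing, the hard step is the translation from natural-language question to program, which is where few-shot prompting (and, Davis argues, sometimes the known correct answer) enters.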
Related papers
- FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI [2.1061205911958876]
FrontierMath is a benchmark of hundreds of original, exceptionally challenging mathematics problems crafted and vetted by expert mathematicians.
Current state-of-the-art AI models solve under 2% of problems, revealing a vast gap between AI capabilities and the prowess of the mathematical community.
As AI systems advance toward expert-level mathematical abilities, FrontierMath offers a rigorous testbed that quantifies their progress.
arXiv Detail & Related papers (2024-11-07T17:07:35Z)
- Explaining Math Word Problem Solvers [2.792030485253753]
We investigate what information math word problem solvers use to generate solutions.
Our results show that the model is not sensitive to the removal of many words from the input and can still find a correct answer when given a nonsense question.
This indicates that automatic solvers do not follow the semantic logic of math word problems, and may be overfitting to the presence of specific words.
arXiv Detail & Related papers (2023-07-24T21:05:47Z)
- Machine Learning Meets The Herbrand Universe [1.5984927623688914]
Herbrand's theorem allows reduction of first-order problems to propositional problems by instantiation.
We develop the first machine learning system targeting this task.
We show that the trained system achieves high accuracy in predicting the right instances.
arXiv Detail & Related papers (2022-10-07T14:46:32Z)
- JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM).
Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement.
We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z)
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models [52.59923418570378]
We propose a novel prompting strategy, least-to-most prompting, to overcome the challenge of easy-to-hard generalization.
We show that least-to-most prompting is capable of generalizing to more difficult problems than those seen in prompts.
In contrast, neural-symbolic models in the literature that specialize in solving SCAN are trained on the entire training set, containing over 15,000 examples.
arXiv Detail & Related papers (2022-05-21T15:34:53Z)
- End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking [52.05847268235338]
We show how machine learning systems can perform logical extrapolation without overthinking problems.
We propose a recall architecture that keeps an explicit copy of the problem instance in memory so that it cannot be forgotten.
We also employ a progressive training routine that prevents the model from learning behaviors specific to a particular iteration count and instead pushes it to learn behaviors that can be repeated indefinitely.
arXiv Detail & Related papers (2022-02-11T18:43:28Z)
- A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More [8.437319139670116]
We turn questions into programming tasks, automatically generate programs, and then execute them.
This is the first work to automatically solve, grade, and generate university-level Mathematics course questions at scale.
arXiv Detail & Related papers (2021-12-31T18:57:31Z)
- Solving Linear Algebra by Program Synthesis [1.0660480034605238]
We solve MIT's Linear Algebra course 18.06 and Columbia University's Computational Linear Algebra course COMS3251 with perfect accuracy by interactive program synthesis.
This surprisingly strong result is achieved by turning the course questions into programming tasks and then running the programs to produce the correct answers.
arXiv Detail & Related papers (2021-11-16T01:16:43Z)
- GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning [172.36214872466707]
We focus on solving geometric problems, which requires a comprehensive understanding of textual descriptions, visual diagrams, and theorem knowledge.
We propose a Geometric Question Answering dataset GeoQA, containing 5,010 geometric problems with corresponding annotated programs.
arXiv Detail & Related papers (2021-05-30T12:34:17Z)
- SMART: A Situation Model for Algebra Story Problems via Attributed Grammar [74.1315776256292]
We introduce the concept of a "situation model", which originates from psychology studies to represent the mental states of humans in problem-solving.
We show that the proposed model outperforms all previous neural solvers by a large margin while preserving much better interpretability.
arXiv Detail & Related papers (2020-12-27T21:03:40Z)
- Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning [95.18337034090648]
We propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model, an And-Or Graph (AOG).
These visual arithmetic problems are in the form of geometric figures.
We benchmark the MNS dataset using four predominant neural network models as baselines in this visual reasoning task.
arXiv Detail & Related papers (2020-04-25T17:14:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.