Related papers: Solving Math Word Problems by Combining Language Models With Symbolic Solvers

Solving Math Word Problems by Combining Language Models With Symbolic Solvers

URL: http://arxiv.org/abs/2304.09102v1
Date: Sun, 16 Apr 2023 04:16:06 GMT
Title: Solving Math Word Problems by Combining Language Models With Symbolic Solvers
Authors: Joy He-Yueya, Gabriel Poesia, Rose E. Wang, Noah D. Goodman
Abstract summary: Large language models (LLMs) can be combined with external tools to perform complex reasoning and calculation. We propose an approach that combines an LLM that can incrementally formalize word problems as a set of variables and equations with an external symbolic solver. Our approach achieves comparable accuracy to the original PAL on the GSM8K benchmark of math word problems and outperforms PAL by an absolute 20% on ALGEBRA.
Score: 28.010617102877923
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automatically generating high-quality step-by-step solutions to math word problems has many applications in education. Recently, combining large language models (LLMs) with external tools to perform complex reasoning and calculation has emerged as a promising direction for solving math word problems, but prior approaches such as Program-Aided Language model (PAL) are biased towards simple procedural problems and less effective for problems that require declarative reasoning. We propose an approach that combines an LLM that can incrementally formalize word problems as a set of variables and equations with an external symbolic solver that can solve the equations. Our approach achieves comparable accuracy to the original PAL on the GSM8K benchmark of math word problems and outperforms PAL by an absolute 20% on ALGEBRA, a new dataset of more challenging word problems extracted from Algebra textbooks. Our work highlights the benefits of using declarative and incremental representations when interfacing with an external tool for solving complex math word problems. Our data and prompts are publicly available at https://github.com/joyheyueya/declarative-math-word-problem.

Related papers

Frontier LLMs Still Struggle with Simple Reasoning Tasks [53.497499123166804]
This work studies the performance of frontier language models on a broad set of "easy" reasoning problems.<n>We create a suite of procedurally generated simple reasoning tasks, including counting, first-order logic, proof trees, and travel planning.<n>We show that even state-of-the-art thinking models consistently fail on such problems and for similar reasons.
arXiv Detail & Related papers (2025-07-09T22:22:49Z)
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations [90.07275414500154]
We observe significant performance drops on MATH-P-Hard across various models. We also raise concerns about a novel form of memorization where models blindly apply learned problem-solving skills.
arXiv Detail & Related papers (2025-02-10T13:31:46Z)
MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula [33.5782208232163]
We propose Math CAMPS: a method to synthesize high-quality mathematical problems at scale. We encode each standard in a formal grammar, allowing us to sample diverse symbolic problems and their answers. We derive follow-up questions from symbolic structures and convert them into follow-up word problems.
arXiv Detail & Related papers (2024-07-01T01:56:28Z)
Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? [140.9751389452011]
We study the biases of large language models (LLMs) in relation to those known in children when solving arithmetic word problems. We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features.
arXiv Detail & Related papers (2024-01-31T18:48:20Z)
VerityMath: Advancing Mathematical Reasoning by Self-Verification Through Unit Consistency [33.760209585322606]
We study the performance of strong open-source LLMs on math word problems using program-based solving techniques. We propose a systematic approach by defining the units for each quantity and ensuring the consistency of these units during mathematical operations. Our approach, which incorporates unit consistency, currently slightly underperforms compared to an approach that does not.
arXiv Detail & Related papers (2023-11-13T09:06:58Z)
MathPrompter: Mathematical Reasoning using Large Language Models [7.953723258038284]
Large Language Models (LLMs) have limited performance when solving arithmetic reasoning tasks. MathPrompter uses the Zero-shot chain-of-thought prompting technique to generate multiple Algebraic expressions or Python functions to solve the same math problem in different ways.
arXiv Detail & Related papers (2023-03-04T04:43:49Z)
Highlighting Named Entities in Input for Auto-Formulation of Optimization Problems [0.0]
This paper presents an approach that converts linear programming word problems into mathematical formulations. We leverage the named entities in the input and augment the input to highlight these entities. Our approach achieves the highest accuracy among all submissions to the NL4Opt Competition, securing first place in the generation track.
arXiv Detail & Related papers (2022-12-26T16:13:57Z)
UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression [127.68780714438103]
Two main geometry problems: calculation and proving, are usually treated as two specific tasks. We construct a large-scale Unified Geometry problem benchmark, UniGeo, which contains 4,998 calculation problems and 9,543 proving problems. We also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously.
arXiv Detail & Related papers (2022-12-06T04:37:51Z)
PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PaL) to understand natural language problems. PaL offloads the solution step to a programmatic runtime such as a Python interpreter. We set new state-of-the-art results in all 12 benchmarks.
arXiv Detail & Related papers (2022-11-18T18:56:13Z)
JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model(PLM) Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement. We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z)
SMART: A Situation Model for Algebra Story Problems via Attributed Grammar [74.1315776256292]
We introduce the concept of a emphsituation model, which originates from psychology studies to represent the mental states of humans in problem-solving. We show that the proposed model outperforms all previous neural solvers by a large margin while preserving much better interpretability.
arXiv Detail & Related papers (2020-12-27T21:03:40Z)
Reverse Operation based Data Augmentation for Solving Math Word Problems [37.26159426631031]
Recent models have reached their performance bottleneck and require more high-quality data for training. We propose a novel data augmentation method that reverses the mathematical logic of math word problems. We apply the augmented data on two SOTA math word problem solving models and compare our results with a strong data augmentation baseline.
arXiv Detail & Related papers (2020-10-04T11:59:59Z)
Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning [95.18337034090648]
We propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model--And-Or Graph (AOG) These visual arithmetic problems are in the form of geometric figures. We benchmark the MNS dataset using four predominant neural network models as baselines in this visual reasoning task.
arXiv Detail & Related papers (2020-04-25T17:14:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.