MathPrompter: Mathematical Reasoning using Large Language Models
- URL: http://arxiv.org/abs/2303.05398v1
- Date: Sat, 4 Mar 2023 04:43:49 GMT
- Title: MathPrompter: Mathematical Reasoning using Large Language Models
- Authors: Shima Imani, Liang Du, Harsh Shrivastava
- Abstract summary: Large Language Models (LLMs) have limited performance when solving arithmetic reasoning tasks.
MathPrompter uses a zero-shot chain-of-thought prompting technique to generate multiple algebraic expressions or Python functions that solve the same math problem in different ways.
- Score: 7.953723258038284
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) have limited performance when solving arithmetic
reasoning tasks and often provide incorrect answers. Unlike natural language
understanding, math problems typically have a single correct answer, making the
task of generating accurate solutions more challenging for LLMs. To the best of
our knowledge, no LLMs indicate their level of confidence in their responses,
which fuels a trust deficit in these models and impedes their adoption. To
address this deficiency, we propose `MathPrompter', a technique that improves
the performance of LLMs on arithmetic problems while also increasing confidence
in the predictions. MathPrompter uses a zero-shot chain-of-thought prompting
technique to generate multiple algebraic expressions or Python functions that
solve the same math problem in different ways, thereby raising the confidence
level in the output results. This is in contrast to other prompt-based CoT
methods, where there is no check on the validity of the intermediate steps.
Our technique improves over the state of the art on the MultiArith dataset
($78.7\% \rightarrow 92.5\%$), evaluated using a 175B-parameter GPT-based LLM.
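The core of the approach is a consistency check: the model is prompted for both an algebraic expression and a Python function for the same (templatized) problem, and the two candidates are evaluated on random variable assignments to see whether they agree before the question's real numbers are substituted. Below is a minimal Python sketch of that check; the example problem, variable names, and candidate outputs are illustrative placeholders, not the paper's actual prompts or model outputs.

```python
# Minimal sketch of a MathPrompter-style consistency check, assuming the two
# zero-shot CoT outputs (an algebraic expression and a Python function) are
# already in hand. Problem, variable names, and candidates are hypothetical.
import random

# Hypothetical templatized question: "A school hires a buses seating c students
# each and b vans seating d students each. How many students can travel?"
algebraic_expr = "a * c + b * d"        # candidate from the algebraic prompt

def solve(a, b, c, d):                   # candidate from the Python prompt
    return a * c + b * d

def candidates_agree(trials: int = 5) -> bool:
    """Evaluate both candidates on random variable assignments; agreement
    across all trials raises confidence that the solution is correct."""
    for _ in range(trials):
        values = {name: random.randint(1, 100) for name in "abcd"}
        if eval(algebraic_expr, {}, values) != solve(**values):
            return False
    return True

if candidates_agree():
    # Substitute the question's actual numbers to report the final answer.
    print(solve(a=3, b=2, c=40, d=8))    # 136
```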
Related papers
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search [22.672130194493793]
Large Language Models (LLMs) have exhibited exceptional performance across a broad range of tasks and domains.
They still encounter difficulties in solving mathematical problems due to the rigorous and logical nature of mathematics.
We propose a novel approach, BEATS, to enhance mathematical problem-solving abilities.
arXiv Detail & Related papers (2024-09-26T15:47:42Z)
- AI-Assisted Generation of Difficult Math Questions [78.7547836422727]
Current LLM training positions mathematical reasoning as a core capability.
There is unmet demand for diverse and challenging math questions.
We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach.
arXiv Detail & Related papers (2024-07-30T17:55:36Z)
- MathDivide: Improved mathematical reasoning by large language models [0.0]
We propose a prompting technique called MathDivide that breaks down the mathematical problem into simpler subproblems.
The results demonstrate that MathDivide significantly outperforms the leading prompting technique, MathPrompter.
arXiv Detail & Related papers (2024-05-12T20:21:15Z)
- Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems [50.76385564061713]
Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks.
CoT usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors, and step-missing errors.
We propose Deeply Understanding the Problems (DUP) to improve the LLMs' math problem-solving ability by addressing semantic misunderstanding errors.
arXiv Detail & Related papers (2024-04-23T12:16:05Z)
- GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks.
One frequent observation is that when the math questions are slightly changed, LLMs can behave incorrectly.
This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z)
- SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs).
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SATLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z)
- PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PaL), which read natural language problems and generate programs as intermediate reasoning steps.
PaL offloads the solution step to a programmatic runtime such as a Python interpreter (a brief sketch of this pattern follows the list).
We set new state-of-the-art results in all 12 benchmarks.
arXiv Detail & Related papers (2022-11-18T18:56:13Z)
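Since the PAL entry above describes offloading the solution step to a Python interpreter, here is a minimal illustrative sketch of that program-aided pattern; the "generated" program is a hand-written placeholder, not an actual model output.

```python
# Illustrative sketch of the program-aided (PAL-style) pattern: the LLM writes
# its reasoning as Python, and the interpreter (not the model) computes the
# final answer. The program below is a hand-written placeholder.
generated_program = """
# Olivia has $23. She buys five bagels at $3 each. How much money is left?
money_initial = 23
bagels = 5
bagel_cost = 3
money_spent = bagels * bagel_cost
answer = money_initial - money_spent
"""

def run_program(program: str):
    """Execute the model-generated reasoning program and return its `answer`."""
    namespace = {}
    exec(program, namespace)  # offload the solution step to the Python runtime
    return namespace["answer"]

print(run_program(generated_program))  # 8
```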
This list is automatically generated from the titles and abstracts of the papers on this site.