Get an A in Math: Progressive Rectification Prompting
- URL: http://arxiv.org/abs/2312.06867v1
- Date: Mon, 11 Dec 2023 22:25:57 GMT
- Title: Get an A in Math: Progressive Rectification Prompting
- Authors: Zhenyu Wu, Meng Jiang, Chao Shen
- Abstract summary: Chain-of-Thought (CoT) prompting methods have enabled large language models (LLMs) to generate reasoning paths and solve math word problems (MWPs).
We propose a novel method named Progressive Rectification Prompting (PRP) to improve average accuracy on eight MWP datasets from 77.3 to 90.5.
- Score: 42.09762345892869
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chain-of-Thought (CoT) prompting methods have enabled large language models
(LLMs) to generate reasoning paths and solve math word problems (MWPs).
However, they are sensitive to mistakes in the paths, as any mistake can result
in an incorrect answer. We propose a novel method named Progressive
Rectification Prompting (PRP) to improve average accuracy on eight MWP datasets
from 77.3 to 90.5. Given an initial answer from CoT, PRP iterates a
verify-then-rectify process to progressively identify incorrect answers and
rectify the reasoning paths. With the most likely correct answer, the LLM
predicts a masked numerical value in the question; if the prediction does not
match the masked value, the answer is likely incorrect. Then the LLM is
prompted to re-generate the reasoning path hinted with a set of incorrect
answers to prevent itself from repeating previous mistakes. PRP achieves the
best performance compared against the CoT methods. Our implementation is made
publicly available at https://wzy6642.github.io/prp.github.io/.
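
The abstract spells out the verify-then-rectify loop in enough detail to sketch: obtain an initial chain-of-thought answer, verify it by masking a numeric value in the question and asking the model to recover it given the candidate answer, and, on a mismatch, regenerate the reasoning path while hinting at the answers already found to be wrong. The sketch below is illustrative only; the `llm` callable, the prompt wording, and the mask-the-first-number heuristic are assumptions, not the authors' released implementation (linked above).

```python
# Minimal sketch of the PRP verify-then-rectify loop, assuming an `llm` callable
# that maps a prompt string to a completion string. Prompts and the masking
# heuristic are illustrative assumptions, not the authors' implementation.
import re
from typing import Callable

def mask_first_number(question: str) -> tuple[str, str]:
    """Replace the first numeral in the question with a [MASK] placeholder."""
    match = re.search(r"\d+(?:\.\d+)?", question)
    if match is None:
        raise ValueError("question contains no numeric value to mask")
    masked = question[:match.start()] + "[MASK]" + question[match.end():]
    return masked, match.group()

def extract_number(text: str) -> str:
    """Take the last numeral in the model output as its final answer."""
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else ""

def prp_answer(question: str, llm: Callable[[str], str], max_iters: int = 5) -> str:
    """Progressively verify and rectify a chain-of-thought answer (sketch)."""
    wrong_answers: list[str] = []
    # Step 1: initial answer from zero-shot chain-of-thought prompting.
    answer = extract_number(llm(f"{question}\nLet's think step by step."))
    for _ in range(max_iters):
        # Verify: mask a number in the question and ask the model to recover it,
        # conditioned on the candidate answer.
        masked_question, true_value = mask_first_number(question)
        predicted = extract_number(llm(
            f"{masked_question}\nThe answer to this question is {answer}.\n"
            "What number should replace [MASK]?"
        ))
        if predicted == true_value:
            return answer  # the masked value was recovered: keep the answer
        # Rectify: regenerate the reasoning path, hinting at answers already
        # suspected to be wrong so they are not repeated.
        wrong_answers.append(answer)
        answer = extract_number(llm(
            f"{question}\nThe answer is not any of: {', '.join(wrong_answers)}.\n"
            "Let's think step by step."
        ))
    return answer
```

With a real model behind `llm`, calling `prp_answer(question, llm)` would run up to `max_iters` verify-then-rectify rounds before returning its most recent answer.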
Related papers
- LLM Robustness Against Misinformation in Biomedical Question Answering [50.98256373698759]
The retrieval-augmented generation (RAG) approach is used to reduce the confabulation of large language models (LLMs) for question answering.
We evaluate the effectiveness and robustness of four LLMs against misinformation in answering biomedical questions.
arXiv Detail & Related papers (2024-10-27T16:23:26Z)
- When is the consistent prediction likely to be a correct prediction? [34.41365254799998]
We show that consistent answers derived through longer reasoning texts are more likely to be correct.
This is largely because, as we demonstrate, LLMs can autonomously produce chain-of-thought (CoT) style reasoning in longer responses.
We conclude that the probability of LLMs generating a longer response is quite low, highlighting the need for decoding strategies conditioned on output length.
arXiv Detail & Related papers (2024-07-08T09:37:27Z)
- Large Language Models Can Self-Correct with Key Condition Verification [39.67266805233599]
We find that a simple yet effective verification method can unleash inherent capabilities of large language models.
We propose an iterative verify-then-correct framework to progressively identify and correct (probably) false responses.
arXiv Detail & Related papers (2024-05-23T01:43:45Z)
- Learning From Mistakes Makes LLM Better Reasoner [106.48571828587728]
Large language models (LLMs) have recently exhibited remarkable reasoning capabilities in solving math problems.
This work explores whether LLMs can LEarn from MistAkes (LEMA), akin to the human learning process.
arXiv Detail & Related papers (2023-10-31T17:52:22Z)
- GRACE: Discriminator-Guided Chain-of-Thought Reasoning [75.35436025709049]
We propose Guiding chain-of-thought ReAsoning with a CorrectnEss Discriminator (GRACE) to steer the decoding process towards producing correct reasoning steps.
GRACE employs a discriminator trained with a contrastive loss over correct and incorrect steps, which is used during decoding to score next-step candidates.
arXiv Detail & Related papers (2023-05-24T09:16:51Z)
- RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought [56.558892336235914]
Reversing Chain-of-Thought (RCoT) is a novel method to improve large language models' reasoning abilities.
RCoT automatically detects and rectifies factual inconsistencies in generated solutions.
We show that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities.
arXiv Detail & Related papers (2023-05-19T08:02:52Z)
- MathPrompter: Mathematical Reasoning using Large Language Models [7.953723258038284]
Large Language Models (LLMs) have limited performance when solving arithmetic reasoning tasks.
MathPrompter uses the zero-shot chain-of-thought prompting technique to generate multiple algebraic expressions or Python functions that solve the same math problem in different ways (a minimal illustrative sketch of this multi-path idea appears after this list).
arXiv Detail & Related papers (2023-03-04T04:43:49Z)
- Large Language Models are Better Reasoners with Self-Verification [48.534270563880845]
Large language models (LLMs) have shown strong reasoning ability in several natural language processing tasks.
LLMs with chain of thought (CoT) prompting require multi-step prompting and multi-token prediction, which is highly sensitive to individual mistakes.
We propose and prove that LLMs also have similar self-verification abilities.
arXiv Detail & Related papers (2022-12-19T15:51:52Z)
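
The MathPrompter entry above rests on solving the same problem along several independent paths and keeping an answer only when the paths agree. Below is a minimal, hedged sketch of that multi-path consistency idea, reusing the same assumed `llm` callable as the PRP sketch; the prompts, the last-line answer extraction, and the two-votes threshold are illustrative assumptions, not MathPrompter's actual procedure.

```python
# Hedged sketch of multi-path consistency checking: ask for several independent
# solutions and accept an answer only when at least two paths agree.
# The `llm` callable and the exact prompts are assumptions for illustration.
from collections import Counter
from typing import Callable, Optional

def consensus_answer(question: str, llm: Callable[[str], str]) -> Optional[str]:
    """Return the answer agreed on by at least two solution paths, else None."""
    prompts = [
        f"{question}\nDerive an algebraic expression for the answer, then state the final number.",
        f"{question}\nWrite a short Python function that computes the answer, then state the final number.",
        f"{question}\nLet's think step by step, then state the final number.",
    ]
    # Treat the last line of each response as that path's final answer.
    answers = [(llm(p).strip().splitlines() or [""])[-1] for p in prompts]
    value, count = Counter(answers).most_common(1)[0]
    return value if count >= 2 else None  # accept only when paths agree
```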