Large Language Models are Better Reasoners with Self-Verification
- URL: http://arxiv.org/abs/2212.09561v5
- Date: Thu, 19 Oct 2023 12:24:42 GMT
- Title: Large Language Models are Better Reasoners with Self-Verification
- Authors: Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Shengping Liu,
Bin Sun, Kang Liu, Jun Zhao
- Abstract summary: Large language models (LLMs) have shown strong reasoning ability in several natural language processing tasks.
With chain-of-thought (CoT) prompting, LLMs rely on multi-step prompting and multi-token prediction, which makes them highly sensitive to individual mistakes.
We propose and demonstrate that LLMs also possess self-verification abilities.
- Score: 48.534270563880845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, with chain-of-thought (CoT) prompting, large language models
(LLMs) such as GPT-3 have shown strong reasoning ability on several natural
language processing tasks, including arithmetic, commonsense, and logical
reasoning. However, CoT requires multi-step prompting and multi-token
prediction, which makes LLMs highly sensitive to individual mistakes and
vulnerable to error accumulation. These issues mean that LLMs need the ability
to verify their answers. Indeed, after reaching a conclusion in a reasoning
task, people often check it by re-verifying the steps to avoid mistakes. In
this paper, we propose and demonstrate that LLMs have a similar
self-verification ability. We take the conclusion obtained by CoT as one of
the conditions for solving the original problem. By performing backward
verification of the answers that the LLM has deduced for itself, we obtain
interpretable answer-validation scores and select the candidate answer with
the highest score. Experimental results demonstrate that the proposed method
improves reasoning performance on various arithmetic, commonsense, and logical
reasoning datasets. Our code is publicly available at:
https://github.com/WENGSYX/Self-Verification.
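The procedure sketched in the abstract (sample chain-of-thought answers forward, then verify each candidate backward and keep the highest-scoring one) can be illustrated with a short code sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that): `llm(prompt)` is a hypothetical completion callable, answers are assumed to be single numbers, and the condition to be masked and its true value are supplied by the caller.

```python
# Minimal sketch of CoT generation plus backward self-verification.
# Assumptions (not from the paper's code): llm(prompt) is a hypothetical
# text-completion callable; answers are single numbers; the condition to be
# masked and its true value are supplied by the caller.
import re


def llm(prompt: str) -> str:
    """Placeholder for an LLM completion call (e.g., an API client)."""
    raise NotImplementedError


def extract_number(text: str) -> str | None:
    """Return the last number in a model response, if any."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return nums[-1] if nums else None


def forward_candidates(question: str, k: int = 5) -> list[str]:
    """Forward pass: sample k chain-of-thought answers for the question."""
    answers = []
    for _ in range(k):
        out = llm(f"Q: {question}\nA: Let's think step by step.")
        ans = extract_number(out)
        if ans is not None:
            answers.append(ans)
    return answers


def verification_score(question: str, condition: str, true_value: str,
                       candidate: str, n: int = 3) -> int:
    """Backward pass: state the candidate answer as a known fact, mask one
    original condition, and count how often the LLM re-derives its value."""
    masked = question.replace(condition, "X")
    prompt = f"{masked}\nSuppose the answer is {candidate}. What is the value of X?"
    return sum(extract_number(llm(prompt)) == true_value for _ in range(n))


def self_verify(question: str, condition: str, true_value: str) -> str:
    """Select the candidate answer with the highest backward-verification score."""
    candidates = forward_candidates(question)
    scores = {c: verification_score(question, condition, true_value, c)
              for c in set(candidates)}
    return max(scores, key=scores.get)
```

A caller would pass, for example, an arithmetic word problem together with one of its stated numeric conditions and that condition's value; the candidate whose backward re-derivations recover the masked condition most often is returned as the final answer.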
Related papers
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models [0.0]
Large language models (LLMs) have generated significant attention since their inception, finding applications across various academic and industrial domains.
LLMs often suffer from the "hallucination problem", where outputs, though grammatically and logically coherent, lack factual accuracy or are entirely fabricated.
arXiv Detail & Related papers (2024-08-09T14:34:32Z)
- Large Language Models Can Self-Correct with Key Condition Verification [39.67266805233599]
We find that a simple yet effective verification method can unleash inherent capabilities of large language models.
We propose an iterative verify-then-correct framework to progressively identify and correct (probably) false responses.
arXiv Detail & Related papers (2024-05-23T01:43:45Z)
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But can they really "reason" over natural language?
This question has received significant research attention, and many reasoning skills, such as commonsense, numerical, and qualitative reasoning, have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z)
- CLadder: Assessing Causal Reasoning in Language Models [82.8719238178569]
We investigate whether large language models (LLMs) can coherently reason about causality.
We propose a new NLP task, causal inference in natural language, inspired by the "causal inference engine" postulated by Judea Pearl et al.
arXiv Detail & Related papers (2023-12-07T15:12:12Z)
- Zero-Shot Question Answering over Financial Documents using Large Language Models [0.18749305679160366]
We introduce a large language model (LLM) based approach to answer complex questions requiring multi-hop numerical reasoning over financial reports.
We use novel zero-shot prompts that guide the LLM to encode the required reasoning into a Python program or a domain-specific language.
arXiv Detail & Related papers (2023-11-19T16:23:34Z)
- Forward-Backward Reasoning in Large Language Models for Mathematical Verification [65.9495774606273]
Self-Consistency samples diverse reasoning chains with answers and chooses the final answer by majority voting (a minimal sketch of this baseline appears after this list).
We introduce backward reasoning to verify candidate answers.
FOrward and BAckward Reasoning (FOBAR) for verification achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-08-15T13:19:59Z)
- SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning [55.76083560152823]
SelfCheck is a general-purpose zero-shot verification schema for recognizing errors in step-by-step reasoning.
We test SelfCheck on three datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final answer accuracies.
arXiv Detail & Related papers (2023-08-01T10:31:36Z)
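As background for the Forward-Backward Reasoning entry above, the Self-Consistency baseline it builds on can be sketched in a few lines. As in the earlier sketch, `llm` is a hypothetical completion callable, and the prompt and answer parsing are illustrative assumptions rather than code from any of the listed papers.

```python
# Minimal sketch of Self-Consistency: sample several chain-of-thought
# completions and return the majority-vote answer. llm() is a hypothetical
# completion callable (assumption); answer parsing is simplified to numbers.
import re
from collections import Counter


def llm(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError


def self_consistency(question: str, k: int = 10) -> str | None:
    """Sample k reasoning chains and return the most common final answer."""
    answers = []
    for _ in range(k):
        out = llm(f"Q: {question}\nA: Let's think step by step.")
        nums = re.findall(r"-?\d+(?:\.\d+)?", out)
        if nums:
            answers.append(nums[-1])  # take the last number as the final answer
    return Counter(answers).most_common(1)[0][0] if answers else None
```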
This list is automatically generated from the titles and abstracts of the papers on this site.