RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by
Reversing Chain-of-Thought
- URL: http://arxiv.org/abs/2305.11499v2
- Date: Mon, 2 Oct 2023 03:59:04 GMT
- Title: RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by
Reversing Chain-of-Thought
- Authors: Tianci Xue, Ziqi Wang, Zhenhailong Wang, Chi Han, Pengfei Yu, Heng Ji
- Abstract summary: Reversing Chain-of-Thought (RCoT) is a novel method to improve large language models' reasoning abilities.
RCoT automatically detects and rectifies factual inconsistency in generated solutions.
We show that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities.
- Score: 56.558892336235914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language Models (LLMs) have achieved promising performance on
arithmetic reasoning tasks by incorporating step-by-step chain-of-thought (CoT)
prompting. However, LLMs face challenges in maintaining factual consistency
during reasoning, exhibiting tendencies toward condition overlooking, question
misinterpretation, and condition hallucination over the given problems. Existing
methods use coarse-grained feedback (e.g., whether the answer is correct) to
improve factual consistency. In this work, we propose RCoT (Reversing
Chain-of-Thought), a novel method to improve LLMs' reasoning abilities by
automatically detecting and rectifying factual inconsistency in LLMs' generated
solutions. To detect factual inconsistency, RCoT first asks LLMs to reconstruct
the problem based on generated solutions. Then fine-grained comparisons between
the original problem and the reconstructed problem expose the factual
inconsistency in the original solutions. To rectify the solution, RCoT
formulates detected factual inconsistency into fine-grained feedback to guide
LLMs in revising solutions. Experimental results demonstrate improvements of
RCoT over standard CoT, Self-Consistency and Self-Refine across seven
arithmetic datasets. Moreover, we find that manually written fine-grained
feedback can dramatically improve LLMs' reasoning abilities (e.g., ChatGPT
reaches 94.6% accuracy on GSM8K), encouraging the community to further explore
the fine-grained feedback generation methods.
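The detect-and-rectify pipeline described above (reconstruct the problem from the generated solution, compare it with the original problem, turn the differences into fine-grained feedback, revise) can be sketched as a small control loop. The snippet below is a minimal illustration, not the authors' implementation: the `ask` helper, the prompt wording, the exact consistency check, and the revision limit are all assumptions made for this sketch.

```python
from typing import Callable


def rcot_solve(problem: str, ask: Callable[[str], str], max_revisions: int = 2) -> str:
    """Sketch of the RCoT loop: solve, reverse-reconstruct, compare, revise.

    `ask` is any function that sends a prompt to an LLM and returns its reply
    (a hypothetical stand-in here; plug in your own client).
    """
    # Step 1: standard chain-of-thought solution.
    solution = ask(f"Solve the problem step by step:\n{problem}")

    for _ in range(max_revisions):
        # Step 2: reconstruct the problem from the solution alone (the "reversed" CoT).
        reconstructed = ask(
            "Given only the following solution, write the problem it is answering:\n"
            f"{solution}"
        )

        # Step 3: fine-grained comparison of the original vs. reconstructed problem
        # to expose overlooked, misinterpreted, or hallucinated conditions.
        feedback = ask(
            "Compare the two problems condition by condition. List every condition that is "
            "missing, changed, or hallucinated in the second problem. "
            "Reply with exactly 'CONSISTENT' if they match.\n"
            f"Original problem:\n{problem}\n\nReconstructed problem:\n{reconstructed}"
        )
        if feedback.strip() == "CONSISTENT":
            break

        # Step 4: turn the detected inconsistencies into fine-grained feedback and revise.
        solution = ask(
            f"Problem:\n{problem}\n\nPrevious solution:\n{solution}\n\n"
            f"Feedback on factual inconsistencies:\n{feedback}\n\n"
            "Revise the solution so it relies only on the conditions stated in the problem."
        )

    return solution
```

In the paper, the comparison step decomposes both problems and compares them in a fine-grained way; the single comparison prompt above stands in for that procedure.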
Related papers
- Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding [74.31981011985681]
Large language models (LLMs) have shown impressive capabilities, but still struggle with complex reasoning tasks requiring multiple steps.
We introduce LaTent Reasoning Optimization (LaTRO), a principled framework that formulates reasoning as sampling from a latent distribution.
We validate LaTRO through experiments on GSM8K and ARC-Challenge datasets using multiple model architectures.
arXiv Detail & Related papers (2024-11-06T22:02:30Z)
- Retrieving, Rethinking and Revising: The Chain-of-Verification Can Improve Retrieval Augmented Generation [38.80878966092216]
Recent Retrieval Augmented Generation (RAG) aims to enhance Large Language Models (LLMs)
We propose the chain-of-verification (CoV-RAG) to enhance the external retrieval correctness and internal generation consistency.
arXiv Detail & Related papers (2024-10-08T08:34:54Z)
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems [50.76385564061713]
Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks.
CoT usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors, and step-missing errors.
We propose Deeply Understanding the Problems (DUP) to improve the LLMs' math problem-solving ability by addressing semantic misunderstanding errors.
arXiv Detail & Related papers (2024-04-23T12:16:05Z)
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
We propose LLMRefine, an inference time optimization method to refine LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
- Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named 'Rephrase and Respond' (RaR), which allows Large Language Models to rephrase and expand questions posed by humans.
RaR serves as a simple yet effective prompting method for improving performance.
We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
- Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems [17.80128896525717]
Backward reasoning, which can be seen as the 'inverse' of forward reasoning, is relatively unexplored.
We propose variations of three different forward reasoning strategies to improve performance.
arXiv Detail & Related papers (2023-10-03T12:03:06Z)