Faithful Chain-of-Thought Reasoning
- URL: http://arxiv.org/abs/2301.13379v3
- Date: Wed, 20 Sep 2023 22:19:30 GMT
- Title: Faithful Chain-of-Thought Reasoning
- Authors: Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, Delip Rao, Eric Wong,
Marianna Apidianaki, Chris Callison-Burch
- Abstract summary: Chain-of-Thought (CoT) prompting boosts Language Models' (LM) performance on a gamut of reasoning tasks.
We propose Faithful CoT, a reasoning framework involving two stages: Translation and Problem Solving.
This guarantees that the reasoning chain provides a faithful explanation of the final answer.
- Score: 51.21714389639417
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While Chain-of-Thought (CoT) prompting boosts Language Models' (LM)
performance on a gamut of complex reasoning tasks, the generated reasoning
chain does not necessarily reflect how the model arrives at the answer (a.k.a.
faithfulness). We propose Faithful CoT, a reasoning framework involving two
stages: Translation (Natural Language query $\rightarrow$ symbolic reasoning
chain) and Problem Solving (reasoning chain $\rightarrow$ answer), using an LM
and a deterministic solver respectively. This guarantees that the reasoning
chain provides a faithful explanation of the final answer. Aside from
interpretability, Faithful CoT also improves empirical performance: it
outperforms standard CoT on 9 of 10 benchmarks from 4 diverse domains, with a
relative accuracy gain of 6.3% on Math Word Problems (MWP), 3.4% on Planning,
5.5% on Multi-hop Question Answering (QA), and 21.4% on Relational Inference.
Furthermore, with GPT-4 and Codex, it sets the new state-of-the-art few-shot
performance on 7 datasets (with 95.0+ accuracy on 6 of them), showing a strong
synergy between faithfulness and accuracy.
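To make the two-stage pipeline concrete, here is a minimal, hypothetical Python sketch. It is not the authors' code: the Translation stage is stubbed with a hand-written reasoning chain (in the paper an LM such as Codex generates it from few-shot prompts), while the Problem Solving stage runs the chain with a deterministic executor, so the answer follows from the chain by construction.

```python
# Minimal sketch of the Faithful CoT two-stage pipeline (illustrative, not the
# authors' code). Stage 1 (Translation) would use an LM; here it is stubbed so
# the deterministic Stage 2 (Problem Solving) can run end to end.

def translate(query: str) -> str:
    """Stage 1: natural-language query -> symbolic reasoning chain (Python here).
    In the paper an LM produces this from few-shot prompts; this stub returns a
    hand-written chain for one demo query."""
    assert "3 apples" in query  # stub only covers the demo query
    return (
        "# apples Alice starts with\n"
        "start = 3\n"
        "# apples Bob gives her\n"
        "given = 5\n"
        "answer = start + given\n"
    )

def solve(chain: str) -> object:
    """Stage 2: reasoning chain -> answer, via a deterministic executor.
    Because the answer is computed by running the chain, the chain is a
    faithful explanation of the answer by construction."""
    env: dict = {}
    exec(chain, {}, env)  # deterministic: no LM involved in this stage
    return env["answer"]

query = "Alice has 3 apples and Bob gives her 5 more. How many does she have?"
print(solve(translate(query)))  # -> 8
```

The design point this illustrates: only the deterministic solver produces the answer, so the symbolic chain cannot be a post-hoc rationalization, which is the faithfulness guarantee the abstract refers to.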
Related papers
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning [55.52872152909785]
Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs).
We show that CoT gives strong performance benefits primarily on tasks involving math or logic, with much smaller gains on other types of tasks.
arXiv Detail & Related papers (2024-09-18T17:55:00Z)
- Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning [11.758019716526459]
Chain-of-Thought (CoT) prompting has been shown to enhance the multi-step reasoning capabilities of Large Language Models (LLMs).
We show that CoT prompting performance reflects both memorization and a probabilistic version of genuine reasoning.
arXiv Detail & Related papers (2024-07-01T18:01:07Z)
- A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning [48.51969964676017]
Chain-of-Thought (CoT) plays a significant role in augmenting the reasoning performance of large language models.
We propose a Read-and-Control approach for controlling the accuracy of CoT.
arXiv Detail & Related papers (2024-06-18T04:07:13Z)
- Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering [59.495717939664246]
Large language models have demonstrated remarkable capabilities by leveraging chain-of-thought (CoT) reasoning techniques to solve intricate questions.
We propose a novel approach called the selective filtering reasoner (SelF-Reasoner) that assesses the entailment relationship between the question and the candidate reasoning chain.
SelF-Reasoner consistently improves the fine-tuned T5 baseline on the ScienceQA, ECQA, and LastLetter tasks; a hedged sketch of this filtering idea follows this entry.
arXiv Detail & Related papers (2024-03-28T06:28:35Z)
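The SelF-Reasoner abstract gives no implementation details, so the following Python sketch is purely hypothetical. A real system would score question/chain entailment with a trained NLI model; a toy lexical-overlap score (the entailment_score function is invented for illustration) stands in here so the example runs without extra dependencies.

```python
# Hypothetical sketch of selective filtering in the spirit of SelF-Reasoner.
# A trained NLI model would normally judge whether the candidate reasoning
# chain addresses the question; a toy overlap heuristic stands in below.

def entailment_score(question: str, chain: str) -> float:
    """Stand-in for an NLI entailment model: fraction of the question's
    content words that the candidate reasoning chain actually mentions."""
    q_words = {w.lower().strip("?.,") for w in question.split()}
    c_words = {w.lower().strip("?.,") for w in chain.split()}
    content = {w for w in q_words if len(w) > 3}
    return len(content & c_words) / max(len(content), 1)

def select_answer(question: str, chain: str, chain_answer: str,
                  fallback_answer: str, threshold: float = 0.5) -> str:
    """Keep the CoT answer only when the chain plausibly addresses the
    question; otherwise fall back to a direct (chain-free) prediction."""
    if entailment_score(question, chain) >= threshold:
        return chain_answer
    return fallback_answer

question = "How many legs do three spiders have in total?"
chain = "Each of the three spiders has 8 legs, so 3 * 8 = 24 legs in total."
print(select_answer(question, chain, "24", "unknown"))  # -> 24
```

When the score falls below the threshold, the chain is treated as potentially misleading and the answer falls back to a chain-free prediction, which is the filtering behavior the abstract describes.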
- Can We Verify Step by Step for Incorrect Answer Detection? [22.984011562264147]
We introduce R2PE, a benchmark designed to explore the relationship between reasoning chains and performance across various reasoning tasks.
The benchmark aims to detect false final outputs from LLMs based on their intermediate reasoning steps.
We propose the process discernibility score (PDS) framework, which outperforms the answer-checking baseline by a large margin.
arXiv Detail & Related papers (2024-02-16T09:29:50Z)
- LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers [60.009969929857704]
Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society.
In this work, we reformulate such tasks as modular neurosymbolic programming, which we call LINC.
We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
arXiv Detail & Related papers (2023-10-23T17:58:40Z)
- Self-Consistency Improves Chain of Thought Reasoning in Language Models [53.45015291520658]
We explore a simple ensemble strategy, self-consistency, that significantly improves the reasoning accuracy of large language models.
For arithmetic and commonsense reasoning benchmarks, we find that self-consistency yields significant accuracy improvements; a sketch of the majority-vote procedure follows this entry.
arXiv Detail & Related papers (2022-03-21T17:48:52Z)
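Here is a minimal Python sketch of the self-consistency idea from the entry above: sample several reasoning chains instead of one, then take a majority vote over the final answers. The LM sampler is stubbed with canned chains; the '#### answer' format and the sample_chains helper are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of self-consistency: draw several reasoning chains, then
# marginalize over reasoning paths by majority vote on the final answers.
# A real system would draw temperature > 0 samples from an LM.
from collections import Counter

def sample_chains(query: str, n: int = 5) -> list[str]:
    """Stub for n stochastic LM samples in a 'reasoning #### answer' format."""
    return [
        "3 pairs of socks is 3 * 2 = 6 socks. #### 6",
        "Each pair has 2 socks, and 3 * 2 = 6. #### 6",
        "3 + 2 = 5 socks. #### 5",  # one faulty chain, outvoted below
        "Two socks per pair times three pairs gives 6. #### 6",
        "3 pairs -> 6 socks. #### 6",
    ][:n]

def self_consistency(query: str, n: int = 5) -> str:
    """Most frequent final answer across sampled chains wins."""
    answers = [c.split("####")[-1].strip() for c in sample_chains(query, n)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("How many socks are in 3 pairs?"))  # -> 6
```

The vote makes the final prediction robust to occasional faulty chains, which is where the reported accuracy gains come from.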