Measuring Faithfulness in Chain-of-Thought Reasoning
- URL: http://arxiv.org/abs/2307.13702v1
- Date: Mon, 17 Jul 2023 01:08:39 GMT
- Title: Measuring Faithfulness in Chain-of-Thought Reasoning
- Authors: Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson
Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson
Kernion, Kamil\.e Luko\v{s}i\=ut\.e, Karina Nguyen, Newton Cheng, Nicholas
Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish,
Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy
Maxwell, Timothy Telleen-Lawton, Tristan Hume, Zac Hatfield-Dodds, Jared
Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez
- Abstract summary: Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question.
It is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question)
We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT.
- Score: 19.074147845029355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) perform better when they produce step-by-step,
"Chain-of-Thought" (CoT) reasoning before answering a question, but it is
unclear if the stated reasoning is a faithful explanation of the model's actual
reasoning (i.e., its process for answering the question). We investigate
hypotheses for how CoT reasoning may be unfaithful, by examining how the model
predictions change when we intervene on the CoT (e.g., by adding mistakes or
paraphrasing it). Models show large variation across tasks in how strongly they
condition on the CoT when predicting their answer, sometimes relying heavily on
the CoT and other times primarily ignoring it. CoT's performance boost does not
seem to come from CoT's added test-time compute alone or from information
encoded via the particular phrasing of the CoT. As models become larger and
more capable, they produce less faithful reasoning on most tasks we study.
Overall, our results suggest that CoT can be faithful if the circumstances such
as the model size and task are carefully chosen.
Related papers
- Chain-of-Probe: Examing the Necessity and Accuracy of CoT Step-by-Step [81.50681925980135]
We propose a method to probe changes in the mind during the model's reasoning.
By analyzing patterns in mind change, we examine the correctness of the model's reasoning.
Our validation reveals that many responses, although correct in their final answer, contain errors in their reasoning process.
arXiv Detail & Related papers (2024-06-23T15:50:22Z) - A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning [48.51969964676017]
Chain-of-Thought (CoT) holds a significant place in augmenting the reasoning performance for large language models.
We propose a Read-and-Control approach for controlling the accuracy of CoT.
arXiv Detail & Related papers (2024-06-18T04:07:13Z) - Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners [19.40385041079461]
Large language models (LLMs) suffer from serious unfaithful chain-of-thought (CoT) issues.
We first study the CoT faithfulness issue at the granularity of CoT steps, identify two reasoning paradigms.
We then conduct a joint analysis of the causal relevance among the context, CoT, and answer during reasoning.
arXiv Detail & Related papers (2024-05-29T09:17:46Z) - Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering [59.495717939664246]
Large language models have manifested remarkable capabilities by leveraging chain-of-thought (CoT) reasoning techniques to solve intricate questions.
We propose a novel approach called the selective filtering reasoner (SelF-Reasoner) that assesses the entailment relationship between the question and the candidate reasoning chain.
SelF-Reasoner improves the fine-tuned T5 baseline consistently over the ScienceQA, ECQA, and LastLetter tasks.
arXiv Detail & Related papers (2024-03-28T06:28:35Z) - ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting [124.69672273754144]
Chain-of-Thought (CoT) prompting can enhance the reasoning capabilities of large language models (LLMs)
Existing CoT approaches usually focus on simpler reasoning tasks and thus result in low-quality and inconsistent CoT prompts.
We introduce CoTGenius, a novel framework designed for the automatic generation of superior CoT prompts.
arXiv Detail & Related papers (2024-03-21T11:34:26Z) - Question Decomposition Improves the Faithfulness of Model-Generated
Reasoning [23.34325378824462]
Large language models (LLMs) are difficult to verify the correctness and safety of their behavior.
One approach is to prompt LLMs to externalize their reasoning, by having them generate step-by-step reasoning as they answer a question.
This approach relies on the stated reasoning faithfully reflecting the model's actual reasoning, which is not always the case.
Decomposition-based methods achieve strong performance on question-answering tasks, sometimes approaching that of CoT.
arXiv Detail & Related papers (2023-07-17T00:54:10Z) - Language Models Don't Always Say What They Think: Unfaithful
Explanations in Chain-of-Thought Prompting [43.458726163197824]
Large Language Models (LLMs) can achieve strong performance on many tasks by producing step-by-step reasoning before giving a final output.
We find that CoT explanations can systematically misrepresent the true reason for a model's prediction.
arXiv Detail & Related papers (2023-05-07T22:44:25Z) - SCOTT: Self-Consistent Chain-of-Thought Distillation [68.40232422158569]
Large language models (LMs) generate free-text rationales for their predictions via chain-of-thought prompting.
We propose a faithful knowledge distillation method to learn a small, self-consistent CoT model from a teacher model that is orders of magnitude larger.
To ensure faithful distillation, we use the teacher-generated rationales to learn a student LM with a counterfactual reasoning objective.
arXiv Detail & Related papers (2023-05-03T03:47:00Z) - Causal Expectation-Maximisation [70.45873402967297]
We show that causal inference is NP-hard even in models characterised by polytree-shaped graphs.
We introduce the causal EM algorithm to reconstruct the uncertainty about the latent variables from data about categorical manifest variables.
We argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.
arXiv Detail & Related papers (2020-11-04T10:25:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.