Chain-of-Probe: Examing the Necessity and Accuracy of CoT Step-by-Step
- URL: http://arxiv.org/abs/2406.16144v1
- Date: Sun, 23 Jun 2024 15:50:22 GMT
- Title: Chain-of-Probe: Examing the Necessity and Accuracy of CoT Step-by-Step
- Authors: Zezhong Wang, Xingshan Zeng, Weiwen Liu, Yufei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong
- Abstract summary: We propose a method to probe changes in the mind during the model's reasoning.
By analyzing patterns in mind change, we examine the correctness of the model's reasoning.
Our validation reveals that many responses, although correct in their final answer, contain errors in their reasoning process.
- Score: 81.50681925980135
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent research has identified the issue of Early Answering in large language models (LLMs), where the models already have an answer before generating the Chain-of-Thought (CoT). This phenomenon suggests a potential lack of necessary dependency between the predicted answer and the reasoning process. Consequently, two important questions arise: (1) Is CoT still necessary if the model already has an answer? (2) Can the correctness of the answer serve as valid evidence for the correctness of CoT? To address these questions, we propose a method called Chain-of-Probe (CoP) to probe changes in the mind during the model's reasoning. The probing results show that in a significant number of question-answer cases, CoT appears to be unnecessary, and whether it is needed correlates with the simplicity of the task, defined by the number of reasoning steps required. Furthermore, by analyzing patterns in mind change, we examine the correctness of the model's reasoning. Our validation reveals that many responses, although correct in their final answer, contain errors in their reasoning process. To this end, we propose a strategic approach based on CoP to prioritize answers with correct reasoning among multiple candidates, thereby bolstering the reliability of the model's reasoning.
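The abstract does not specify how CoP computes its probes or how it ranks candidates, so the following is only a minimal sketch under stated assumptions, meant to make the terms "early answering" and "mind change" concrete. It assumes that, for each sampled response, we already have the model's answer distribution after each prefix of reasoning steps (how those distributions are obtained is left open). The names `ProbeTrace`, `step_answers`, `answers_early`, `mind_changes`, and `select_answer`, as well as the filter-then-vote rule, are hypothetical illustrations and not the authors' method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ProbeTrace:
    """Hypothetical record of one sampled CoT response.

    step_probs[k] maps each answer option to the probability the model
    assigns to it after reading the first k reasoning steps (k = 0 means
    before any CoT is generated). final_answer is the answer the response
    ends with.
    """
    step_probs: list[dict[str, float]]
    final_answer: str


def step_answers(trace: ProbeTrace) -> list[str]:
    # The model's current "mind" after each prefix: its most probable option.
    return [max(probs, key=probs.get) for probs in trace.step_probs]


def answers_early(trace: ProbeTrace) -> bool:
    # Early answering: the option preferred before any CoT already equals
    # the final answer.
    return step_answers(trace)[0] == trace.final_answer


def mind_changes(trace: ProbeTrace) -> int:
    # Count how often the preferred option switches between consecutive prefixes.
    answers = step_answers(trace)
    return sum(prev != curr for prev, curr in zip(answers, answers[1:]))


def select_answer(traces: list[ProbeTrace], max_changes: int = 1) -> str:
    # Hypothetical heuristic: keep candidates whose last probed option agrees
    # with the stated final answer and that change their mind at most
    # max_changes times, then take a majority vote over the survivors.
    kept = [
        t for t in traces
        if step_answers(t)[-1] == t.final_answer and mind_changes(t) <= max_changes
    ]
    pool = kept or traces  # fall back to plain majority voting if nothing survives
    return Counter(t.final_answer for t in pool).most_common(1)[0][0]
```

For example, a response whose probed answer flips B, A, B, A across four prefixes has three mind changes and would be dropped by this filter even though it ends on A. Filtering before voting is just one plausible reading of "prioritize answers with correct reasoning"; the paper's actual criteria may differ.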
Related papers
- A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning [48.51969964676017]
Chain-of-Thought (CoT) plays a significant role in improving the reasoning performance of large language models.
We propose a Read-and-Control approach for controlling the accuracy of CoT.
Date: 2024-06-18T04:07:13Z
- Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering [59.495717939664246]
Large language models have demonstrated remarkable capabilities by leveraging chain-of-thought (CoT) reasoning techniques to solve intricate questions.
We propose a novel approach called the selective filtering reasoner (SelF-Reasoner) that assesses the entailment relationship between the question and the candidate reasoning chain.
SelF-Reasoner consistently improves on the fine-tuned T5 baseline across the ScienceQA, ECQA, and LastLetter tasks.
Date: 2024-03-28T06:28:35Z
- Measuring Faithfulness in Chain-of-Thought Reasoning [19.074147845029355]
Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question.
It is unclear whether the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question).
We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT.
Date: 2023-07-17T01:08:39Z
- Question Decomposition Improves the Faithfulness of Model-Generated Reasoning [23.34325378824462]
It is difficult to verify the correctness and safety of the behavior of large language models (LLMs).
One approach is to prompt LLMs to externalize their reasoning, by having them generate step-by-step reasoning as they answer a question.
This approach relies on the stated reasoning faithfully reflecting the model's actual reasoning, which is not always the case.
Decomposition-based methods achieve strong performance on question-answering tasks, sometimes approaching that of CoT.
Date: 2023-07-17T00:54:10Z
- ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness [67.49087159888298]
ReCEval is a framework that evaluates reasoning chains via two key properties: correctness and informativeness.
We show that ReCEval effectively identifies various error types and yields notable improvements compared to prior methods.
Date: 2023-04-21T02:19:06Z
- Measuring and Narrowing the Compositionality Gap in Language Models [116.5228850227024]
We measure how often models can correctly answer all sub-problems but not generate the overall solution.
We present a new method, self-ask, that further improves on chain of thought.
Date: 2022-10-07T06:50:23Z
- Robustifying Multi-hop QA through Pseudo-Evidentiality Training [28.584236042324896]
We study a bias problem in multi-hop question answering models: answering correctly without correct reasoning.
We propose a new approach to learn evidentiality, deciding whether the answer prediction is supported by correct evidence.
Date: 2021-07-07T14:15:14Z
- Generative Context Pair Selection for Multi-hop Question Answering [60.74354009152721]
We propose a generative context selection model for multi-hop question answering.
Our proposed generative passage selection model performs better (4.9% higher than the baseline) on the adversarial held-out set.
Date: 2021-04-18T07:00:48Z