Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Reasoning
- URL: http://arxiv.org/abs/2412.01113v1
- Date: Mon, 02 Dec 2024 04:35:54 GMT
- Title: Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Reasoning
- Authors: Keito Kudo, Yoichi Aoki, Tatsuki Kuribayashi, Shusaku Sone, Masaya Taniguchi, Ana Brassard, Keisuke Sakaguchi, Kentaro Inui,
- Abstract summary: We investigate the internal reasoning mechanism of language models during symbolic multi-step reasoning.<n>We find that simple subproblems are solved before chain-of-thought begins, and more complicated multi-hop calculations are performed during CoT.
- Score: 26.79907640964047
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study investigates the internal reasoning mechanism of language models during symbolic multi-step reasoning, motivated by the question of whether chain-of-thought (CoT) outputs are faithful to the model's internals. Specifically, we inspect when they internally determine their answers, particularly before or after CoT begins, to determine whether models follow a post-hoc "think-to-talk" mode or a step-by-step "talk-to-think" mode of explanation. Through causal probing experiments in controlled arithmetic reasoning tasks, we found systematic internal reasoning patterns across models; for example, simple subproblems are solved before CoT begins, and more complicated multi-hop calculations are performed during CoT.
Related papers
- Computational Thinking Reasoning in Large Language Models [69.28428524878885]
Computational Thinking Model (CTM) is a novel framework that incorporates computational thinking paradigms into large language models (LLMs)<n>Live code execution is seamlessly integrated into the reasoning process, allowing CTM to think by computing.<n>CTM outperforms conventional reasoning models and tool-augmented baselines in terms of accuracy, interpretability, and generalizability.
arXiv Detail & Related papers (2025-06-03T09:11:15Z) - Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models [76.6028674686018]
We introduce thought-tracing, an inference-time reasoning algorithm to trace the mental states of agents.
Our algorithm is modeled after the Bayesian theory-of-mind framework.
We evaluate thought-tracing on diverse theory-of-mind benchmarks, demonstrating significant performance improvements.
arXiv Detail & Related papers (2025-02-17T15:08:50Z) - Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying [0.3659498819753633]
State-of-the-art Large Language models (LLMs) continue to struggle when performing logical and mathematical reasoning.
This paper makes use of the notion of critical questions from the literature on argumentation theory, focusing in particular on Toulmin's model of argumentation.
We show that employing these critical questions can improve the reasoning capabilities of LLMs.
arXiv Detail & Related papers (2024-12-19T18:51:30Z) - FLARE: Faithful Logic-Aided Reasoning and Exploration [50.9814063216852]
We introduce a novel approach for traversing the problem space using task decompositions.
We use the Large Language Models to plan a solution, soft-formalise the query into facts and predicates using a logic programming code.
Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
arXiv Detail & Related papers (2024-10-14T19:39:11Z) - STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering [8.525847131940031]
Multi-hop question answering (MHQA) requires a model to retrieve and integrate information from multiple passages to answer a complex question.
Recent systems leverage the power of large language models and integrate evidence retrieval with reasoning prompts.
We propose STOC-TOT, a tree-of-thought reasoning prompting method with constrained decoding for MHQA.
arXiv Detail & Related papers (2024-07-04T07:17:53Z) - Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning [8.609587510471943]
We introduce a novel and interpretable analysis of internal multi-hop reasoning processes in large language models.
We show that during inference, the middle layers of the network generate highly interpretable embeddings.
Our findings can help uncover the strategies that LLMs use to solve reasoning tasks, offering insights into the types of thought processes that can emerge from artificial intelligence.
arXiv Detail & Related papers (2024-06-19T21:36:40Z) - How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning [44.02173413922695]
A lack of understanding prevails around the internal mechanisms of the models that facilitate Chain-of-Thought (CoT) prompting.
This work investigates the sub-structures within Large Language Models that manifest CoT reasoning from a point of view.
arXiv Detail & Related papers (2024-02-28T13:14:20Z) - Boosting of Thoughts: Trial-and-Error Problem Solving with Large Language Models [43.09706839884221]
Boosting of Thoughts (BoT) is an automated prompting framework for problem solving with Large Language Models.
We show that BoT consistently achieves higher or comparable problem-solving rates than other advanced prompting approaches.
arXiv Detail & Related papers (2024-02-17T00:13:36Z) - Towards a Mechanistic Interpretation of Multi-Step Reasoning
Capabilities of Language Models [107.07851578154242]
Language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities.
It is unclear whether LMs perform tasks by cheating with answers memorized from pretraining corpus, or, via a multi-step reasoning mechanism.
We show that MechanisticProbe is able to detect the information of the reasoning tree from the model's attentions for most examples.
arXiv Detail & Related papers (2023-10-23T01:47:29Z) - Measuring Faithfulness in Chain-of-Thought Reasoning [19.074147845029355]
Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question.
It is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question)
We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT.
arXiv Detail & Related papers (2023-07-17T01:08:39Z) - Unlocking Temporal Question Answering for Large Language Models with Tailor-Made Reasoning Logic [84.59255070520673]
Large language models (LLMs) face a challenge when engaging in temporal reasoning.
We propose TempLogic, a novel framework designed specifically for temporal question-answering tasks.
arXiv Detail & Related papers (2023-05-24T10:57:53Z) - The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language
Models [45.01562498702836]
Chain-of-Thought (CoT) prompting enables large language models to solve complex reasoning problems by generating intermediate steps.
We propose SOCRATIC QUESTIONING, a divide-and-conquer style algorithm that mimics the recursive thinking process.
arXiv Detail & Related papers (2023-05-24T10:36:14Z) - HOP, UNION, GENERATE: Explainable Multi-hop Reasoning without Rationale
Supervision [118.0818807474809]
This work proposes a principled, probabilistic approach for training explainable multi-hop QA systems without rationale supervision.
Our approach performs multi-hop reasoning by explicitly modeling rationales as sets, enabling the model to capture interactions between documents and sentences within a document.
arXiv Detail & Related papers (2023-05-23T16:53:49Z) - Faithful Question Answering with Monte-Carlo Planning [78.02429369951363]
We propose FAME (FAithful question answering with MontE-carlo planning) to answer questions based on faithful reasoning steps.
We formulate the task as a discrete decision-making problem and solve it through the interaction of a reasoning environment and a controller.
FAME achieves state-of-the-art performance on the standard benchmark.
arXiv Detail & Related papers (2023-05-04T05:21:36Z) - STREET: A Multi-Task Structured Reasoning and Explanation Benchmark [56.555662318619135]
We introduce a unified multi-task and multi-domain natural language reasoning and explanation benchmark.
We expect models to not only answer questions, but also produce step-by-step structured explanations describing how premises in the question are used to produce intermediate conclusions that can prove the correctness of a certain answer.
arXiv Detail & Related papers (2023-02-13T22:34:02Z) - Discrete Reasoning Templates for Natural Language Understanding [79.07883990966077]
We present an approach that reasons about complex questions by decomposing them to simpler subquestions.
We derive the final answer according to instructions in a predefined reasoning template.
We show that our approach is competitive with the state-of-the-art while being interpretable and requires little supervision.
arXiv Detail & Related papers (2021-04-05T18:56:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.