Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
- URL: http://arxiv.org/abs/2505.00875v1
- Date: Thu, 01 May 2025 21:37:30 GMT
- Title: Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
- Authors: Ramesh Manuvinakurike, Emanuel Moss, Elizabeth Anne Watkins, Saurav Sahay, Giuseppe Raffa, Lama Nachman,
- Abstract summary: Agentic pipelines present novel challenges and opportunities for human-centered explainability.<n>We present early findings from an agentic pipeline implementation of a perceptive task guidance system.
- Score: 8.72060285437636
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Agentic pipelines present novel challenges and opportunities for human-centered explainability. The HCXAI community is still grappling with how best to make the inner workings of LLMs transparent in actionable ways. Agentic pipelines consist of multiple LLMs working in cooperation with minimal human control. In this research paper, we present early findings from an agentic pipeline implementation of a perceptive task guidance system. Through quantitative and qualitative analysis, we analyze how Chain-of-Thought (CoT) reasoning, a common vehicle for explainability in LLMs, operates within agentic pipelines. We demonstrate that CoT reasoning alone does not lead to better outputs, nor does it offer explainability, as it tends to produce explanations without explainability, in that they do not improve the ability of end users to better understand systems or achieve their goals.
Related papers
- Please Translate Again: Two Simple Experiments on Whether Human-Like Reasoning Helps Translation [7.376832526909754]
Large Language Models (LLMs) demonstrate strong reasoning capabilities for many tasks, often by explicitly decomposing the task via Chain-of-Thought (CoT) reasoning.<n>textitTranslating Step-by-stepcitepbriakou2024translating, for instance, introduces a multi-step prompt with decomposition and refinement of translation with LLMs.<n>We show that simply prompting LLMs to translate again'' yields even better results than human-like step-by-step prompting.
arXiv Detail & Related papers (2025-06-05T00:04:39Z) - Reasoning LLMs are Wandering Solution Explorers [5.3795217858078805]
This paper formalizes what constitutes systematic problem solving and identifies common failure modes that reveal reasoning LLMs to be wanderers rather than systematic explorers.<n>Our findings suggest that current models' performance can appear to be competent on simple tasks yet degrade sharply as complexity increases.
arXiv Detail & Related papers (2025-05-26T17:59:53Z) - ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving [4.987786842464663]
Tree-of-thoughts (ToT) offers a conceptually more advanced approach by modeling reasoning as an exploration within a tree structure.<n>ToTRL is designed to guide LLMs in developing the parallel ToT strategy based on the sequential CoT strategy.<n>Our ToTQwen3-8B model, trained with ToTRL, achieves significant improvement in performance and reasoning efficiency on complex reasoning tasks.
arXiv Detail & Related papers (2025-05-19T05:18:58Z) - Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.<n>Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.<n>We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z) - RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement [85.08223786819532]
Existing large language models (LLMs) show exceptional problem-solving capabilities but might struggle with complex reasoning tasks.<n>We propose textbfRAG-Star, a novel RAG approach that integrates retrieved information to guide the tree-based deliberative reasoning process.<n>Our experiments involving Llama-3.1-8B-Instruct and GPT-4o demonstrate that RAG-Star significantly outperforms previous RAG and reasoning methods.
arXiv Detail & Related papers (2024-12-17T13:05:36Z) - Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models [64.1799100754406]
Large Language Models (LLMs) demonstrate enhanced capabilities and reliability by reasoning more.<n>Despite various efforts to improve LLM reasoning, high-quality long-chain reasoning data and optimized training pipelines still remain inadequately explored in vision-language tasks.<n>We present Insight-V, an early effort to 1) scalably produce long and robust reasoning data for complex multi-modal tasks, and 2) an effective training pipeline to enhance the reasoning capabilities of MLLMs.
arXiv Detail & Related papers (2024-11-21T18:59:55Z) - Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning [52.83539473110143]
We introduce a novel structure-oriented analysis method to help Large Language Models (LLMs) better understand a question.
To further improve the reliability in complex question-answering tasks, we propose a multi-agent reasoning system, Structure-oriented Autonomous Reasoning Agents (SARA)
Extensive experiments verify the effectiveness of the proposed reasoning system. Surprisingly, in some cases, the system even surpasses few-shot methods.
arXiv Detail & Related papers (2024-10-18T05:30:33Z) - ChainBuddy: An AI Agent System for Generating LLM Pipelines [2.7624021966289605]
ChainBuddy is an AI workflow generation assistant built into the ChainForge platform.<n>From a single prompt or chat, ChainBuddy generates a starter evaluative pipeline in ChainForge aligned to the user's requirements.<n>We find that when using AI assistance, participants reported a less demanding workload, felt more confident, and produced higher quality pipelines evaluating LLM behavior.
arXiv Detail & Related papers (2024-09-20T15:42:33Z) - Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models [26.11408084129897]
Large Language Models (LLMs) are deployed as powerful tools for several natural language processing (NLP) applications.
Recent works show that modern LLMs can generate self-explanations (SEs), which elicit their intermediate reasoning steps for explaining their behavior.
We discuss the dichotomy between faithfulness and plausibility in SEs generated by LLMs.
arXiv Detail & Related papers (2024-02-07T06:32:50Z) - Learning to Generate Explainable Stock Predictions using Self-Reflective
Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z) - Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning technique to assess the performance of model.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.