Tree of Uncertain Thoughts Reasoning for Large Language Models
- URL: http://arxiv.org/abs/2309.07694v1
- Date: Thu, 14 Sep 2023 13:14:51 GMT
- Title: Tree of Uncertain Thoughts Reasoning for Large Language Models
- Authors: Shentong Mo, Miao Xin
- Abstract summary: We introduce the Tree of Uncertain Thoughts (TouT) - a reasoning framework tailored for Large Language Models (LLMs)
Our TouT effectively leverages Monte Carlo Dropout to quantify uncertainty scores associated with LLMs' diverse local responses at these intermediate steps.
We substantiate our approach with rigorous experiments on two demanding planning tasks: Game of 24 and Mini Crosswords.
- Score: 19.926757833392212
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the recently introduced Tree of Thoughts (ToT) has heralded
advancements in allowing Large Language Models (LLMs) to reason through
foresight and backtracking for global decision-making, it has overlooked the
inherent local uncertainties in intermediate decision points or "thoughts".
These local uncertainties, intrinsic to LLMs given their potential for diverse
responses, remain a significant concern in the reasoning process. Addressing
this pivotal gap, we introduce the Tree of Uncertain Thoughts (TouT) - a
reasoning framework tailored for LLMs. Our TouT effectively leverages Monte
Carlo Dropout to quantify uncertainty scores associated with LLMs' diverse
local responses at these intermediate steps. By marrying this local uncertainty
quantification with global search algorithms, TouT enhances the model's
precision in response generation. We substantiate our approach with rigorous
experiments on two demanding planning tasks: Game of 24 and Mini Crosswords.
The empirical evidence underscores TouT's superiority over both ToT and
chain-of-thought prompting methods.
Related papers
- Guiding Reasoning in Small Language Models with LLM Assistance [23.3038074903744]
Small Language Models cast doubt suitability for tasks demanding deep, multi-step logical deduction.
This paper introduces a framework called Small Reasons, Large Hints, which selectively augments SLM reasoning with targeted guidance from large language models.
Our experiments on mathematical reasoning datasets demonstrate that targeted external scaffolding significantly improves performance.
arXiv Detail & Related papers (2025-04-14T06:32:45Z) - Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? [27.374491920521745]
We find that the response length of reasoning LLMs drastically increases for ill-posed questions with missing premises (MiP)
This newly introduced scenario exacerbates the general overthinking issue to a large extent, which we name as the MiP-Overthinking.
Surprisingly, LLMs not specifically trained for reasoning exhibit much better performance on the MiP scenario, producing much shorter responses that quickly identify ill-posed queries.
arXiv Detail & Related papers (2025-04-09T01:25:27Z) - Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying [0.3659498819753633]
State-of-the-art Large Language models (LLMs) continue to struggle when performing logical and mathematical reasoning.
This paper makes use of the notion of critical questions from the literature on argumentation theory, focusing in particular on Toulmin's model of argumentation.
We show that employing these critical questions can improve the reasoning capabilities of LLMs.
arXiv Detail & Related papers (2024-12-19T18:51:30Z) - Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established.
This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt.
We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z) - Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers [13.644277507363036]
We investigate whether these abilities are measurable outside of tailored prompting and MCQ.
Our findings suggest that the Revealed Belief of LLMs significantly differs from their Stated Answer.
As text completion is at the core of LLMs, these results suggest that common evaluation methods may only provide a partial picture.
arXiv Detail & Related papers (2024-06-21T08:56:35Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models [84.15513004135576]
Current research enhances the reasoning performance of Large Language Models (LLMs) by sampling multiple reasoning chains and ensembling based on the answer frequency.
This approach fails in scenarios where the correct answers are in the minority.
We introduce a hierarchical reasoning aggregation framework AoR, which selects answers based on the evaluation of reasoning chains.
arXiv Detail & Related papers (2024-05-21T17:12:19Z) - Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey [25.732397636695882]
Large language models (LLMs) have recently shown impressive performance on tasks involving reasoning.
Despite these successes, the depth of LLMs' reasoning abilities remains uncertain.
arXiv Detail & Related papers (2024-04-02T11:46:31Z) - Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs [52.42505579545893]
Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought explanations alongside answers.
We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT.
arXiv Detail & Related papers (2024-02-17T05:22:56Z) - Uncertainty Quantification for In-Context Learning of Large Language Models [52.891205009620364]
In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs)
We propose a novel formulation and corresponding estimation method to quantify both types of uncertainties.
The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion.
arXiv Detail & Related papers (2024-02-15T18:46:24Z) - Quantifying Uncertainty in Natural Language Explanations of Large
Language Models [29.34960984639281]
Large Language Models (LLMs) are increasingly used as powerful tools for high-stakes natural language processing (NLP) applications.
We propose two novel metrics -- $textitVerbalized Uncertainty$ and $textitProbing Uncertainty$ -- to quantify the uncertainty of generated explanations.
Our empirical analysis of benchmark datasets reveals that verbalized uncertainty is not a reliable estimate of explanation confidence.
arXiv Detail & Related papers (2023-11-06T21:14:40Z) - Reason for Future, Act for Now: A Principled Framework for Autonomous
LLM Agents with Provable Sample Efficiency [53.8779374188643]
We propose a principled framework with provable regret guarantees to orchestrate reasoning and acting.
Specifically, we design a prompt template for reasoning that learns from the memory buffer and plans a future trajectory over a long horizon.
At each step, the LLM agent takes the initial action of the planned trajectory ("act for now"), stores the collected feedback in the memory buffer, and reinvokes the reasoning routine to replan the future trajectory from the new state.
arXiv Detail & Related papers (2023-09-29T16:36:39Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.