Rethinking Thinking Tokens: Understanding Why They Underperform in Practice
- URL: http://arxiv.org/abs/2411.11371v1
- Date: Mon, 18 Nov 2024 08:34:38 GMT
- Title: Rethinking Thinking Tokens: Understanding Why They Underperform in Practice
- Authors: Sreeram Vennam, David Valente, David Herel, Ponnurangam Kumaraguru
- Abstract summary: Thinking Tokens (TT) have been proposed as an unsupervised method to facilitate reasoning in language models.
We show that TTs only marginally improve performance and consistently underperform Chain-of-Thought (CoT) reasoning.
- Score: 6.102559098873098
- Abstract: Thinking Tokens (TT) have been proposed as an unsupervised method to facilitate reasoning in language models. However, despite their conceptual appeal, our findings show that TTs only marginally improve performance and consistently underperform Chain-of-Thought (CoT) reasoning across multiple benchmarks. We hypothesize that this underperformance stems from the reliance on a single embedding for TTs, which results in inconsistent learning signals and introduces noisy gradients. This paper provides a comprehensive empirical analysis to validate this hypothesis and discusses the implications for future research on unsupervised reasoning in LLMs.
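To make the single-embedding hypothesis concrete, here is a minimal sketch (not the authors' code; the wrapper class, parameter names, and token count are illustrative assumptions) of how Thinking Tokens are typically realized as one shared learned embedding repeated in the input, in contrast to CoT, which emits distinct natural-language reasoning tokens:

```python
import torch
import torch.nn as nn

class ThinkingTokenWrapper(nn.Module):
    """Appends N copies of one learned 'thinking' embedding to the input."""

    def __init__(self, embed_dim: int, num_thinking_tokens: int = 4):
        super().__init__()
        # A single embedding vector shared by every thinking-token position;
        # the paper hypothesizes this shared vector yields inconsistent
        # learning signals and noisy gradients during training.
        self.thinking_embedding = nn.Parameter(torch.randn(embed_dim))
        self.num_thinking_tokens = num_thinking_tokens

    def insert_thinking_tokens(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, embed_dim)
        batch, _, dim = token_embeddings.shape
        thinking = self.thinking_embedding.expand(batch, self.num_thinking_tokens, dim)
        # The same vector is repeated at every inserted position, unlike CoT,
        # where each reasoning step contributes its own token embeddings.
        return torch.cat([token_embeddings, thinking], dim=1)

# Hypothetical usage with a Hugging Face-style causal LM:
#   embeds = model.get_input_embeddings()(input_ids)
#   embeds = ThinkingTokenWrapper(embeds.size(-1)).insert_thinking_tokens(embeds)
#   logits = model(inputs_embeds=embeds).logits
```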
Related papers
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [48.28847964704554]
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks.
We propose a novel approach for continuous-space reasoning that does not require modifying the underlying LLM.
arXiv Detail & Related papers (2025-02-17T18:52:29Z)
- Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models [76.6028674686018]
We introduce thought-tracing, an inference-time reasoning algorithm to trace the mental states of agents.
Our algorithm is modeled after the Bayesian theory-of-mind framework.
We evaluate thought-tracing on diverse theory-of-mind benchmarks, demonstrating significant performance improvements.
arXiv Detail & Related papers (2025-02-17T15:08:50Z)
- Zero-Shot Verification-guided Chain of Thoughts [64.862738244735]
We focus on self-verification of self-generated reasoning steps via CoT prompts in a completely zero-shot regime.
To explore this setting, we design a new zero-shot prompt, which we call COT STEP, to aid zero-shot decomposition of reasoning steps.
We evaluate the verifiers' ability to classify the correctness of reasoning chains and explore different ways to use verifier scores in guiding reasoning.
arXiv Detail & Related papers (2025-01-21T03:52:54Z)
- Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying [0.3659498819753633]
State-of-the-art Large Language models (LLMs) continue to struggle when performing logical and mathematical reasoning.
This paper makes use of the notion of critical questions from the literature on argumentation theory, focusing in particular on Toulmin's model of argumentation.
We show that employing these critical questions can improve the reasoning capabilities of LLMs.
arXiv Detail & Related papers (2024-12-19T18:51:30Z)
- Rethinking Chain-of-Thought from the Perspective of Self-Training [10.722453877596998]
Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent capabilities in LLMs.
We propose a novel CoT framework to improve reasoning performance.
Our framework integrates two key components: (i) a task-specific prompt module that optimizes the initial reasoning process, and (ii) an adaptive reasoning module that dynamically refines the reasoning process.
arXiv Detail & Related papers (2024-12-14T13:12:50Z)
- What Makes In-context Learning Effective for Mathematical Reasoning: A Theoretical Analysis [81.15503859645149]
In this paper, we aim to theoretically analyze the impact of in-context demonstrations on large language models' reasoning performance.
We propose a straightforward, generalizable, and low-complexity demonstration selection method named LMS3.
arXiv Detail & Related papers (2024-12-11T11:38:11Z)
- A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences [5.141416267381492]
We consider the case of syllogistic reasoning, an area of deductive reasoning studied extensively in logic and cognitive psychology.
We investigate the effects of chain-of-thought reasoning, in-context learning, and supervised fine-tuning on syllogistic reasoning.
Our results suggest that the behavior of pre-trained LLMs can be explained by cognitive science.
arXiv Detail & Related papers (2024-06-17T08:59:04Z)
- On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models [25.029579061612456]
Large Language Models (LLMs) are increasingly being employed in real-world applications in critical domains such as healthcare.
It is important to ensure that the Chain-of-Thought (CoT) reasoning generated by these models faithfully captures their underlying behavior.
arXiv Detail & Related papers (2024-06-15T13:16:44Z)
- Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters [82.84696222087396]
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs).
We show that CoT reasoning is possible even with invalid demonstrations.
arXiv Detail & Related papers (2022-12-20T05:20:54Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.