From Token to Action: State Machine Reasoning to Mitigate Overthinking in Information Retrieval
- URL: http://arxiv.org/abs/2505.23059v1
- Date: Thu, 29 May 2025 04:04:25 GMT
- Title: From Token to Action: State Machine Reasoning to Mitigate Overthinking in Information Retrieval
- Authors: Dohyeon Lee, Yeonseok Jeong, Seung-won Hwang
- Abstract summary: Chain-of-Thought (CoT) prompting enables complex reasoning in large language models (LLMs). We propose State Machine Reasoning (SMR), a transition-based reasoning framework composed of discrete actions. Experiments on the BEIR and BRIGHT benchmarks show that SMR improves retrieval performance (nDCG@10) by 3.4% while reducing token usage by 74.4%.
- Score: 22.35942074715463
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chain-of-Thought (CoT) prompting enables complex reasoning in large language models (LLMs), including applications in information retrieval (IR). However, it often leads to overthinking, where models produce excessively long and semantically redundant traces with little or no benefit. We identify two key challenges in IR: redundant trajectories that revisit similar states and misguided reasoning that diverges from user intent. To address these, we propose State Machine Reasoning (SMR), a transition-based reasoning framework composed of discrete actions (Refine, Rerank, Stop) that support early stopping and fine-grained control. Experiments on the BEIR and BRIGHT benchmarks show that SMR improves retrieval performance (nDCG@10) by 3.4% while reducing token usage by 74.4%. It generalizes across LLMs and retrievers without requiring task-specific tuning, offering a practical alternative to conventional CoT reasoning. The code and details are available at https://github.com/ldilab/SMR.
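To make the transition-based formulation concrete, the sketch below shows one way a loop over (query, ranked documents) states with discrete Refine/Rerank/Stop actions could be wired up. It is an illustration only, not the authors' implementation (the actual code is in the linked repository); `choose_action`, `refine_query`, and `rerank` are hypothetical stand-ins for an LLM action policy, a query rewriter, and a reranker.
```python
# Minimal sketch (assumptions noted above), in the spirit of SMR:
# reason by choosing discrete state transitions instead of free-form tokens.
from enum import Enum, auto

class Action(Enum):
    REFINE = auto()   # rewrite the query to better match user intent
    RERANK = auto()   # reorder the current candidate documents
    STOP = auto()     # terminate early and return the current ranking

def smr_loop(query, retriever, choose_action, refine_query, rerank, max_steps=8):
    """Iterate over (query, ranked documents) states until STOP or the step budget is hit."""
    docs = retriever(query)
    for _ in range(max_steps):
        action = choose_action(query, docs)       # pick one discrete transition
        if action is Action.REFINE:
            query = refine_query(query, docs)     # state transition: new query
            docs = retriever(query)
        elif action is Action.RERANK:
            docs = rerank(query, docs)            # state transition: new ranking
        else:                                     # Action.STOP: early stopping
            break
    return docs
```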
Related papers
- ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization [16.51303604678232]
Reasoning Compression ThroUgh Stepwise Trials (ReCUT) is a novel method aimed at balancing the accuracy and length of reasoning trajectories. Experimental results across multiple math reasoning datasets and backbone models demonstrate that ReCUT significantly reduces reasoning lengths by approximately 30-50%.
arXiv Detail & Related papers (2025-06-12T15:43:01Z)
- AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking [38.8730008545358]
Large language models (LLMs) often lack robustness in their reasoning. Our approach focuses on "abstracting" reasoning problems. We find that this abstraction process is better acquired through reinforcement learning (RL) than just supervised fine-tuning.
arXiv Detail & Related papers (2025-06-09T13:34:50Z)
- OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation [33.008513399946914]
OThink-R1 is a method that prunes redundant reasoning steps while preserving logical validity. Experiments across mathematical and question-answering tasks demonstrate that OThink-R1 reduces reasoning redundancy by almost 23% on average.
arXiv Detail & Related papers (2025-06-03T03:31:30Z)
- Reinforcing Video Reasoning with Focused Thinking [65.85683941058916]
We propose TW-GRPO, a novel framework that enhances visual reasoning with focused thinking and dense reward granularity. Specifically, we employ a token weighting mechanism that prioritizes tokens with high informational density. We also reformulate RL training by shifting from single-choice to multi-choice QA tasks.
arXiv Detail & Related papers (2025-05-30T15:42:19Z)
- Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition [11.858707687894757]
Large Reasoning Models (LRMs) are criticized for the excessively lengthy Chain-of-Thought (CoT) traces they produce to derive the final answer. This paper introduces Multi-Turn Decomposition (MinD) to decode conventional CoT into a sequence of explicit, structured, and turn-wise interactions. MinD can achieve up to 70% reduction in both output token usage and time to first token (TTFT).
arXiv Detail & Related papers (2025-05-26T10:18:57Z)
- Reinforced Latent Reasoning for LLM-based Recommendation [83.18146814163308]
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities in complex problem-solving tasks. Existing methods typically rely on fine-tuning with explicit chain-of-thought (CoT) data. In this work, we explore an alternative approach that shifts from explicit CoT reasoning to compact, information-dense latent reasoning.
arXiv Detail & Related papers (2025-05-25T11:03:45Z)
- TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling [20.980976778470247]
Large Reasoning Models (LRMs) demonstrate exceptional capability in tackling complex mathematical, logical, and coding tasks. We propose TrimR, a verifier-based, training-free, efficient framework for dynamic Chain-of-Thought (CoT) compression.
arXiv Detail & Related papers (2025-05-22T12:23:30Z)
- ThinkRec: Thinking-based recommendation via LLM [19.398302729633397]
ThinkRec is a thinking-based framework that shifts LLM4Rec from System 1 to System 2 (the rational system). ThinkRec introduces a thinking activation mechanism that augments item metadata with keyword summarization and injects synthetic reasoning traces. By dynamically assigning weights to expert models based on users' latent features, ThinkRec adapts its reasoning path to individual users, thereby enhancing precision and personalization.
arXiv Detail & Related papers (2025-05-21T04:25:18Z)
- Let LLMs Break Free from Overthinking via Self-Braking Tuning [60.08396797526657]
Large reasoning models (LRMs) have significantly enhanced their reasoning capabilities by generating longer chains of thought. This performance gain comes at the cost of a substantial increase in redundant reasoning during the generation process. We propose a novel framework, Self-Braking Tuning (SBT), which tackles overthinking from the perspective of allowing the model to regulate its own reasoning process.
arXiv Detail & Related papers (2025-05-20T16:53:40Z)
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [60.04718679054704]
Chain-of-Thought prompting elicits step-by-step problem solving, but often at the cost of excessive verbosity in intermediate outputs. We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints. SoT achieves token reductions of up to 78% with minimal accuracy loss across 15 reasoning datasets.
arXiv Detail & Related papers (2025-03-07T06:57:17Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present MR-Ben, a process-based benchmark that demands meta-reasoning skill. Our meta-reasoning paradigm is especially suited for System-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning [74.90592233107712]
We propose a Direct-Indirect Reasoning (DIR) method, which considers Direct Reasoning (DR) and Indirect Reasoning (IR) as multiple parallel reasoning paths that are merged to derive the final answer. Our DIR method is simple yet effective and can be straightforwardly integrated with existing variants of CoT methods.
arXiv Detail & Related papers (2024-02-06T03:41:12Z)
- Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple, yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs).
Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process.
We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
arXiv Detail & Related papers (2023-09-12T14:36:23Z)
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models [32.95155349925248]
We propose a modular paradigm ReWOO that detaches the reasoning process from external observations, thus significantly reducing token consumption.
We show that ReWOO achieves 5x token efficiency and 4% accuracy improvement on HotpotQA, a multi-step reasoning benchmark.
Our illustrative work offloads reasoning ability from 175B GPT3.5 into 7B LLaMA, demonstrating the significant potential for truly efficient and scalable ALM systems.
arXiv Detail & Related papers (2023-05-23T00:16:48Z)
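As a rough illustration of the decoupling idea described in the ReWOO entry above, the sketch below drafts a complete plan in a single reasoning pass, runs the tool calls only afterwards, and composes the answer in one final call. It is not the authors' code; `llm` and `tools` are hypothetical callables, and the `#Ek = Tool[input]` plan format is assumed purely for the example.
```python
# Illustrative plan-then-execute sketch: reasoning is produced once, without
# interleaved observations, so intermediate tokens are not regenerated per step.
import re

def plan_then_execute(question, llm, tools):
    # 1) Reason up front: produce steps like "#E1 = Search[query]" that may
    #    reference earlier steps by placeholder.
    plan = llm(f"Plan tool calls to answer: {question}\n"
               "Write one line per step as '#Ek = Tool[input]'.")
    evidence = {}
    for step in plan.splitlines():
        m = re.match(r"(#E\d+)\s*=\s*(\w+)\[(.*)\]", step.strip())
        if not m:
            continue
        tag, tool, arg = m.groups()
        for k, v in evidence.items():      # substitute earlier observations
            arg = arg.replace(k, v)
        evidence[tag] = tools[tool](arg)   # 2) execute tools after planning
    # 3) A single solver call sees the plan plus all collected observations.
    return llm(f"Question: {question}\nPlan:\n{plan}\nEvidence: {evidence}\nAnswer:")
```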