Related papers: Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm

URL: http://arxiv.org/abs/2509.23946v2
Date: Tue, 30 Sep 2025 02:45:38 GMT
Title: Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm
Authors: Kaisen Yang, Lixuan He, Rushi Shah, Kaicheng Yang, Qinwei Ma, Dianbo Liu, Alex Lamb
Abstract summary: Chain-of-Thought (CoT) and its variants have markedly advanced the reasoning abilities of Large Language Models (LLMs). We propose the Explore-Execute Chain ($E^2C$), a structured reasoning framework that decouples reasoning into two distinct phases.
Score: 8.405729585427226
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Chain-of-Thought (CoT) and its variants have markedly advanced the reasoning abilities of Large Language Models (LLMs), yet their monolithic and auto-regressive architecture inherently conflates high-level strategic planning with low-level step-by-step execution, leading to computational inefficiency, limited exploration of reasoning paths, and reduced interpretability. To overcome these issues, we propose the Explore-Execute Chain ($E^2C$), a structured reasoning framework that decouples reasoning into two distinct phases: an exploratory phase that stochastically generates succinct high-level plans, followed by an execution phase that deterministically carries out the chosen plan. Our approach incorporates a two-stage training methodology, which combines Supervised Fine-Tuning (SFT) - augmented by a novel data generation algorithm enforcing strict plan adherence - with a subsequent Reinforcement Learning (RL) stage that capitalizes on the informativeness of exploration and reinforces the determinism of execution. This decomposition enables an efficient test-time scaling strategy: on AIME'2024, $E^2C$ Test Time Scaling reaches 58.1% accuracy using <10% of the decoding tokens required by comparable methods (e.g., Forest-of-Thought), sharply cutting self-consistency overhead. For cross-domain adaptation, our Exploration-Focused SFT (EF-SFT) fine-tunes with only 3.5% of the tokens used by standard SFT yet yields up to 14.5% higher accuracy than standard SFT on medical benchmarks, delivering state-of-the-art performance, strong generalization, and greater interpretability by separating planning from execution. The code and pre-trained models for the project are available at: https://github.com/yks23/Explore-Execute-Chain.git
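
The two-phase decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `explore_execute`, `propose_plan`, and `execute_plan` are hypothetical names, the two callables stand in for model calls, and the frequency-weighted vote used to pick the final answer is an assumption rather than something the abstract specifies. The sketch only shows why sampling short plans and executing each distinct plan once can cost less than sampling many full reasoning chains.

```python
import random
from collections import Counter
from typing import Callable


def explore_execute(
    question: str,
    propose_plan: Callable[[str, random.Random], str],  # hypothetical: stochastic planner
    execute_plan: Callable[[str, str], str],            # hypothetical: deterministic executor
    num_plans: int = 4,
    seed: int = 0,
) -> str:
    """Sample several short plans, execute each distinct plan once, and vote."""
    rng = random.Random(seed)

    # Exploration phase: cheap, stochastic sampling of succinct high-level plans.
    plans = [propose_plan(question, rng) for _ in range(num_plans)]
    plan_counts = Counter(plans)

    # Execution phase: each distinct plan is carried out once, deterministically.
    # Assumption: an answer is weighted by how often its plan was sampled.
    votes: Counter = Counter()
    for plan, count in plan_counts.items():
        votes[execute_plan(question, plan)] += count

    return votes.most_common(1)[0][0]


# Toy stand-ins for the model calls, for illustration only.
def toy_propose(question: str, rng: random.Random) -> str:
    return rng.choice(["factor the expression", "substitute and simplify"])


def toy_execute(question: str, plan: str) -> str:
    return "5"  # both toy plans reach the same answer
```

With the toy stand-ins, `explore_execute("toy question", toy_propose, toy_execute)` returns `"5"` regardless of the seed, because every plan leads to the same answer. In a real setting the executor would be a greedy (temperature-zero) decode conditioned on the chosen plan.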

Related papers

Arbitrage: Efficient Reasoning via Advantage-Aware Speculation [71.45710345765528]
Speculative Decoding accelerates inference by employing a fast but inaccurate draft model to autoregressively propose tokens. But due to unnecessary rejections caused by token mismatches in semantically equivalent steps, traditional token-level Speculative Decoding struggles in reasoning tasks. We propose Arbitrage, a novel step-level speculative generation framework that routes generation dynamically based on the relative advantage between draft and target models.
arXiv Detail & Related papers (2025-12-04T17:50:53Z)
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective [85.06838178922791]
Reinforcement Learning (RL) has proven highly effective for autoregressive language models. But adapting these methods to diffusion large language models (dLLMs) presents fundamental challenges. We propose a principled RL framework that treats entire sequence generation as a single action and uses the ELBO as a tractable sequence-level likelihood proxy.
arXiv Detail & Related papers (2025-12-03T13:05:32Z)
Efficient Thought Space Exploration through Strategic Intervention [54.35208611253168]
We propose a novel Hint-Practice Reasoning (HPR) framework that operationalizes this insight through two synergistic components. The framework's core innovation lies in Distributional Inconsistency Reduction (DIR), which dynamically identifies intervention points. Experiments across arithmetic and commonsense reasoning benchmarks demonstrate HPR's state-of-the-art efficiency-accuracy tradeoffs.
arXiv Detail & Related papers (2025-11-13T07:26:01Z)
Enhancing Long Chain-of-Thought Reasoning through Multi-Path Plan Aggregation [32.86351316550696]
We analyze raw long CoTs and uncover a reasoning hierarchy consisting of planning and execution steps. Motivated by this observation, we propose Multi-Path Plan Aggregation (MPPA), a framework that augments single-pass reasoning with plan exploration and aggregation. To overcome this, we introduce online Step-DPO, a process-level preference optimization scheme that leverages Twisted Sequential Monte Carlo (TSMC) to provide scalable stepwise supervision.
arXiv Detail & Related papers (2025-10-13T17:02:41Z)
Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning [22.177866778776814]
We propose a two-stage framework designed to improve both high-level planning and fine-grained Chain-of-Thought (CoT) reasoning. In the first stage, we leverage advanced LLMs to distill CoT into compact high-level guidance, which is then used for supervised fine-tuning. In the second stage, we introduce a guidance-aware RL method that jointly optimizes the final output and the quality of high-level guidance.
arXiv Detail & Related papers (2025-10-02T09:28:13Z)
Adaptive Test-Time Reasoning via Reward-Guided Dual-Phase Search [62.1546099504045]
We propose a dual-phase test-time scaling framework that separates reasoning into planning and execution. Specifically, we decompose reasoning trajectories and develop reward models for each phase, enabling the search to explore and prune plans and executions separately. Experiments on both mathematical reasoning and code generation benchmarks demonstrate that our approach consistently improves accuracy while reducing redundant computation.
arXiv Detail & Related papers (2025-09-29T19:27:23Z)
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving [64.15371139980802]
Large Language Models (LLMs) have recently advanced the field of Automated Theorem Proving (ATP). We show that different test-time scaling strategies for ATP models introduce significant computational overhead for inference. We propose two complementary methods that can be integrated into a unified EconRL pipeline for amplified benefits.
arXiv Detail & Related papers (2025-09-16T03:00:13Z)
READER: Retrieval-Assisted Drafter for Efficient LLM Inference [0.0386965802948046]
Autoregressive Language Models instantiate a factorized likelihood over token sequences, yet their strictly sequential decoding process imposes an intrinsic lower bound on inference latency. This bottleneck has emerged as a central obstacle to the scalable deployment of large-scale generative models. We present READER, a speculative decoding framework that bypasses the training of the auxiliary draft model.
arXiv Detail & Related papers (2025-08-12T16:47:48Z)
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal [13.035073453917088]
Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in code reasoning by scaling up the length of Chain-of-Thought (CoT). We propose ASAP (Anchor-guided, Surprisal-based Pruning), a novel coarse-to-fine framework for CoT compression. ASAP achieves state-of-the-art accuracy across multiple code generation benchmarks while substantially reducing training and inference costs.
arXiv Detail & Related papers (2025-08-08T03:46:21Z)
KAT-V1: Kwai-AutoThink Technical Report [50.84483585850113]
We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks. KAT dynamically switches between reasoning and non-reasoning modes based on task complexity. We also propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework.
arXiv Detail & Related papers (2025-07-11T04:07:10Z)
S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models [38.784951111677856]
Large language models (LLMs) exhibit remarkable reasoning capabilities across diverse downstream tasks. Their autoregressive nature leads to substantial inference latency, posing challenges for real-time applications. We propose a Speculative Sampling with Syntactic and Semantic Coherence framework, which extends speculative sampling by leveraging multi-head drafting.
arXiv Detail & Related papers (2025-06-17T03:38:19Z)
A*-Decoding: Token-Efficient Inference Scaling [0.0]
Inference-time scaling has emerged as a powerful alternative to parameter scaling for improving language model performance. We introduce A*-decoding, a search-based inference-time strategy that builds on the A* search algorithm to optimally utilize a fixed compute budget. Our work demonstrates how thoughtful inference-time strategies can enhance reasoning in SLMs, pointing toward future advances in more efficient and scalable language model deployment.
arXiv Detail & Related papers (2025-05-19T19:19:48Z)
Fractured Chain-of-Thought Reasoning [61.647243580650446]
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling. We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z)
Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task. LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced. Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on the stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.