Related papers: Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance

URL: http://arxiv.org/abs/2602.22583v1
Date: Thu, 26 Feb 2026 03:34:23 GMT
Title: Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance
Authors: Weida Liang, Yiyou Sun, Shuyuan Nan, Chuang Li, Dawn Song, Kenji Kawaguchi
Abstract summary: We show a previously underexplored gap between strategy usage and strategy executability. We propose Selective Strategy Retrieval (SSR), a test-time framework that explicitly models executability. SSR yields reliable and consistent improvements over direct solving, in-context learning, and single-source guidance.
Score: 86.46794021499511
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Example-based guidance is widely used to improve mathematical reasoning at inference time, yet its effectiveness is highly unstable across problems and models, even when the guidance is correct and problem-relevant. We show that this instability arises from a previously underexplored gap between strategy usage (whether a reasoning strategy appears in successful solutions) and strategy executability (whether the strategy remains effective when instantiated as guidance for a target model). Through a controlled analysis of paired human-written and model-generated solutions, we identify a systematic dissociation between usage and executability: human- and model-derived strategies differ in structured, domain-dependent ways, leading to complementary strengths and consistent source-dependent reversals under guidance. Building on this diagnosis, we propose Selective Strategy Retrieval (SSR), a test-time framework that explicitly models executability by selectively retrieving and combining strategies using empirical, multi-route, source-aware signals. Across multiple mathematical reasoning benchmarks, SSR yields reliable and consistent improvements over direct solving, in-context learning, and single-source guidance, improving accuracy by up to $+13$ points on AIME25 and $+5$ points on Apex for compact reasoning models. Code and benchmark are publicly available at: https://github.com/lwd17/strategy-execute-pipeline.
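Illustration: the distinction between usage and executability in the abstract can be made concrete with a toy calculation. The sketch below is not the SSR implementation from the paper or its repository; the `Trial` record, the `rates` and `select` helpers, and the sample data are all hypothetical and chosen only to show how a strategy that appears often in successful solutions can still rank below a less common one once it is scored by how the target model performs under guidance.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One hypothetical observation for a (strategy, source) pair."""
    strategy: str
    source: str              # "human" or "model" (assumed labels)
    used_in_solution: bool   # strategy appears in a successful reference solution
    guided_success: bool     # target model solved the problem when given it as guidance


def rates(trials):
    """Return usage and executability rates per (strategy, source)."""
    counts = defaultdict(lambda: [0, 0, 0])  # [trials, usage hits, guided successes]
    for t in trials:
        c = counts[(t.strategy, t.source)]
        c[0] += 1
        c[1] += int(t.used_in_solution)
        c[2] += int(t.guided_success)
    return {
        key: {"usage": used / n, "executability": solved / n}
        for key, (n, used, solved) in counts.items()
    }


def select(trials, k=1):
    """Pick the k pairs with the highest empirical executability."""
    r = rates(trials)
    return sorted(r, key=lambda key: r[key]["executability"], reverse=True)[:k]


# Made-up data: the human-derived strategy is always "used" but rarely helps
# the target model; the model-derived one is used less often but executes better.
TRIALS = [
    Trial("invariant", "human", True, True),
    Trial("invariant", "human", True, False),
    Trial("invariant", "human", True, False),
    Trial("invariant", "human", True, False),
    Trial("casework", "model", True, True),
    Trial("casework", "model", False, True),
    Trial("casework", "model", True, True),
    Trial("casework", "model", False, False),
]

if __name__ == "__main__":
    for key, value in rates(TRIALS).items():
        print(key, value)
    print("selected:", select(TRIALS, k=1))
```

Ranking by usage alone would favor the human-derived strategy here, while ranking by guided success favors the model-derived one. This is a toy version of the source-dependent reversal the abstract describes; the paper's actual signals are described as empirical, multi-route, and source-aware, which this sketch does not attempt to reproduce.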

Related papers

Experience-Guided Adaptation of Inference-Time Reasoning Strategies [49.954515048847874]
Experience-Guided Reasoner (EGuR) generates tailored strategies at inference time based on accumulated experience. EGuR achieves up to 14% accuracy improvements over the strongest baselines while reducing computational costs by up to 111x.
arXiv Detail & Related papers (2025-11-14T17:45:28Z)
SMaRT: Select, Mix, and ReinvenT - A Strategy Fusion Framework for LLM-Driven Reasoning and Planning [14.78684546475325]
Large Language Models (LLMs) have redefined complex task automation with exceptional generalization capabilities. No single strategy excels universally, highlighting the need for frameworks that fuse strategies to maximize performance and ensure robustness. We introduce the Select, Mix, and ReinvenT (SMaRT) framework, an innovative strategy fusion approach designed to overcome this constraint.
arXiv Detail & Related papers (2025-10-20T20:42:24Z)
Plan before Solving: Problem-Aware Strategy Routing for Mathematical Reasoning with LLMs [49.995906301946]
Existing methods usually rely on a fixed strategy to guide Large Language Models (LLMs) in mathematical reasoning. Our analysis reveals that a single strategy cannot adapt to problem-specific requirements and thus overlooks the trade-off between effectiveness and efficiency. We propose Planning and Routing through Instance-Specific Modeling (PRISM), a novel framework that decouples mathematical reasoning into two stages: strategy planning and targeted execution.
arXiv Detail & Related papers (2025-09-29T07:22:41Z)
Mitigating Strategy-Selection Bias in Reasoning for More Effective Test-Time Scaling [27.616118519120366]
Test-time scaling (TTS) has been shown to improve the performance of large language models (LLMs) by sampling and aggregating diverse reasoning paths. We present a theoretical analysis that reveals when strategy-selection bias undermines the effectiveness of test-time scaling. Motivated by this theoretical insight, we introduce TTS-Uniform, a framework designed to mitigate the selection bias of reasoning strategies.
arXiv Detail & Related papers (2025-09-22T15:30:56Z)
Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection [7.045509749924679]
Route-To-Reason (RTR) is a novel unified routing framework that dynamically allocates both LMs and reasoning strategies according to task difficulty under budget constraints. RTR learns compressed representations of both expert models and reasoning strategies, enabling their joint and adaptive selection at inference time.
arXiv Detail & Related papers (2025-05-26T02:53:17Z)
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think [81.38614558541772]
We introduce the CoT Encyclopedia, a framework for analyzing and steering model reasoning. Our method automatically extracts diverse reasoning criteria from model-generated CoTs. We show that this framework produces more interpretable and comprehensive analyses than existing methods.
arXiv Detail & Related papers (2025-05-15T11:31:02Z)
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation [16.350747493026432]
The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs). We propose the Strategic Chain-of-Thought (SCoT) to refine LLM performance by integrating strategic knowledge prior to generating intermediate reasoning steps. SCoT employs a two-stage approach within a single prompt: first eliciting an effective problem-solving strategy, which is then used to guide the generation of high-quality CoT paths and final answers.
arXiv Detail & Related papers (2024-09-05T06:28:05Z)
REX: Rapid Exploration and eXploitation for AI Agents [103.68453326880456]
We propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance.
arXiv Detail & Related papers (2023-07-18T04:26:33Z)
Exploiting Temporal Structures of Cyclostationary Signals for Data-Driven Single-Channel Source Separation [98.95383921866096]
We study the problem of single-channel source separation (SCSS). We focus on cyclostationary signals, which are particularly suitable in a variety of application domains. We propose a deep learning approach using a U-Net architecture, which is competitive with the minimum MSE estimator.
arXiv Detail & Related papers (2022-08-22T14:04:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.