TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks
- URL: http://arxiv.org/abs/2601.10245v1
- Date: Thu, 15 Jan 2026 10:06:06 GMT
- Title: TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks
- Authors: Vansh Kapoor, Aman Gupta, Hao Chen, Anurag Beniwal, Jing Huang, Aviral Kumar,
- Abstract summary: Current routing methods assign entire queries to one model, treating all reasoning steps as equal. We propose TRIM, which routes only critical steps to larger models while smaller models handle routine continuations. We develop several routing strategies within TRIM, ranging from a simple threshold-based policy to more expressive routing policies.
- Score: 26.198066761026297
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-step reasoning tasks like mathematical problem solving are vulnerable to cascading failures, where a single incorrect step leads to complete solution breakdown. Current LLM routing methods assign entire queries to one model, treating all reasoning steps as equal. We propose TRIM (Targeted routing in multi-step reasoning tasks), which routes only critical steps – those likely to derail the solution – to larger models while letting smaller models handle routine continuations. Our key insight is that targeted step-level interventions can fundamentally transform inference efficiency by confining expensive calls to precisely those steps where stronger models prevent cascading errors. TRIM operates at the step level: it uses process reward models to identify erroneous steps and makes routing decisions based on step-level uncertainty and budget constraints. We develop several routing strategies within TRIM, ranging from a simple threshold-based policy to more expressive policies that reason about long-horizon accuracy-cost trade-offs and uncertainty in step-level correctness estimates. On MATH-500, even the simplest thresholding strategy surpasses prior routing methods with 5x higher cost efficiency, while more advanced policies match the strong, expensive model's performance using 80% fewer expensive-model tokens. On harder benchmarks such as AIME, TRIM achieves up to 6x higher cost efficiency. All methods generalize effectively across math reasoning tasks, demonstrating that step-level difficulty represents fundamental characteristics of reasoning.
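The threshold-based policy described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `small_step`, `large_step`, and `prm_score` callables, the threshold value, and all other names are hypothetical stand-ins for a small drafting model, a large fallback model, and a process reward model (PRM) that scores step correctness.

```python
# Hypothetical sketch of TRIM's simplest (threshold-based) routing policy:
# a small model drafts each reasoning step; a process reward model scores
# it; low-scoring (likely erroneous) steps are regenerated by a large model.
# Every name and parameter here is illustrative, not from the paper.

def trim_route(problem, small_step, large_step, prm_score,
               threshold=0.5, max_steps=10):
    """Generate a solution step by step, escalating risky steps."""
    steps = []
    expensive_calls = 0
    for _ in range(max_steps):
        draft = small_step(problem, steps)       # cheap model proposes a step
        if draft is None:                        # small model signals completion
            break
        if prm_score(problem, steps, draft) < threshold:
            draft = large_step(problem, steps)   # route the critical step
            expensive_calls += 1
        steps.append(draft)
    return steps, expensive_calls

# Toy demo with stub models: the PRM flags the second drafted step as risky.
if __name__ == "__main__":
    script = ["step A", "bad step", "step C"]
    def small(problem, steps):
        return script[len(steps)] if len(steps) < len(script) else None
    def large(problem, steps):
        return "fixed step"
    def prm(problem, steps, draft):
        return 0.2 if draft == "bad step" else 0.9
    steps, calls = trim_route("toy problem", small, large, prm)
    print(steps, calls)  # ['step A', 'fixed step', 'step C'] 1
```

The paper's more expressive policies additionally weigh long-horizon accuracy-cost trade-offs and uncertainty in the PRM's estimates; this sketch shows only the single-threshold baseline.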
Related papers
- Budget-Aware Agentic Routing via Boundary-Guided Training [24.0709108941881]
Budget-Aware Agentic Routing selects between a cheap and an expensive model at each step to optimize the cost-success frontier. Boundary-Guided Training builds a difficulty taxonomy to anchor learning under sparse rewards. Experimental results show that the method improves the efficiency frontier, matching strong routing baselines at substantially lower cost.
arXiv Detail & Related papers (2026-02-04T07:39:27Z) - CONCUR: A Framework for Continual Constrained and Unconstrained Routing [79.85419373937765]
AI tasks differ in complexity and are best addressed with different computation strategies. Most prior methods build the routing framework by training a single model across all strategies. We propose CONCUR, a continual routing framework that supports both constrained and unconstrained routing.
arXiv Detail & Related papers (2025-12-10T07:30:13Z) - Arbitrage: Efficient Reasoning via Advantage-Aware Speculation [71.45710345765528]
Speculative Decoding accelerates inference by employing a fast but inaccurate draft model to autoregressively propose tokens. However, traditional token-level Speculative Decoding struggles in reasoning tasks due to unnecessary rejections caused by token mismatches in semantically equivalent steps. We propose Arbitrage, a novel step-level speculative generation framework that routes generation dynamically based on the relative advantage between draft and target models.
arXiv Detail & Related papers (2025-12-04T17:50:53Z) - Enhancing Long Chain-of-Thought Reasoning through Multi-Path Plan Aggregation [32.86351316550696]
We analyze raw long CoTs and uncover a reasoning hierarchy consisting of planning and execution steps. Motivated by this observation, we propose Multi-Path Plan Aggregation (MPPA), a framework that augments single-pass reasoning with plan exploration and aggregation. To overcome this, we introduce online Step-DPO, a process-level preference optimization scheme that leverages Twisted Sequential Monte Carlo (TSMC) to provide scalable stepwise supervision.
arXiv Detail & Related papers (2025-10-13T17:02:41Z) - SATER: A Self-Aware and Token-Efficient Approach to Routing and Cascading [39.20076289493037]
We introduce SATER, a dual-mode compatible approach that fine-tunes models through shortest-response preference optimization and a confidence-aware rejection mechanism. SATER significantly reduces redundant outputs and response times, while improving both the performance of pre-generation routing and the efficiency of cascade routing.
arXiv Detail & Related papers (2025-10-04T19:55:36Z) - Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router [9.580226379350737]
Multi-step reasoning has proven essential for enhancing the problem-solving capabilities of Large Language Models. Yet many reasoning steps are relatively simple and can be handled by more efficient smaller-scale language models. We propose R2-Reasoner, a novel framework that enables collaborative reasoning across heterogeneous LLMs.
arXiv Detail & Related papers (2025-06-06T09:18:56Z) - Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection [7.045509749924679]
Route-To-Reason (RTR) is a novel unified routing framework that dynamically allocates both LMs and reasoning strategies according to task difficulty under budget constraints. RTR learns compressed representations of both expert models and reasoning strategies, enabling their joint and adaptive selection at inference time.
arXiv Detail & Related papers (2025-05-26T02:53:17Z) - PATS: Process-Level Adaptive Thinking Mode Switching [53.53401063490537]
Current large language models (LLMs) typically adopt a fixed reasoning strategy, either simple or complex, for all questions, regardless of their difficulty. This neglect of variation in task and reasoning process complexity leads to an imbalance between performance and efficiency. Existing methods attempt to implement training-free fast-slow thinking system switching to handle problems of varying difficulty, but are limited by coarse-grained solution-level strategy adjustments. We propose a novel reasoning paradigm: Process-Level Adaptive Thinking Mode Switching (PATS), which enables LLMs to dynamically adjust their reasoning strategy based on the difficulty of each step, optimizing the balance between performance and efficiency.
arXiv Detail & Related papers (2025-05-25T17:58:50Z) - Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models [56.37421741507468]
Chain-of-Thought (CoT) reasoning has significantly enhanced the performance of large language models (LLMs). We propose a method to identify critical reasoning steps using perplexity as a measure of their importance.
arXiv Detail & Related papers (2025-02-18T20:04:51Z) - BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning [83.03531832811386]
BoostStep is a method that enhances reasoning accuracy through step-aligned ICL examples. It integrates seamlessly with chain-of-thought (CoT) and tree search algorithms. It improves DeepSeek-R1-671B's performance on AIME by 2.2%, leveraging simple examples only from the MATH dataset.
arXiv Detail & Related papers (2025-01-06T18:59:13Z) - Evaluating and Improving Tool-Augmented Computation-Intensive Math
Reasoning [75.74103236299477]
Chain-of-thought prompting (CoT) and tool augmentation have been validated as effective practices for improving large language models.
We propose a new approach that can deliberate the reasoning steps with tool interfaces, namely DELI.
Experimental results on CARP and six other datasets show that the proposed DELI mostly outperforms competitive baselines.
arXiv Detail & Related papers (2023-06-04T17:02:59Z) - Self-Evaluation Guided Beam Search for Reasoning [61.523627290397556]
We introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of Large Language Models (LLMs).
We propose a decoding algorithm integrating the self-evaluation guidance via beam search.
Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by 6.34%, 9.56%, and 5.46% on the GSM8K, AQuA, and StrategyQA benchmarks, respectively.
arXiv Detail & Related papers (2023-05-01T02:37:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.