Step-Tagging: Toward controlling the generation of Language Reasoning Models through step monitoring
- URL: http://arxiv.org/abs/2512.14332v1
- Date: Tue, 16 Dec 2025 12:01:16 GMT
- Title: Step-Tagging: Toward controlling the generation of Language Reasoning Models through step monitoring
- Authors: Yannis Belkhiter, Seshu Tirupathi, Giulio Zizzo, John D. Kelleher,
- Abstract summary: A growing body of studies show that Language Reasoning Models (LRMs) are still inefficient, over-generating verification and reflection steps.<n>We introduce the Step-Tagging framework, a lightweight sentence-classifier enabling real-time annotation of the type of reasoning steps that an LRM is generating.<n>Online monitoring of the count of specific steps can produce effective interpretable early stopping criteria of LRM inferences.
- Score: 5.190961793309368
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The field of Language Reasoning Models (LRMs) has been very active over the past few years with advances in training and inference techniques enabling LRMs to reason longer, and more accurately. However, a growing body of studies show that LRMs are still inefficient, over-generating verification and reflection steps. To address this challenge, we introduce the Step-Tagging framework, a lightweight sentence-classifier enabling real-time annotation of the type of reasoning steps that an LRM is generating. To monitor reasoning behaviors, we introduced ReasonType: a novel taxonomy of reasoning steps. Building on this framework, we demonstrated that online monitoring of the count of specific steps can produce effective interpretable early stopping criteria of LRM inferences. We evaluate the Step-tagging framework on three open-source reasoning models across standard benchmark datasets: MATH500, GSM8K, AIME and non-mathematical tasks (GPQA and MMLU-Pro). We achieve 20 to 50\% token reduction while maintaining comparable accuracy to standard generation, with largest gains observed on more computation-heavy tasks. This work offers a novel way to increase control over the generation of LRMs, and a new tool to study behaviors of LRMs.
Related papers
- TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning [77.01182934427095]
TaTToo is a novel table-grounded PRM framework that integrates tool-based verification to provide precise reward supervision.<n>We train TaTToo with a dual-stage paradigm: cold-start supervised fine-tuning to capture tool-use reasoning patterns, followed by reinforcement learning to align our model with table-based verification.
arXiv Detail & Related papers (2025-10-07T17:59:41Z) - CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling [60.55856973678002]
Large Reasoning Models (LRMs) have demonstrated strong capabilities in complex multi-step reasoning.<n>Existing domain adaptation methods, originally designed for earlier instruction-tuned models, often fail to exploit the advanced reasoning patterns of modern LRMs.<n>We propose textbfCALM, a framework that progressively refines LRMs within their native reasoning modes for optimization modeling tasks.
arXiv Detail & Related papers (2025-10-05T13:38:31Z) - A Study on Thinking Patterns of Large Reasoning Models in Code Generation [14.138043269602074]
Large language models (LLMs) are utilized for software engineering tasks such as code generation.<n>This paper presents a comprehensive study aimed at investigating and uncovering the reasoning behavior of LRMs during code generation.<n>We derive a taxonomy of LRM reasoning behaviors, encompassing 15 reasoning actions across four phases.
arXiv Detail & Related papers (2025-09-17T07:13:12Z) - Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications.<n>One core challenge of evaluation in the large language model (LLM) era is the generalization issue.<n>We propose Model Utilization Index (MUI), a mechanism interpretability enhanced metric that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z) - GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning [35.429904556288996]
We introduce GenPRM, a generative process reward model that performs explicit Chain-of-Thought (CoT) reasoning with code verification.<n> Experimental results show that GenPRM significantly outperforms prior PRMs with only 23K training data from MATH dataset.
arXiv Detail & Related papers (2025-04-01T15:21:05Z) - Exploring the Necessity of Reasoning in LLM-based Agent Scenarios [74.35956310688164]
We propose the LaRMA framework, encompassing nine tasks across Tool Usage, Plan Design, and Problem Solving.<n>Our findings address four research questions: LRMs surpass LLMs in reasoning-intensive tasks like Plan Design, leveraging iterative reflection for superior outcomes.<n>LRMs' enhanced reasoning incurs higher computational costs, prolonged processing, and behavioral challenges, including overthinking and fact-ignoring tendencies.
arXiv Detail & Related papers (2025-03-14T04:34:31Z) - Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning [55.6623318085391]
Recent large language model (LLM) reasoning suffers from limited domain knowledge, susceptibility to hallucinations, and constrained reasoning depth.<n>This paper presents the first investigation into integrating step-wise knowledge graph retrieval with step-wise reasoning.<n>We propose KG-RAR, a framework centered on process-oriented knowledge graph construction, a hierarchical retrieval strategy, and a universal post-retrieval processing and reward model.
arXiv Detail & Related papers (2025-03-03T15:20:41Z) - AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence [29.551802573731305]
We propose AdaptiveStep, a method that divides reasoning steps based on the model's confidence in predicting the next word.<n>We demonstrate its effectiveness through experiments with AdaptiveStep-trained PRMs in mathematical reasoning and code generation tasks.
arXiv Detail & Related papers (2025-02-19T18:35:55Z) - Let's reward step by step: Step-Level reward model as the Navigators for
Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate step-level reward dataset for coding tasks and observed similar improved performance in the code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.