Reasoning Pattern Matters: Learning to Reason without Human Rationales
- URL: http://arxiv.org/abs/2510.12643v1
- Date: Tue, 14 Oct 2025 15:34:38 GMT
- Title: Reasoning Pattern Matters: Learning to Reason without Human Rationales
- Authors: Chaoxu Pang, Yixuan Cao, Ping Luo
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities under the widely adopted SFT+RLVR paradigm. This paper investigates when and how rationale annotation costs can be substantially reduced without compromising reasoning performance.
- Score: 27.684703630371043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities under the widely adopted SFT+RLVR paradigm, which first performs Supervised Fine-Tuning (SFT) on human-annotated reasoning trajectories (rationales) to establish initial reasoning behaviors, then applies Reinforcement Learning with Verifiable Rewards (RLVR) to optimize the model using verifiable signals without golden rationales. However, annotating high-quality rationales for the SFT stage remains prohibitively expensive. This paper investigates when and how rationale annotation costs can be substantially reduced without compromising reasoning performance. We identify a broad class of problems, termed patterned reasoning tasks, where reasoning follows a fixed, procedural strategy consistent across instances. Although instances vary in content such as domain knowledge, factual information, or numeric values, the solution derives from applying a shared reasoning pattern. We argue that the success of SFT+RLVR on such tasks primarily stems from its ability to enable models to internalize these reasoning patterns. Using numerical semantic matching as a representative task, we provide both causal and behavioral evidence showing that reasoning patterns rather than the quantity or quality of rationales are the key determinant of performance. Building on these insights, we propose Pattern-Aware LLMs as Rationale AnnOtators (PARO), a simple yet effective framework that enables LLMs to generate rationales aligned with task-specific reasoning patterns without requiring human rationale annotations. Experiments show that PARO-generated rationales achieve comparable SFT+RLVR performance to human rationales that are 10 times larger. These results suggest that large-scale human rationale annotations can be replaced with LLM-based automatic annotations requiring only limited human supervision over reasoning patterns.
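The RLVR stage described in the abstract hinges on one idea: the reward is computed from a verifiable final answer, so no golden rationale is needed. A minimal sketch of such a reward function is below; all names (`extract_final_answer`, `verifiable_reward`) and the `Answer:` output convention are illustrative assumptions, not details from the paper.

```python
import re
from typing import Optional

def extract_final_answer(rationale: str) -> Optional[str]:
    """Pull the final answer from a generated rationale, assuming the
    rationale ends with a line like 'Answer: 3000'."""
    match = re.search(r"Answer:\s*(.+?)\s*$", rationale.strip())
    return match.group(1) if match else None

def verifiable_reward(rationale: str, gold_answer: str) -> float:
    """Binary verifiable reward: 1.0 if the extracted answer matches the
    gold label, else 0.0. Only the answer is checked, not the rationale,
    which is what lets RLVR run without human rationale annotations."""
    answer = extract_final_answer(rationale)
    return 1.0 if answer is not None and answer == gold_answer else 0.0

# Two sampled rationales for the same problem instance
good = "Apply the pattern: normalize units, then compare. 3 km = 3000 m. Answer: 3000"
bad = "Skipping the pattern and guessing. Answer: 300"
print(verifiable_reward(good, "3000"))  # 1.0
print(verifiable_reward(bad, "3000"))   # 0.0
```

In this framing, PARO's contribution sits upstream of the reward: an LLM annotator produces rationales that follow the task's fixed reasoning pattern, and the binary check above then supplies the optimization signal during RLVR.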
Related papers
- Are More Tokens Rational? Inference-Time Scaling in Language Models as Adaptive Resource Rationality [1.5994376682356057]
We introduce a Variable Attribution Task in which models infer which variables determine outcomes given candidate variables, input-output trials, and predefined logical functions. Both models exhibit a transition from brute-force to analytic strategies as complexity increases. These findings suggest that models can adjust their reasoning behavior in response to task complexity, even without explicit cost-based reward.
arXiv Detail & Related papers (2026-02-10T22:07:05Z) - Reinforced Efficient Reasoning via Semantically Diverse Exploration [73.41112984160992]
Reinforcement learning with verifiable rewards (RLVR) has proven effective in enhancing the reasoning of large language models (LLMs). We propose reinforced efficient reasoning via semantically diverse explorations, i.e., ROSE, for LLMs. Our method incorporates a semantic-entropy-based branching strategy and an $\varepsilon$-exploration mechanism.
arXiv Detail & Related papers (2026-01-08T15:56:44Z) - Making Mathematical Reasoning Adaptive [61.45161826629692]
We propose the AdaR framework to enable adaptive reasoning in large language models (LLMs). AdaR synthesizes logically equivalent queries by varying variable values, and trains models with RLVR on these data to penalize spurious logic. Experimental results demonstrate that AdaR improves robustness and generalization, achieving substantial improvement in mathematical reasoning.
arXiv Detail & Related papers (2025-10-06T09:30:05Z) - Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process Supervision [43.05159920832912]
Large language models (LLMs) have shown strong performance in many reasoning benchmarks. LLMs are susceptible to content variations, demonstrating a lack of robust planning or symbolic abstractions. We propose to overcome such limitations by synthesizing high-quality symbolic reasoning trajectories with stepwise pseudo-labels.
arXiv Detail & Related papers (2025-05-26T18:06:39Z) - Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models? [14.29992535286614]
Theory of Mind (ToM) is the ability to attribute mental states to others. Recent advancements in Large Language Models have shown promising performance on ToM benchmarks. Do these benchmarks necessitate explicit human-like reasoning processes, or can models succeed through alternative strategies?
arXiv Detail & Related papers (2025-04-02T12:58:42Z) - Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [49.61246073215651]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains. However, they also introduce significant computational overhead due to verbose and redundant outputs.
arXiv Detail & Related papers (2025-03-20T17:59:38Z) - P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains [97.25943550933829]
We present P-FOLIO, a human-annotated dataset consisting of diverse and complex reasoning chains.
We use P-FOLIO to evaluate and improve large-language-model (LLM) reasoning capabilities.
arXiv Detail & Related papers (2024-10-11T19:22:57Z) - RATIONALYST: Mining Implicit Rationales for Process Supervision of Reasoning [41.9992614617405]
We introduce RATIONALYST, a model for process-supervision of reasoning based on pre-training. We extract 79k rationales from a web-scale unlabelled dataset (the Pile) and a combination of reasoning datasets with minimal human intervention. Fine-tuned from LLaMa-3-8B, RATIONALYST improves the accuracy of reasoning by an average of 3.9% on 7 representative reasoning benchmarks.
arXiv Detail & Related papers (2024-10-01T20:05:51Z) - LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs).
Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models.
We leverage these findings to construct targeted demonstration examples and fine-tuning data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z) - Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning techniques to assess the performance of the models.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.