Related papers: Reasoning Scaffolding: Distilling the Flow of Thought from LLMs

URL: http://arxiv.org/abs/2509.23619v2
Date: Wed, 01 Oct 2025 08:57:37 GMT
Title: Reasoning Scaffolding: Distilling the Flow of Thought from LLMs
Authors: Xiangyu Wen, Junhua Huang, Zeju Li, Min Li, Jianyuan Zhong, Zhijian Xu, Mingxuan Yuan, Yongxiang Huang, Qiang Xu
Abstract summary: We introduce Reasoning Scaffolding, a framework that reframes reasoning as a structured generation process. Our method significantly outperforms state-of-the-art distillation in both accuracy and logical consistency.
Score: 30.569464420145163
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The prevailing approach to distilling reasoning from Large Language Models (LLMs), behavioral cloning from textual rationales, is fundamentally limited. It teaches Small Language Models (SLMs) to mimic surface-level patterns rather than the underlying algorithmic structure of thought, resulting in a critical lack of logical robustness. We argue that instead of cloning text, distillation should transfer this algorithmic structure directly. We introduce Reasoning Scaffolding, a framework that reframes reasoning as a structured generation process. Our method first abstracts the teacher's thought process into a sequence of discrete, interpretable semantic signals (e.g., Contrast, Addition) that act as a scaffold. The student model is then trained via a multi-task objective to both (1) predict the next semantic signal, anticipating the reasoning flow, and (2) generate the corresponding step, conditioned on that signal. This multi-task scheme acts as a powerful regularizer, compelling the student to internalize the computational patterns of coherent reasoning. On a suite of challenging reasoning benchmarks, our method significantly outperforms state-of-the-art distillation in both accuracy and logical consistency, providing a path towards creating smaller models that are genuine reasoners, not just fluent mimics.
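
The two-part objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the signal label set, the function names, and the mixing coefficient `weight` are all assumptions made for illustration, and the abstract does not say how the two losses are combined.

```python
import math

# Hypothetical label set: the abstract names only "Contrast" and "Addition".
SIGNALS = ["Contrast", "Addition", "Conclusion"]


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under `probs`."""
    return -math.log(probs[target_index])


def multitask_loss(signal_probs, signal_target, token_probs, weight=0.5):
    """Combine both objectives for a single reasoning step.

    signal_probs:  student's predicted distribution over SIGNALS
    signal_target: index of the signal extracted from the teacher
    token_probs:   probabilities the student assigned to each teacher
                   token of the step, conditioned on the signal
    weight:        assumed mixing coefficient (not from the paper)
    """
    # Task 1: predict the next semantic signal.
    signal_loss = cross_entropy(signal_probs, signal_target)
    # Task 2: generate the step text; mean token-level negative log-likelihood.
    step_loss = -sum(math.log(p) for p in token_probs) / len(token_probs)
    total = weight * signal_loss + (1 - weight) * step_loss
    return total, signal_loss, step_loss


total, signal_loss, step_loss = multitask_loss(
    signal_probs=[0.5, 0.25, 0.25],  # student favors "Contrast"
    signal_target=0,                 # teacher's signal was "Contrast"
    token_probs=[0.5, 0.5],          # two tokens in the teacher's step
)
```

In a real training loop both terms would come from the student model's logits. The weighted sum here is only one plausible way to combine them.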

Related papers

Structured Reasoning for Large Language Models [59.215789462977206]
We propose Structured Reasoning (SCR), a framework that decouples reasoning trajectories into explicit, evaluable, and trainable components. SCR substantially improves reasoning efficiency and self-verification. Compared with existing reasoning paradigms, it reduces output token length by up to 50%.
arXiv Detail & Related papers (2026-01-12T04:04:01Z)
From Hypothesis to Premises: LLM-based Backward Logical Reasoning with Selective Symbolic Translation [8.104087344683604]
We propose a novel framework, Hypothesis-driven Backward Logical Reasoning (HBLR). The core idea is to integrate confidence-aware symbolic translation with hypothesis-driven backward reasoning. HBLR consistently outperforms strong baselines in both accuracy and efficiency.
arXiv Detail & Related papers (2025-12-03T01:52:31Z)
Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models [57.42778606399764]
Diffusion language models (dLLMs) offer a promising, non-autoregressive paradigm for text generation. Current reinforcement learning approaches often rely on sparse, outcome-based rewards. We argue that this stems from a fundamental mismatch with the natural structure of reasoning.
arXiv Detail & Related papers (2025-10-02T00:34:15Z)
Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning [10.186434946738201]
Large language models (LLMs) demonstrate remarkable reasoning capabilities in tasks such as algorithmic coding and mathematical problem-solving. Recent methods have improved reasoning through expanded corpus and multistage training combining reinforcement learning and supervised fine-tuning.
arXiv Detail & Related papers (2025-08-13T15:32:25Z)
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster [51.89995713333108]
Chain-of-thought (CoT) distillation allows a large language model (LLM) to guide a small language model (SLM) in reasoning tasks. Existing methods train the SLM to learn the long rationale in one iteration. We propose chunk-wise training (CWT), which uses a search to divide the rationale into internal semantically coherent chunks.
arXiv Detail & Related papers (2025-05-24T11:04:52Z)
BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model. We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z)
On the Diagram of Thought [20.805936414171892]
Large Language Models (LLMs) excel at many tasks but often falter on complex problems that require structured, multi-step reasoning. We introduce the Diagram of Thought (DoT), a new framework that enables a single LLM to build and navigate a mental map of its reasoning.
arXiv Detail & Related papers (2024-09-16T07:01:41Z)
Distilling Reasoning Ability from Large Language Models with Adaptive Thinking [54.047761094420174]
Chain-of-thought finetuning (cot-finetuning) aims to endow small language models (SLMs) with reasoning ability to improve their performance on specific tasks. Most existing cot-finetuning methods adopt a pre-thinking mechanism, allowing the SLM to generate a rationale before providing an answer. This mechanism enables the SLM to analyze and think about complex questions, but it also makes answer correctness highly sensitive to minor errors in the rationale. We propose a robust post-thinking mechanism that generates the answer before the rationale.
arXiv Detail & Related papers (2024-04-14T07:19:27Z)
Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning [74.90592233107712]
We propose a Direct-Indirect Reasoning (DIR) method, which considers Direct Reasoning (DR) and Indirect Reasoning (IR) as multiple parallel reasoning paths that are merged to derive the final answer. Our DIR method is simple yet effective and can be straightforwardly integrated with existing variants of CoT methods.
arXiv Detail & Related papers (2024-02-06T03:41:12Z)
Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof Generation with Contrastive Stepwise Decoding [10.421832675327712]
We introduce contrastive decoding to stepwise proof generation, making use of negative reasoning paths to strengthen the model's capacity for logical deduction. Experiments on EntailmentBank underscore the success of our method in augmenting the proof planning abilities of language models.
arXiv Detail & Related papers (2023-11-12T05:12:49Z)
Implicit Chain of Thought Reasoning via Knowledge Distillation [58.80851216530288]
Instead of explicitly producing the chain of thought reasoning steps, we use the language model's internal hidden states to perform implicit reasoning. We find that this approach enables solving tasks previously not solvable without explicit chain-of-thought, at a speed comparable to no chain-of-thought.
arXiv Detail & Related papers (2023-11-02T17:59:49Z)
Deductive Verification of Chain-of-Thought Reasoning [22.79166959432764]
Large Language Models (LLMs) benefit from Chain-of-Thought prompting in performing various reasoning tasks. While CoT allows models to produce more comprehensive reasoning processes, its emphasis on intermediate reasoning steps can inadvertently introduce hallucinations and accumulated errors. We propose Natural Program, a natural language-based deductive reasoning format.
arXiv Detail & Related papers (2023-06-06T17:18:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.