EPiC: Towards Lossless Speedup for Reasoning Training through Edge-Preserving CoT Condensation
- URL: http://arxiv.org/abs/2506.04205v1
- Date: Wed, 04 Jun 2025 17:49:10 GMT
- Title: EPiC: Towards Lossless Speedup for Reasoning Training through Edge-Preserving CoT Condensation
- Authors: Jinghan Jia, Hadi Reisizadeh, Chongyu Fan, Nathalie Baracaldo, Mingyi Hong, Sijia Liu
- Abstract summary: We study the problem of CoT condensation for resource-efficient reasoning training. We propose an Edge-Preserving Condensation method, EPiC, which selectively retains only the initial and final segments of each CoT trace.
- Score: 37.6583581020347
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have shown remarkable reasoning capabilities when trained with chain-of-thought (CoT) supervision. However, the long and verbose CoT traces, especially those distilled from large reasoning models (LRMs) such as DeepSeek-R1, significantly increase training costs during the distillation process, where a non-reasoning base model is taught to replicate the reasoning behavior of an LRM. In this work, we study the problem of CoT condensation for resource-efficient reasoning training, aimed at pruning intermediate reasoning steps (i.e., thoughts) in CoT traces, enabling supervised model training on length-reduced CoT data while preserving both answer accuracy and the model's ability to generate coherent reasoning. Our rationale is that CoT traces typically follow a three-stage structure: problem understanding, exploration, and solution convergence. Through empirical analysis, we find that retaining the structure of the reasoning trace, especially the early stage of problem understanding (rich in reflective cues) and the final stage of solution convergence, is sufficient to achieve lossless reasoning supervision. To this end, we propose an Edge-Preserving Condensation method, EPiC, which selectively retains only the initial and final segments of each CoT trace while discarding the middle portion. This design draws an analogy to preserving the "edge" of a reasoning trajectory, capturing both the initial problem framing and the final answer synthesis, to maintain logical continuity. Experiments across multiple model families (Qwen and LLaMA) and benchmarks show that EPiC reduces training time by over 34% while achieving lossless reasoning accuracy on MATH500, comparable to full CoT supervision. To the best of our knowledge, this is the first study to explore thought-level CoT condensation for efficient reasoning model distillation.
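To make the idea concrete, here is a minimal sketch of edge-preserving condensation in Python. This is not the authors' implementation: the sentence-level splitting heuristic and the `keep_head_frac` / `keep_tail_frac` parameters are illustrative assumptions standing in for however EPiC actually segments thoughts and sizes the retained "edges"; only the overall behavior (keep the initial and final segments of a trace, discard the middle) follows the abstract.

```python
import re

def condense_cot(cot_trace: str,
                 keep_head_frac: float = 0.2,
                 keep_tail_frac: float = 0.2) -> str:
    """Edge-preserving condensation sketch: keep the opening (problem-understanding)
    and closing (solution-convergence) thoughts, drop the middle exploration."""
    # Heuristic segmentation into "thoughts" at sentence/paragraph boundaries;
    # a stand-in for whatever thought-level segmentation EPiC actually uses.
    thoughts = [t.strip() for t in re.split(r"(?<=[.!?])\s+|\n{2,}", cot_trace) if t.strip()]

    n = len(thoughts)
    n_head = max(1, int(n * keep_head_frac))  # initial segment to retain
    n_tail = max(1, int(n * keep_tail_frac))  # final segment to retain
    if n_head + n_tail >= n:
        return cot_trace  # too short to condense; keep the full trace

    return " ".join(thoughts[:n_head] + thoughts[-n_tail:])

# Toy usage on a verbose trace.
trace = (
    "First, restate the problem: we need the sum of the first 100 odd numbers. "
    "Try small cases: 1, 1+3=4, 1+3+5=9. "
    "These look like perfect squares. "
    "Check one more: 1+3+5+7=16, which is 4^2. "
    "So the sum of the first n odd numbers is n^2. "
    "Therefore the answer is 100^2 = 10000."
)
print(condense_cot(trace, keep_head_frac=0.34, keep_tail_frac=0.34))
```

In EPiC, condensed traces of this kind replace the full CoT as supervised fine-tuning targets; the shorter training sequences are what yield the reported reduction of over 34% in training time while preserving accuracy on MATH500.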
Related papers
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens [23.326813303795692]
Chain-of-Thought (CoT) prompting has been shown to improve Large Language Model (LLM) performance on various tasks. However, some initial findings suggest that CoT reasoning may be more superficial than it appears.
arXiv Detail & Related papers (2025-08-02T04:37:28Z)
- SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought [37.53215651690168]
Chain of Thought (CoT) prompting improves the reasoning performance of large language models (LLMs) by encouraging step-by-step thinking. While promising, CoT-based approaches often require costly pretraining and lack a principled framework for how reasoning should evolve. We propose SCOUT, a lightweight fine-tuning framework that enables Flow CoT-style reasoning without the need for pretraining.
arXiv Detail & Related papers (2025-05-30T03:43:24Z)
- Reinforced Latent Reasoning for LLM-based Recommendation [83.18146814163308]
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities in complex problem-solving tasks. Existing methods typically rely on fine-tuning with explicit chain-of-thought (CoT) data. In this work, we explore an alternative approach that shifts from explicit CoT reasoning to compact, information-dense latent reasoning.
arXiv Detail & Related papers (2025-05-25T11:03:45Z)
- Fractured Chain-of-Thought Reasoning [61.647243580650446]
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling. We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z)
- Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation [22.875285119636235]
The R1 distillation scheme has emerged as a promising approach for training cost-effective models with enhanced reasoning abilities. This study examines the universality of distillation data and identifies key components that enable the efficient transfer of long-chain reasoning capabilities. We propose DLCoT (Deconstructing Long Chain-of-Thought), a distillation data enhancement framework.
arXiv Detail & Related papers (2025-03-20T17:46:38Z)
- When More is Less: Understanding Chain-of-Thought Length in LLMs [51.631483479081645]
Large Language Models (LLMs) employ Chain-of-Thought (CoT) reasoning to deconstruct complex problems. Longer CoTs are often presumed superior; this paper argues that longer is not always better.
arXiv Detail & Related papers (2025-02-11T05:28:59Z)
- Unveiling the Mechanisms of Explicit CoT Training: How CoT Enhances Reasoning Generalization [9.191236388401226]
The integration of explicit Chain-of-Thought (CoT) reasoning into training large language models has advanced their reasoning capabilities, yet the mechanisms by which CoT enhances generalization remain poorly understood. This work investigates (1) how CoT training reshapes internal model representations and (2) why it improves both in-distribution (ID) and out-of-distribution (OOD) reasoning generalization.
arXiv Detail & Related papers (2025-02-07T05:21:13Z)
- A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning [48.51969964676017]
Chain-of-Thought (CoT) holds a significant place in augmenting the reasoning performance for large language models.
We propose a Read-and-Control approach for controlling the accuracy of CoT.
arXiv Detail & Related papers (2024-06-18T04:07:13Z)
- Towards Better Chain-of-Thought: A Reflection on Effectiveness and Faithfulness [17.6082037230676]
Chain-of-thought (CoT) prompting demonstrates varying performance under different reasoning tasks. Previous work attempts to evaluate it but falls short of providing an in-depth analysis of the patterns that influence CoT. We identify key factors that influence CoT effectiveness on performance improvement, including problem difficulty, information gain, and information flow.
arXiv Detail & Related papers (2024-05-29T09:17:46Z)
- Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task.
LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced.
Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on the stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.