Code Execution as Grounded Supervision for LLM Reasoning
- URL: http://arxiv.org/abs/2506.10343v1
- Date: Thu, 12 Jun 2025 04:36:57 GMT
- Title: Code Execution as Grounded Supervision for LLM Reasoning
- Authors: Dongwon Jung, Wenxuan Zhou, Muhao Chen
- Abstract summary: Training large language models with chain-of-thought (CoT) supervision has proven effective for enhancing their reasoning abilities. We propose a scalable method for generating a high-quality CoT supervision dataset by leveraging the determinism of program execution. Our approach extracts verifiable, step-by-step reasoning traces from code execution and transforms them into natural language CoT reasoning.
- Score: 36.97199200274124
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training large language models (LLMs) with chain-of-thought (CoT) supervision has proven effective for enhancing their reasoning abilities. However, obtaining reliable and accurate reasoning supervision remains a significant challenge. We propose a scalable method for generating a high-quality CoT supervision dataset by leveraging the determinism of program execution. Unlike existing reasoning dataset generation methods that rely on costly human annotations or error-prone LLM-generated CoT, our approach extracts verifiable, step-by-step reasoning traces from code execution and transforms them into natural language CoT reasoning. Experiments on reasoning benchmarks across various domains show that our method effectively equips LLMs with transferable reasoning abilities across diverse tasks. Furthermore, ablation studies validate that our method produces highly accurate reasoning data and shortens overall inference length by curbing meaningless repetition and overthinking.
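As an illustration of the core idea (not the authors' released pipeline), the following Python sketch traces a small program line by line and verbalizes the recorded states as a step-by-step chain of thought. The example program, the tracing helper, and the rendering template are all assumptions made for this sketch.

```python
# Illustrative sketch: record a deterministic execution trace with sys.settrace,
# then render each recorded state as a natural-language reasoning step.
import sys

def solve(a, b):
    total = a + b          # step: add the inputs
    doubled = total * 2    # step: double the sum
    return doubled

def trace_execution(func, *args):
    """Record (line number, locals) at each 'line' event inside func,
    i.e. the variable state just before each line runs."""
    steps = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            steps.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, steps

def verbalize(steps, result):
    """Turn the raw trace into a simple natural-language CoT."""
    lines = []
    for i, (_, local_vars) in enumerate(steps, 1):
        state = ", ".join(f"{k} = {v}" for k, v in local_vars.items())
        lines.append(f"Step {i}: the program state is now {state}.")
    lines.append(f"Therefore, the final answer is {result}.")
    return "\n".join(lines)

result, steps = trace_execution(solve, 3, 4)
print(verbalize(steps, result))
```

The determinism of the execution trace is what makes each verbalized step checkable against an actual program state, which is the property the abstract relies on for reliable supervision.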
Related papers
- Revisiting LLM Reasoning via Information Bottleneck [57.519119962528166]
Large language models (LLMs) have recently demonstrated remarkable progress in reasoning capabilities through reinforcement learning with verifiable rewards (RLVR). We present a theoretical characterization of LLM reasoning grounded in the information bottleneck (IB) principle. We propose IB-aware reasoning optimization (IBRO), a framework that encourages reasoning trajectories to be both informative about the final correct answer and generalizable.
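For reference, the standard information-bottleneck objective this summary alludes to can be written as below; mapping $X$ to the prompt, $Z$ to the reasoning trajectory, and $Y$ to the correct answer is an illustrative reading of the summary, not necessarily the paper's exact formulation.

```latex
% Standard IB Lagrangian: keep Z informative about Y while compressing away
% prompt-specific detail from X.
\max_{p(z \mid x)} \; I(Z;Y) \;-\; \beta \, I(Z;X)
```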
arXiv Detail & Related papers (2025-07-24T13:14:25Z)
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
arXiv Detail & Related papers (2025-03-20T17:59:38Z)
- Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs [45.33160999781074]
Chain-of-Thought (CoT) reasoning has been demonstrated as an effective technique for improving the problem-solving capabilities of large language models (LLMs). We introduce UnCert-CoT, an approach designed to enhance code generation by incorporating an uncertainty-aware CoT reasoning mechanism.
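A minimal sketch of how an uncertainty gate over CoT could be wired up, assuming a hypothetical LLM interface (`next_token_probs`, `generate`) and an entropy threshold; this is not UnCert-CoT's exact algorithm.

```python
# Illustrative only: trigger step-by-step reasoning when next-token entropy is high.
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def next_token_probs(prompt):          # hypothetical stub for an LLM API
    return [0.4, 0.3, 0.2, 0.1]        # pretend distribution over candidate tokens

def generate(prompt):                  # hypothetical stub for an LLM API
    return f"<completion of: {prompt!r}>"

def uncertainty_gated_codegen(prompt, threshold=1.0):
    if entropy(next_token_probs(prompt)) > threshold:
        # High uncertainty: reason step by step before emitting code.
        return generate(prompt + "\nLet's reason step by step before writing the code.")
    # Low uncertainty: decode the code directly, saving tokens.
    return generate(prompt)

print(uncertainty_gated_codegen("def fibonacci(n):"))
```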
arXiv Detail & Related papers (2025-03-19T15:40:45Z)
- "Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding [4.008780119020479]
Large language models (LLMs) exhibit strong reasoning abilities, often attributed to few-shot or zero-shot chain-of-thought (CoT) prompting. We propose a novel decoding strategy that systematically nudges LLMs to continue reasoning, thereby preventing premature termination of the reasoning process.
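A toy sketch of the general "keep thinking" idea: if the decoder emits a stop token before a minimum amount of reasoning, inject a continuation phrase instead. The stubbed sampler, the phrase list, and the length threshold are assumptions for illustration, not the paper's recipe.

```python
# Illustrative decoding loop: replace an early stop token with an injected phrase.
import random

EOS = "<eos>"
PHRASES = ["Well,", "So,"]  # candidate injection phrases

def sample_next_token(tokens):        # hypothetical stub for one LLM decoding step
    return random.choice(["step", "therefore", EOS])

def decode_with_injection(prompt_tokens, min_reasoning_tokens=20, max_tokens=100):
    tokens = list(prompt_tokens)
    generated = 0
    while generated < max_tokens:
        tok = sample_next_token(tokens)
        if tok == EOS:
            if generated < min_reasoning_tokens:
                tokens.append(random.choice(PHRASES))  # stopped too early: keep thinking
                generated += 1
                continue
            break
        tokens.append(tok)
        generated += 1
    return tokens

print(" ".join(decode_with_injection(["Q:", "2+2=?", "A:"])))
```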
arXiv Detail & Related papers (2025-03-13T08:46:32Z)
- The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency. UPFT removes the need for labeled data or exhaustive sampling. Experiments show that UPFT matches the performance of supervised methods.
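One plausible reading of prefix-only tuning, sketched as a data-construction step: keep just the first k tokens of one sampled response per question and fine-tune on those prefixes. `sample_response`, the value of k, and the record format are illustrative assumptions, not the paper's exact settings.

```python
# Illustrative sketch: build a prefix-only fine-tuning set from single samples.
def sample_response(question):                 # hypothetical stub for one LLM sample
    return "Let's think step by step. First we note that ..."

def build_prefix_dataset(questions, k=8):
    dataset = []
    for q in questions:
        prefix_tokens = sample_response(q).split()[:k]   # unsupervised: no gold label used
        dataset.append({"prompt": q, "target": " ".join(prefix_tokens)})
    return dataset

print(build_prefix_dataset(["What is 17 * 24?"]))
```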
arXiv Detail & Related papers (2025-03-04T18:56:03Z)
- SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [48.28847964704554]
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks. We propose a novel approach for continuous-space reasoning that does not require modifying the LLM.
arXiv Detail & Related papers (2025-02-17T18:52:29Z)
- Understanding Chain-of-Thought in LLMs through Information Theory [16.78730663293352]
We formalize Chain-of-Thought (CoT) reasoning in Large Language Models (LLMs) through an information-theoretic lens.
Specifically, our framework quantifies the 'information gain' at each reasoning step, enabling the identification of failure modes.
We demonstrate the efficacy of our approach through extensive experiments on toy and GSM-8K data, where it significantly outperforms existing outcome-based methods.
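One natural way to formalize a per-step information gain, offered as an illustrative reading of the summary rather than the paper's exact definition:

```latex
% Reduction in uncertainty about the final answer Y contributed by step s_t,
% given the preceding steps.
\mathrm{IG}_t \;=\; H\!\left(Y \mid s_{1:t-1}\right) \;-\; H\!\left(Y \mid s_{1:t}\right)
\;=\; I\!\left(Y;\, s_t \mid s_{1:t-1}\right)
```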
arXiv Detail & Related papers (2024-11-18T19:14:36Z)
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
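A minimal sketch of cross-checking a natural-language chain of thought against a program of thought: execute the generated program and accept an answer only when both routes agree. The generator stubs are hypothetical, and a real pipeline would sandbox the execution.

```python
# Illustrative CoT/PoT agreement check, not the paper's full verification method.
def generate_cot_answer(question):             # hypothetical stub
    return 408                                 # answer extracted from a CoT trace

def generate_pot_program(question):            # hypothetical stub
    return "answer = 17 * 24"                  # program emitted by the model

def verify(question):
    cot_answer = generate_cot_answer(question)
    scope = {}
    exec(generate_pot_program(question), {}, scope)   # run the program of thought
    pot_answer = scope.get("answer")
    return {"cot": cot_answer, "pot": pot_answer, "agreed": cot_answer == pot_answer}

print(verify("What is 17 * 24?"))
```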
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning [74.90592233107712]
We propose a Direct-Indirect Reasoning (DIR) method, which considers Direct Reasoning (DR) and Indirect Reasoning (IR) as multiple parallel reasoning paths that are merged to derive the final answer. Our DIR method is simple yet effective and can be straightforwardly integrated with existing variants of CoT methods.
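For concreteness, the two classical indirect-reasoning patterns named in the title, stated in propositional form (illustrative only):

```latex
% Contrapositive: proving \lnot Q \rightarrow \lnot P establishes P \rightarrow Q.
(P \rightarrow Q) \;\Leftrightarrow\; (\lnot Q \rightarrow \lnot P)
% Contradiction: if assuming \lnot P yields a contradiction, conclude P.
\bigl(\lnot P \rightarrow (R \land \lnot R)\bigr) \;\Rightarrow\; P
```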
arXiv Detail & Related papers (2024-02-06T03:41:12Z)
- Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof Generation with Contrastive Stepwise Decoding [10.421832675327712]
We introduce contrastive decoding to stepwise proof generation, making use of negative reasoning paths to strengthen the model's capacity for logical deduction. Experiments on EntailmentBank underscore the success of our method in augmenting the proof planning abilities of language models.
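A toy illustration of a contrastive step-scoring rule in the spirit of this summary: prefer candidate proof steps that are far more likely under the valid context than under a negative reasoning path. The probability tables are toy numbers and the scoring form is the generic contrastive-decoding rule, not necessarily the paper's exact objective.

```python
# Illustrative contrastive scoring of candidate proof steps.
import math

positive_probs = {"step_A": 0.6, "step_B": 0.3, "step_C": 0.1}   # p(step | valid context)
negative_probs = {"step_A": 0.5, "step_B": 0.1, "step_C": 0.4}   # p(step | negative path)

def contrastive_score(step):
    return math.log(positive_probs[step]) - math.log(negative_probs[step])

best = max(positive_probs, key=contrastive_score)
print(best, {s: round(contrastive_score(s), 3) for s in positive_probs})
```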
arXiv Detail & Related papers (2023-11-12T05:12:49Z)
- Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models [81.01397924280612]
Large language models (LLMs) can achieve highly effective performance on various reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting as demonstrations.
We introduce Iter-CoT (Iterative bootstrapping in Chain-of-Thoughts Prompting), an iterative bootstrapping approach for selecting exemplars and generating reasoning chains.
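A simplified bootstrapping loop in the spirit of this summary: sample a chain for each training question, keep the chains whose final answer checks out, and reuse them as exemplars in the next round. The stubs and the selection rule are illustrative assumptions, not the paper's full procedure.

```python
# Illustrative iterative bootstrapping of CoT exemplars.
def generate_chain(question, exemplars):       # hypothetical stub for prompted CoT sampling
    return f"Reasoning for {question!r} ... Answer: 42"

def extract_answer(chain):                     # hypothetical stub for answer extraction
    return chain.rsplit("Answer:", 1)[-1].strip()

def iter_bootstrap(train_set, rounds=2):
    exemplars = []
    for _ in range(rounds):
        new_exemplars = []
        for question, gold in train_set:
            chain = generate_chain(question, exemplars)
            if extract_answer(chain) == gold:          # keep only verified chains
                new_exemplars.append((question, chain))
        exemplars = new_exemplars or exemplars         # fall back if nothing verified
    return exemplars

print(iter_bootstrap([("What is 6 * 7?", "42")]))
```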
arXiv Detail & Related papers (2023-04-23T13:54:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.