SEER: Enhancing Chain-of-Thought Code Generation through Self-Exploring Deep Reasoning
- URL: http://arxiv.org/abs/2510.17130v1
- Date: Mon, 20 Oct 2025 03:51:17 GMT
- Title: SEER: Enhancing Chain-of-Thought Code Generation through Self-Exploring Deep Reasoning
- Authors: Shuzheng Gao, Chaozheng Wang, Cuiyun Gao, Michael R. Lyu,
- Abstract summary: Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to develop high-level reasoning plans before writing code.<n>We present SEER, a SElf-Exploring deep Reasoning framework that enables accurate and adaptive reasoning for code generation.
- Score: 41.76790935791852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code generation, the task of creating executable programs from natural language requirements, has recently seen tremendous advances through Chain-of-Thought (CoT) reasoning, which enables Large Language Models (LLMs) to develop high-level reasoning plans before writing code. Recent research has proposed various methods to enhance models' CoT reasoning for code generation such as prompt engineering and supervised fine-tuning. However, existing approaches still face three critical limitations: (1) limited exploration of diverse reasoning paths, which constrains generalization across various programming scenarios, (2) lack of quality assessment for intermediate reasoning steps, which hampers the reliability of the generated plans and code, and (3) the potential negative impact of "overthinking", potentially leading to unnecessarily complex and incorrect solutions. To address these limitations, we frame CoT code generation as a decision making problem and present SEER, a SElf-Exploring deep Reasoning framework that enables accurate and adaptive reasoning for code generation. SEER introduces three key components: (1) Diverse reasoning path exploration, which aims at exploring diverse reasoning paths and annotating intermediate steps without relying on manual experts or closed-source proprietary models; (2) Reasoning quality-aware model training, which trains a policy model for generating candidate reasoning steps and a value model for assessing their quality; and (3) Adaptive CoT reasoning, which dynamically switches between direct generation and step-by-step reasoning for different problems.
Related papers
- Learning Structured Reasoning via Tractable Trajectory Control [99.75278337895024]
Ctrl-R is a framework for learning structured reasoning via tractable trajectory control.<n>We show that Ctrl-R enables effective exploration and internalization of previously unattainable reasoning patterns.
arXiv Detail & Related papers (2026-03-02T09:18:19Z) - LogitsCoder: Towards Efficient Chain-of-Thought Path Search via Logits Preference Decoding for Code Generation [86.08600027874662]
We propose LogitsCoder, a novel framework that enhances chain-of-thought reasoning through lightweight, logit-level control mechanisms for code generation.<n>We show that LogitsCoder produces more efficient and higher-quality reasoning chains, leading to superior code generation performance compared to baseline methods.
arXiv Detail & Related papers (2026-02-15T08:52:19Z) - CODE: A Contradiction-Based Deliberation Extension Framework for Overthinking Attacks on Retrieval-Augmented Generation [43.85448261466922]
We propose an end-to-end attack framework named Contradiction-Based Deliberation Extension (CODE)<n> CODE develops a multi-agent architecture to construct poisoning samples that are injected into the knowledge base.<n>Experiments show that CODE causes a 5.32x-24.72x increase in reasoning token consumption, without degrading task performance.
arXiv Detail & Related papers (2026-01-19T14:52:31Z) - Neural Chain-of-Thought Search: Searching the Optimal Reasoning Path to Enhance Large Language Models [61.55758048622473]
We introduce Neural Chain-of-Thought Search (NCoTS), a framework that reformulates reasoning as a dynamic search for the optimal thinking strategy.<n>By quantitatively characterizing the solution space, we reveal the existence of sparse superior reasoning paths that are simultaneously more accurate and concise than standard outputs.
arXiv Detail & Related papers (2026-01-16T14:38:18Z) - Multi-Path Collaborative Reasoning via Reinforcement Learning [54.8518809800168]
Chain-of-Thought (CoT) reasoning has significantly advanced the problem-solving capabilities of Large Language Models (LLMs)<n>Recent methods attempt to address this by generating soft abstract tokens to enable reasoning in a continuous semantic space.<n>We propose Multi-Path Perception Policy Optimization (M3PO), a novel reinforcement learning framework that explicitly injects collective insights into the reasoning process.
arXiv Detail & Related papers (2025-12-01T10:05:46Z) - Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning [65.20602712957725]
Caco is a novel framework that automates the synthesis of high-quality, verifiable, and diverse instruction-CoT reasoning data.<n>Our work establishes a paradigm for building self-sustaining, trustworthy reasoning systems without human intervention.
arXiv Detail & Related papers (2025-10-05T07:59:24Z) - Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models [57.42778606399764]
Diffusion language models (dLLMs) offer a promising, non-autoregressive paradigm for text generation.<n>Current reinforcement learning approaches often rely on sparse, outcome-based rewards.<n>We argue that this stems from a fundamental mismatch with the natural structure of reasoning.
arXiv Detail & Related papers (2025-10-02T00:34:15Z) - Refining Critical Thinking in LLM Code Generation: A Faulty Premise-based Evaluation Framework [11.47215537484756]
This paper proposes FPBench, the first code generation evaluation framework targeting faulty premises.<n>Most models exhibit poor reasoning abilities and suboptimal code generation performance under faulty premises.
arXiv Detail & Related papers (2025-08-05T16:39:39Z) - MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO [87.52631406241456]
Recent text-to-image systems face limitations in handling multimodal inputs and complex reasoning tasks.<n>We introduce Mind Omni, a unified multimodal large language model that addresses these challenges by incorporating reasoning generation through reinforcement learning.
arXiv Detail & Related papers (2025-05-19T12:17:04Z) - Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models [35.82665698868508]
Large Language Models (LLMs) struggle with high computational time and error propagation during inference time.<n>We propose Meta-Reasoner, a new framework to enable LLMs to optimize the inference compute by adjusting strategies on how to reason during inference time.<n>Our method improves performance by 9-12% over previous SOTA methods while reducing inference time by 28-35%.
arXiv Detail & Related papers (2025-02-27T09:40:13Z) - Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation [24.081573908824353]
First-order logic (FOL) reasoning is pivotal for intelligent systems.<n>Existing benchmarks often rely on extensive human annotation or handcrafted templates.<n>We propose a novel framework called ProverGen that synergizes the generative strengths of Large Language Models with the rigor and precision of symbolic provers.
arXiv Detail & Related papers (2025-02-10T15:31:54Z) - Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought [61.588465852846646]
Chain-of-Thought (CoT) reasoning has emerged as a promising approach for enhancing the performance of large language models (LLMs)
In this work, we introduce a novel reasoning boundary framework (RBF) to address these challenges.
arXiv Detail & Related papers (2024-10-08T05:26:28Z) - Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows textcode-form plans -- pseudocode that outlines high-level, structured reasoning processes.
CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks.
It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.