Outcome-Conditioned Reasoning Distillation for Resolving Software Issues
- URL: http://arxiv.org/abs/2601.23257v1
- Date: Fri, 30 Jan 2026 18:25:39 GMT
- Title: Outcome-Conditioned Reasoning Distillation for Resolving Software Issues
- Authors: Chenglin Li, Yisen Xu, Zehao Wang, Shin Hwei Tan, Tse-Hsun Chen
- Abstract summary: We present an Outcome-Conditioned Reasoning Distillation (O-CRD) framework that uses resolved in-repository issues with verified patches as supervision. Starting from a historical fix, the method reconstructs a stage-wise repair trace backward from the verified outcome. On SWE-Bench Lite, this approach increases Pass@1 by 10.4% with GPT-4o, 8.6% with DeepSeek-V3, and 10.3% with GPT-5.
- Score: 49.16055123488827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software issue resolution in large repositories is a long-range decision process: choices made during localization shape the space of viable edits, and missteps can compound into incorrect patches. Despite this, many LLM-based repair pipelines still operate in a reset-and-solve manner, producing fresh reasoning for every new issue instead of carrying forward what worked in past fixes. This is wasteful because repositories routinely contain earlier issues with overlapping structure, failure modes, or constraints, where prior repair experience could provide useful guidance. Existing approaches typically harvest this signal through forward-time trial procedures, such as repeated refinement or search, incurring high inference cost while still risking divergence from the eventual correct patch. We present an Outcome-Conditioned Reasoning Distillation (O-CRD) framework that uses resolved in-repository issues with verified patches as supervision. Starting from a historical fix, the method reconstructs a stage-wise repair trace backward from the verified outcome, then reuses the distilled guidance at inference time to steer file/function localization and patch synthesis, without fine-tuning or online search. On SWE-Bench Lite, this approach increases Pass@1 by 10.4% with GPT-4o, 8.6% with DeepSeek-V3, and 10.3% with GPT-5, indicating that outcome-conditioned reuse of verified repairs can replace costly forward exploration for software issue resolution.
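The outcome-conditioned reuse idea can be sketched in a few lines: work backward from a verified patch to recover what correct localization and synthesis must have looked like, then retrieve that trace for a similar new issue. All names and the toy retriever below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: distill stage-wise guidance from a verified patch,
# then reuse it for a new issue. Names are illustrative, not O-CRD's code.
from dataclasses import dataclass

@dataclass
class RepairTrace:
    issue_text: str
    files: list        # files the verified patch touched (localization stage)
    functions: list    # functions it edited (finer localization stage)
    edit_summary: str  # distilled description of the fix (synthesis stage)

def distill(issue_text, verified_patch):
    """Work backward from the verified outcome: the patch tells us which
    files/functions a correct localization must have reached."""
    files = sorted({h["file"] for h in verified_patch})
    functions = sorted({h["function"] for h in verified_patch})
    summary = "; ".join(f'{h["function"]}: {h["change"]}' for h in verified_patch)
    return RepairTrace(issue_text, files, functions, summary)

def retrieve(new_issue, traces):
    """Pick the past trace whose issue wording overlaps most with the new
    one (a toy stand-in for a learned retriever)."""
    def overlap(t):
        a, b = set(new_issue.lower().split()), set(t.issue_text.lower().split())
        return len(a & b) / max(len(a | b), 1)
    return max(traces, key=overlap)

# Usage: one resolved issue becomes reusable guidance for a similar report.
patch = [{"file": "auth/session.py", "function": "refresh_token",
          "change": "guard against expired refresh tokens"}]
trace = distill("Session crashes when refresh token expired", patch)
best = retrieve("Crash on expired token during session refresh", [trace])
print(best.files)  # guidance steers localization toward auth/session.py
```

No fine-tuning or online search appears here: the only inference-time cost is the retrieval lookup.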
Related papers
- RepoRepair: Leveraging Code Documentation for Repository-Level Automated Program Repair [30.23781155493087]
We propose RepoRepair, a novel documentation-enhanced approach for repository-level fault localization and program repair. Our core insight is to leverage LLMs to generate hierarchical code documentation (from functions to files) for code repositories. RepoRepair first employs a text-based LLM to generate file/function-level code documentation for repositories, which serves as auxiliary knowledge to guide fault localization.
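The documentation-guided localization idea can be sketched minimally: summarize functions, roll the summaries up to file level, and rank files against the issue text. The trivial summarizer and scorer below are stand-ins for RepoRepair's LLM components, not its actual pipeline.

```python
# Hedged sketch of hierarchical documentation for fault localization.
# summarize_function is a stub standing in for an LLM summarizer.
def summarize_function(name, body):
    return f"{name}: {body.splitlines()[0].strip('# ')}"   # stub "LLM"

def build_file_doc(file_name, functions):
    # Roll function-level docs up into a file-level doc.
    fn_docs = [summarize_function(n, b) for n, b in functions.items()]
    return {"file": file_name, "doc": " | ".join(fn_docs)}

def localize(issue, file_docs):
    # Rank files by word overlap between issue text and their documentation.
    def score(d):
        issue_words = set(issue.lower().split())
        return len(issue_words & set(d["doc"].lower().split()))
    return max(file_docs, key=score)["file"]

repo = {
    "cache.py": {"evict": "# removes stale cache entries"},
    "parser.py": {"parse": "# parses config files"},
}
docs = [build_file_doc(f, fns) for f, fns in repo.items()]
print(localize("stale entries never removed from cache", docs))  # cache.py
```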
arXiv Detail & Related papers (2026-03-01T11:06:24Z) - Pull Requests as a Training Signal for Repo-Level Code Editing [49.82435173554125]
Clean Pull Request (Clean-PR) is a mid-training paradigm that leverages real-world GitHub pull requests as a training signal for repository-level editing. We introduce a scalable pipeline that converts noisy pull request diffs into Search/Replace edit blocks through reconstruction and validation. On SWE-bench, our model significantly outperforms the instruction-tuned baseline, achieving absolute improvements of 13.6% on SWE-bench Lite and 12.3% on SWE-bench Verified.
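The diff-to-edit-block conversion can be sketched as parsing a unified-diff hunk into a search block (old lines) and a replace block (new lines), then validating that the search block actually occurs in the file. The format and filter below are assumptions in the spirit of the Clean-PR pipeline, not its actual code.

```python
# Illustrative sketch: turn a unified-diff hunk into a Search/Replace edit
# block and validate it against the source file.
def hunk_to_edit_block(hunk_lines):
    search, replace = [], []
    for line in hunk_lines:
        if line.startswith("-"):        # removed line: search side only
            search.append(line[1:])
        elif line.startswith("+"):      # added line: replace side only
            replace.append(line[1:])
        else:                           # context line appears on both sides
            search.append(line[1:])
            replace.append(line[1:])
    return "\n".join(search), "\n".join(replace)

def validate(edit, file_text):
    """Reconstruction check: the search block must occur verbatim in the
    file, otherwise the hunk is too noisy to keep as training signal."""
    search, _ = edit
    return search in file_text

hunk = [" def area(r):", "-    return 3.14 * r * r",
        "+    import math", "+    return math.pi * r * r"]
edit = hunk_to_edit_block(hunk)
file_text = "def area(r):\n    return 3.14 * r * r\n"
print(validate(edit, file_text))  # True: clean hunk survives filtering
```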
arXiv Detail & Related papers (2026-02-07T09:22:25Z) - TraceCoder: A Trace-Driven Multi-Agent Framework for Automated Debugging of LLM-Generated Code [11.207330722400764]
We present TraceCoder, a framework that emulates the observe-analyze-repair process of human experts. The framework first instruments the code with diagnostic probes to capture fine-grained runtime traces. It then conducts causal analysis on these traces to accurately identify the root cause of the failure.
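The "observe" step can be sketched as a probe decorator that records each call's arguments and outcome, giving a later analysis stage a concrete trace of the failure. The probe design is illustrative, not TraceCoder's instrumentation.

```python
# Minimal sketch of runtime-trace instrumentation for a suspect function.
import functools

TRACE = []

def probe(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            out = fn(*args, **kwargs)
            TRACE.append((fn.__name__, args, "return", out))
            return out
        except Exception as e:           # capture the failure, then re-raise
            TRACE.append((fn.__name__, args, "raise", type(e).__name__))
            raise
    return wrapper

@probe
def mean(xs):
    return sum(xs) / len(xs)   # buggy on empty input

try:
    mean([])
except ZeroDivisionError:
    pass
print(TRACE[-1])  # ('mean', ([],), 'raise', 'ZeroDivisionError')
```

The recorded tuple pins the failure to a concrete input, which is the kind of evidence a causal-analysis stage can reason over.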
arXiv Detail & Related papers (2026-02-06T16:59:48Z) - R3A: Reliable RTL Repair Framework with Multi-Agent Fault Localization and Stochastic Tree-of-Thoughts Patch Generation [3.5576449247822506]
We propose R3A, an automatic program repair framework built on a base model to improve reliability. Experiments show R3A can fix 90.6% of bugs in the RTL-repair dataset within a given time limit.
arXiv Detail & Related papers (2025-11-25T09:08:48Z) - REFINE: Enhancing Program Repair Agents through Context-Aware Patch Refinement [12.995571513415905]
Large Language Models (LLMs) have recently shown strong potential in automatic program repair (APR). LLMs often struggle to produce correct fixes due to limited understanding of code context and over-reliance on incomplete test suites. We propose a novel patch refinement framework, Refine, that systematically transforms Draft Patches into correct ones.
arXiv Detail & Related papers (2025-10-04T00:34:32Z) - Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [79.74676890436174]
We present an APR tool for Dafny that uses formal specifications as oracles for fault localization and repair. We localize faults through a series of steps, which include using Hoare logic to determine the state of each statement within the program. Our tool achieves 89.6% fault localization coverage, and GPT-4o mini yields the highest repair success rate of 74.18%.
arXiv Detail & Related papers (2025-07-04T15:36:12Z) - Repeton: Structured Bug Repair with ReAct-Guided Patch-and-Test Cycles [1.387448620257867]
Large Language Models (LLMs) have shown strong capabilities in code generation and comprehension, yet their application to complex software engineering tasks often suffers from low precision and limited interpretability. We present Repeton, a fully open-source framework that leverages LLMs for precise and automated code manipulation in real-world Git repositories.
arXiv Detail & Related papers (2025-06-09T19:36:40Z) - LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues [62.12404317786005]
EvoCoder is a continuous learning framework for issue code reproduction.
Our results show a 20% improvement in issue reproduction rates over existing SOTA methods.
arXiv Detail & Related papers (2024-11-21T08:49:23Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen)
RAP-Gen explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z) - Patch Space Exploration using Static Analysis Feedback [8.13782364161157]
We show how to automatically repair memory safety issues, by leveraging static analysis to guide repair.
Our proposed approach learns what a desirable patch is by inspecting how close a patch is to fixing the bug.
We make repair scalable by creating classes of equivalent patches according to the effect they have on the symbolic heap, and then invoking the validation oracle only once per class of patch equivalence.
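The equivalence-class trick can be sketched concretely: group candidate patches by the effect they have on program state, then call the expensive validation oracle once per class rather than once per patch. The effect signature below (outputs on a few probe inputs) is a toy stand-in for the symbolic-heap effect used in the paper.

```python
# Toy sketch of "validate once per equivalence class" for candidate patches.
def effect_signature(patch_fn, probe_inputs):
    # Stand-in for the symbolic-heap effect: observable behavior on probes.
    return tuple(patch_fn(x) for x in probe_inputs)

def validate_by_class(patches, probe_inputs, oracle):
    classes = {}
    for p in patches:
        classes.setdefault(effect_signature(p, probe_inputs), []).append(p)
    accepted, oracle_calls = [], 0
    for members in classes.values():
        oracle_calls += 1              # one validation per class, not per patch
        if oracle(members[0]):
            accepted.extend(members)   # the whole class shares the verdict
    return accepted, oracle_calls

patches = [lambda x: abs(x), lambda x: max(x, -x), lambda x: x * x]
accepted, calls = validate_by_class(patches, [-2, 0, 3], lambda p: p(-2) == 2)
print(len(accepted), calls)  # 2 2: abs and max(x,-x) collapse into one class
```

Three candidate patches cost only two oracle calls here; with larger candidate pools the savings grow with the degree of behavioral redundancy.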
arXiv Detail & Related papers (2023-08-01T05:22:10Z) - RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought [56.558892336235914]
Reversing Chain-of-Thought (RCoT) is a novel method to improve large language models' reasoning abilities.
RCoT automatically detects and rectifies factual inconsistency in generated solutions.
We show that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities.
arXiv Detail & Related papers (2023-05-19T08:02:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.