Fugu-MT Paper Translation (Summary): DebugRepair: Enhancing LLM-Based Automated Program Repair via Self-Directed Debugging

Paper Summary: DebugRepair: Enhancing LLM-Based Automated Program Repair via Self-Directed Debugging

arXiv URL: http://arxiv.org/abs/2604.19305v1
Date: Tue, 21 Apr 2026 10:11:59 GMT
Status: Translation complete
Last updated in system: 2026-04-22 22:41:49.719035
Title: DebugRepair: Enhancing LLM-Based Automated Program Repair via Self-Directed Debugging
Authors: Linhao Wu, Yifei Pei, Zhen Yang, Kainan Li, Zhonghang Lu, Hao Tan, Xiran Lyu, Jia Li, Yizhou Chen, Pengyu Xue, Kunwu Zheng, Dan Hao
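Abstract summary: DebugRepair is a self-directed debugging framework for automated program repair. It enhances patch refinement with intermediate runtime evidence and achieves state-of-the-art performance against 15 approaches.
Reference score (in-house attention score): 20.747648291211338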
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automated Program Repair (APR) has benefited from the code understanding and generation capabilities of Large Language Models (LLMs). Existing feedback-based APR methods iteratively refine candidate patches using test execution feedback and have shown promising results. However, most rely on outcome-level failure symptoms, such as stack traces, which show how failures are observed but fail to expose the intermediate runtime states critical for root-cause analysis. As a result, LLMs often infer bug causes without sufficient runtime evidence, leading to incorrect patches. To address this limitation, we propose DebugRepair, a self-directed debugging framework for LLM-based APR. DebugRepair enhances patch refinement with intermediate runtime evidence collected through simulated debugging. It consists of three components: test semantic purification, simulated instrumentation, and debugging-driven conversational repair. Together, they reduce noisy test context, collect runtime traces through targeted debugging statements with rule-based fallback, and progressively refine candidate patches using prior attempts and newly observed runtime states. We evaluate DebugRepair on three benchmarks across Java and Python. Experiments show that DebugRepair achieves state-of-the-art performance against 15 approaches. With GPT-3.5, it correctly fixes 224 bugs on Defects4J, outperforming prior SOTA LLM-based methods by 26.2%. With DeepSeek-V3, it correctly fixes 295 Defects4J bugs, surpassing the second-best baseline by 59 bugs. Across five additional backbone LLMs, DebugRepair improves repair performance by 51.3% over vanilla settings. Ablation studies further confirm the effectiveness of all components.
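
The abstract's simulated instrumentation step collects intermediate runtime state by inserting debugging statements into the code under repair, with a rule-based fallback. The sketch below illustrates only a fallback of that kind, under assumptions of our own: the subject is a Python function, the rule is "record every simple assignment", and the names `TraceAssignments`, `instrument_and_run`, `format_trace`, and `__trace__` are hypothetical, not DebugRepair's actual implementation.

```python
import ast
import textwrap
from typing import Any


class TraceAssignments(ast.NodeTransformer):
    """Hypothetical rule-based fallback: after every assignment to a plain
    variable, append (line, name, repr(value)) to a global list `__trace__`."""

    @staticmethod
    def _record(name: str, lineno: int) -> ast.stmt:
        # Builds the statement: __trace__.append((lineno, "name", repr(name)))
        snapshot = ast.Call(func=ast.Name(id="repr", ctx=ast.Load()),
                            args=[ast.Name(id=name, ctx=ast.Load())], keywords=[])
        entry = ast.Tuple(elts=[ast.Constant(value=lineno), ast.Constant(value=name), snapshot],
                          ctx=ast.Load())
        append = ast.Attribute(value=ast.Name(id="__trace__", ctx=ast.Load()),
                               attr="append", ctx=ast.Load())
        return ast.Expr(value=ast.Call(func=append, args=[entry], keywords=[]))

    def visit_Assign(self, node: ast.Assign) -> list[ast.stmt]:
        # Keep the original statement, then log each simple Name target.
        return [node] + [self._record(t.id, node.lineno)
                         for t in node.targets if isinstance(t, ast.Name)]

    def visit_AugAssign(self, node: ast.AugAssign) -> list[ast.stmt]:
        if isinstance(node.target, ast.Name):
            return [node, self._record(node.target.id, node.lineno)]
        return [node]


def instrument_and_run(source: str, func_name: str, *args: Any) -> tuple[Any, list]:
    """Instrument `source`, call `func_name(*args)`, return (result or exception, trace)."""
    tree = TraceAssignments().visit(ast.parse(textwrap.dedent(source)))
    ast.fix_missing_locations(tree)  # new nodes need line/column info to compile
    trace: list = []
    namespace: dict = {"__trace__": trace}
    exec(compile(tree, "<instrumented>", "exec"), namespace)
    try:
        result = namespace[func_name](*args)
    except Exception as exc:  # a crash is also evidence worth reporting
        result = exc
    return result, trace


def format_trace(trace: list) -> str:
    """Render the trace as plain text suitable for an LLM prompt."""
    return "\n".join(f"line {line}: {name} = {value}" for line, name, value in trace)


BUGGY = '''
def mean(xs):
    total = 0
    for x in xs:
        total += x
    count = len(xs) - 1  # seeded off-by-one bug
    return total / count
'''

result, trace = instrument_and_run(BUGGY, "mean", [2, 4, 6])
print(result)  # 6.0, while the test expects 4.0
print(format_trace(trace))
```

A failing assertion alone would only report 6.0 instead of 4.0; the trace ends with `line 6: count = 2` for a three-element list, which points at the faulty line. Values are captured with `repr` at the moment of assignment so that later mutations of the same object do not overwrite the snapshot.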

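The third component, debugging-driven conversational repair, refines candidate patches using prior attempts and newly observed runtime states. A minimal sketch of such a loop follows, assuming a callable `propose` that stands in for the LLM and a `validate` callable that runs the tests and returns runtime evidence. `Attempt`, `RepairSession`, `repair`, `scripted_llm`, and `run_test` are hypothetical names, and the evidence here is a single observed return value rather than a full debugging trace.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Attempt:
    patch: str
    evidence: str  # runtime observations from validating this patch


@dataclass
class RepairSession:
    """Hypothetical conversation state: the bug context plus every failed attempt."""
    bug_context: str
    history: list[Attempt] = field(default_factory=list)

    def prompt(self) -> str:
        # Prior attempts and what was observed when running them are fed back,
        # so the next proposal can avoid repeating the same mistake.
        parts = [self.bug_context]
        for i, attempt in enumerate(self.history, start=1):
            parts.append(f"Attempt {i}:\n{attempt.patch}\nObserved: {attempt.evidence}")
        return "\n\n".join(parts)


def repair(bug_context: str,
           propose: Callable[[str], str],
           validate: Callable[[str], tuple[bool, str]],
           max_rounds: int = 5) -> Optional[str]:
    """Ask `propose` for patches until `validate` accepts one or rounds run out."""
    session = RepairSession(bug_context)
    for _ in range(max_rounds):
        patch = propose(session.prompt())
        passed, evidence = validate(patch)
        if passed:
            return patch
        session.history.append(Attempt(patch, evidence))
    return None


# Toy usage: a scripted stand-in for the LLM and a one-test validator.
CANDIDATES = [
    "def mean(xs):\n    return sum(xs) / (len(xs) - 1)",
    "def mean(xs):\n    return sum(xs) / len(xs)",
]


def scripted_llm(prompt: str) -> str:
    # Picks the next candidate based on how many failed attempts the prompt lists.
    return CANDIDATES[min(prompt.count("Attempt "), len(CANDIDATES) - 1)]


def run_test(patch: str) -> tuple[bool, str]:
    namespace: dict = {}
    exec(patch, namespace)
    got = namespace["mean"]([2, 4, 6])
    return got == 4.0, f"mean([2, 4, 6]) returned {got!r}, expected 4.0"


print(repair("Bug: mean() returns wrong values.", scripted_llm, run_test))
```

Keeping the whole history in the prompt, rather than only the latest failure, is what lets a later proposal differ from an earlier one; in this toy run the first candidate fails with an observed value of 6.0, and the scripted stand-in returns the corrected `len(xs)` divisor on the second round.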
