Fugu-MT Paper Translation (Abstract): DPO-F+: Aligning Code Repair Feedback with Developers' Preferences

Paper overview: DPO-F+: Aligning Code Repair Feedback with Developers' Preferences

arxiv url: http://arxiv.org/abs/2511.01043v1
Date: Sun, 02 Nov 2025 18:39:41 GMT
Status: Translation complete
Last updated in system: 2025-11-05 16:37:27.040327
Title: DPO-F+: Aligning Code Repair Feedback with Developers' Preferences
Title (reference translation): DPO-F+: Aligning Code Repair Feedback with Developer Preferences
Authors: Zihan Fang, Yifan Zhang, Yueke Zhang, Kevin Leach, Yu Huang
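Abstract summary: DPO-f+ is a framework that aligns code-repair feedback with developer needs and profiles. Empirically, DPO-f+ outperforms both the baseline and standard DPO on generated-code accuracy and overall feedback alignment.
Reference score (independently computed attention score): 13.333315604414922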
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) are increasingly applied to software engineering tasks, especially code repair. However, developers often struggle to interpret model outputs, limiting effective human-AI teaming. Prior work largely optimizes repaired code while under-addressing the natural-language feedback that enables comprehension and iterative improvement. We present DPO-f+, a novel framework that aligns code-repair feedback with developer needs and profiles. It (1) formalizes developer-profiled, domain-specific metrics for feedback alignment; (2) automatically constructs pairwise preference datasets from code-repair tasks; (3) fine-tunes using Direct Preference Optimization (DPO) augmented with a lightweight margin signal; and (4) provides an automated feedback evaluation protocol. Empirically, DPO-f+ outperforms both the baseline and standard DPO on generated-code accuracy and overall feedback alignment. On novice programming tasks, DPO-f+ raises the top-1 pass rate by 5.71 percentage points (pp) over the baseline and by 3.30 pp over DPO. On the more challenging SWE-bench Lite benchmark, it increases the issue-resolution rate by 1.67 pp over DPO and by 4.67 pp over the baseline. It also achieves the largest improvement in feedback alignment, outperforming DPO and the baseline. By aligning feedback more closely with developer needs, DPO-f+ turns LLM-assisted repair from one-shot outputs into a collaborative sensemaking workflow, providing a practical approach to enhancing code comprehension and fostering more effective human-AI teaming in software engineering.
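Abstract (reference translation): Large language models (LLMs) are increasingly applied to software engineering tasks, particularly code repair. However, developers often struggle to interpret model outputs, which limits effective human-AI teaming. Prior work has mainly optimized the repaired code while under-addressing the natural-language feedback that makes comprehension and iterative improvement possible. DPO-f+ is a new framework that aligns code-repair feedback with developer needs and profiles. It (1) formalizes developer-profiled, domain-specific metrics for feedback alignment; (2) automatically constructs pairwise preference datasets from code-repair tasks; (3) fine-tunes with Direct Preference Optimization (DPO) augmented by a lightweight margin signal; and (4) provides an automated feedback evaluation protocol. Empirically, DPO-f+ outperforms both the baseline and standard DPO on generated-code accuracy and overall feedback alignment. On novice programming tasks, DPO-f+ raises the top-1 pass rate by 5.71 percentage points (pp) over the baseline and by 3.30 pp over DPO. On the more difficult SWE-bench Lite benchmark, it raises the issue-resolution rate by 1.67 pp over DPO and by 4.67 pp over the baseline. It also achieves the largest improvement in feedback alignment, ahead of both DPO and the baseline. By aligning feedback more closely with developer needs, DPO-f+ turns LLM-assisted repair from one-shot outputs into a collaborative sensemaking workflow, offering a practical approach to improving code comprehension and fostering more effective human-AI teaming in software engineering.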
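
Illustrative sketch (not from the paper): the abstract states that DPO-f+ fine-tunes with DPO "augmented with a lightweight margin signal" but does not give the formula. The Python below shows the standard per-pair DPO loss and one plausible way a margin could enter it, namely as an offset subtracted inside the log-sigmoid. The additive form, the function name `dpo_loss`, and the parameters `beta` and `margin` are all assumptions made for illustration; the authors' actual objective may differ.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
    margin: float = 0.0,
) -> float:
    """Per-pair DPO loss with an optional additive margin.

    With margin == 0 this is the standard DPO objective:
        -log sigmoid(beta * [(log pi(y_w) - log pi_ref(y_w))
                             - (log pi(y_l) - log pi_ref(y_l))])
    A positive margin (ASSUMED form, not taken from the paper) requires
    the preferred feedback to beat the rejected one by at least that much.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -log_sigmoid(chosen_reward - rejected_reward - margin)


if __name__ == "__main__":
    # Hypothetical sequence log-probabilities for one preference pair.
    plain = dpo_loss(-5.0, -9.0, -6.0, -8.0, beta=0.1)
    with_margin = dpo_loss(-5.0, -9.0, -6.0, -8.0, beta=0.1, margin=0.5)
    print(f"standard DPO loss: {plain:.4f}")
    print(f"with margin 0.5:   {with_margin:.4f}")
```

Under this assumed form, a larger margin increases the loss for the same pair, so the optimizer is pushed to separate preferred and rejected feedback more strongly when the preference gap is judged to be large.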

