Fugu-MT Paper Translation (Summary): On the Effectiveness of Code Representation in Deep Learning-Based Automated Patch Correctness Assessment

Paper summary: On the Effectiveness of Code Representation in Deep Learning-Based Automated Patch Correctness Assessment

arxiv url: http://arxiv.org/abs/2603.07520v1
Date: Sun, 08 Mar 2026 08:18:42 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:14.709442
Title: On the Effectiveness of Code Representation in Deep Learning-Based Automated Patch Correctness Assessment
Title (reference translation): On the Effectiveness of Code Representation in Deep Learning-Based Automated Patch Correctness Assessment
Authors: Quanjun Zhang, Chunrong Fang, Haichuan Hu, Yuan Zhao, Weisong Sun, Yun Yang, Tao Zheng, Zhenyu Chen
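Abstract summary: Automated program repair (APR) attempts to generate correct patches and has drawn wide attention from both academia and industry over the past decades. To address the overfitting problem, the community has proposed a growing number of approaches to predict patch correctness (APCA approaches). Among them, locally deep learning approaches aimed at automatically matching designs have been emerging strongly. Despite being fundamental to reasoning about patch correctness, code representation has not been systematically investigated.
Reference score (independently computed attention score): 27.074607600022315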
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automated program repair (APR) attempts to generate correct patches and has drawn wide attention from both academia and industry in the past decades. However, APR continues to struggle with the patch overfitting issue caused by weak test suites. To address the overfitting problem, the community has proposed an increasing number of approaches to predict patch correctness (APCA approaches). Among them, locally deep learning approaches aimed at automatically matching designs have been emerging strongly. Such approaches typically encode input code snippets into well-designed representations and build a binary model for correctness prediction. Despite being fundamental to reasoning about patch correctness, code representation has not been systematically investigated. To bridge this gap, we perform the first extensive study to evaluate the performance of different code representations on predicting patch correctness, based on more than 500 trained APCA models. The experimental results on 15 benchmarks with four categories and 11 classifiers show that the graph-based code representation, which is under-explored in the literature, consistently outperforms other representations, e.g., an average accuracy of 82.6% for CPG across three GNN models. Moreover, we demonstrate that such representations can achieve comparable or better performance for three different previous APCA approaches, e.g., filtering out 87.09% of overfitting patches by TREETRAIN with AST. We further find that integrating sequence-based representation into heuristic-based representation yields an average improvement of 13.5% on five metrics. Overall, our study highlights the potential and challenges of utilizing code representation to reason about patch correctness, thus increasing the usability of off-the-shelf APR tools and reducing the manual debugging effort of developers in practice.
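Abstract (reference translation): Automated program repair (APR) attempts to generate correct patches and has drawn wide attention from both academia and industry over the past decades. However, APR continues to suffer from the patch overfitting problem because of weak test suites. To address the overfitting problem, the community has proposed a growing number of approaches to predict patch correctness (APCA). Among them, locally deep learning approaches aimed at automatically matching designs have been emerging strongly. Such approaches typically encode input code snippets into well-designed representations and build a binary model for correctness prediction. Despite being fundamental to reasoning about patch correctness, code representation has not been systematically investigated. To bridge this gap, the authors perform the first extensive study evaluating the performance of different code representations on predicting patch correctness, based on more than 500 trained APCA models. Experimental results on 15 benchmarks with four categories and 11 classifiers show that the graph-based code representation consistently outperforms other representations, e.g., an average accuracy of 82.6% for CPG across three GNN models. Moreover, such representations achieve comparable or better performance for three previous APCA approaches, e.g., filtering out 87.09% of overfitting patches by TREETRAIN with AST. Integrating sequence-based representation into heuristic-based representation yields an average improvement of 13.5% on five metrics. Overall, the study highlights the potential and challenges of utilizing code representation to reason about patch correctness, increasing the usability of off-the-shelf APR tools and reducing the manual debugging effort of developers in practice.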


Related papers