Fugu-MT Paper Translation (Abstract): AP2O: Correcting LLM-Generated Code Errors Type by Type Like Humans via Adaptive Progressive Preference Optimization

Paper overview: AP2O: Correcting LLM-Generated Code Errors Type by Type Like Humans via Adaptive Progressive Preference Optimization

arxiv url: http://arxiv.org/abs/2510.02393v1
Date: Wed, 01 Oct 2025 03:17:08 GMT
Status: Translation complete
Last updated in system: 2025-10-06 16:35:52.076426
Title: AP2O: Correcting LLM-Generated Code Errors Type by Type Like Humans via Adaptive Progressive Preference Optimization
Title (reference translation): AP2O: Correcting LLM-Generated Code Errors Type by Type, as Humans Do, via Adaptive Progressive Preference Optimization
Authors: Jianqing Zhang, Wei Xia, Hande Dong, Qiang Lin, Jian Cao,
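Abstract summary: This paper proposes AP2O-Coder, a method that guides LLMs adaptively and methodically to reduce code errors in code generation. In extensive experiments, AP2O-Coder improves code generation performance by up to 3% in pass@k while using less preference data.
Reference score (attention score computed by this site): 14.132986699859131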
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: LLMs' code generation capabilities have yielded substantial improvements in the effectiveness of programming tasks. However, LLM-generated code still suffers from compilation and runtime errors. Existing offline preference optimization methods primarily focus on enhancing LLMs' coding abilities using pass/fail signals in the preference data, overlooking the deep-level error types in the failed codes. To address this, we propose Adaptively Progressive Preference Optimization (AP2O) for coding (i.e., AP2O-Coder), a method that guides LLMs adaptively and methodically to reduce code errors for code generation. Specifically, we construct an error notebook from failed codes and progressively optimize the LLM to correct errors type by type. Furthermore, we adaptively replay error types to tailor to the LLM's changing weaknesses throughout the training process. Through extensive experiments on both code and general LLMs (Llama, Qwen, and DeepSeek series) with parameters ranging from 0.5B to 34B, our AP2O-Coder improves code generation performance by up to 3% in pass@k while using less preference data. Code: https://github.com/TsingZ0/AP2O
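Abstract (reference translation): The code generation capabilities of LLMs have substantially improved the effectiveness of programming tasks. However, LLM-generated code still suffers from compilation and runtime errors. Existing offline preference optimization methods focus mainly on improving LLMs' coding abilities with the pass/fail signals in the preference data, and they overlook the deeper error types in the failed code. To address this, the authors propose Adaptively Progressive Preference Optimization (AP2O) for coding (AP2O-Coder), a method that guides LLMs adaptively and methodically to reduce code errors in code generation. Specifically, they build an error notebook from failed code and progressively optimize the LLM to correct errors type by type. They also adaptively replay error types to match the LLM's changing weaknesses throughout training. In extensive experiments on both code and general LLMs (Llama, Qwen, and DeepSeek series) with 0.5B to 34B parameters, AP2O-Coder improves code generation performance by up to 3% in pass@k while using less preference data. Code: https://github.com/TsingZ0/AP2O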
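The abstract describes three steps: building an error notebook from failed code, optimizing error types one at a time, and replaying the types the model is currently weakest on. The following is a minimal sketch of the bookkeeping behind those steps, not the authors' implementation. All function names, the `(prompt, failed_code, error_type)` record layout, the frequency-based ordering, and the replay rule are assumptions made for illustration; the abstract does not specify them. See the linked repository for the actual method.

```python
from collections import defaultdict


def build_error_notebook(failures):
    """Group failed samples by error type.

    `failures` is an iterable of (prompt, failed_code, error_type) tuples.
    This record layout is hypothetical, chosen only for illustration.
    """
    notebook = defaultdict(list)
    for prompt, failed_code, error_type in failures:
        notebook[error_type].append((prompt, failed_code))
    return dict(notebook)


def progressive_order(notebook, ascending=True):
    """Order error types by how often they occur.

    Ordering by frequency is an assumed curriculum; the abstract only says
    that errors are corrected "type by type". Ties are broken by name so
    the result is deterministic.
    """
    return sorted(
        notebook,
        key=lambda t: (len(notebook[t]), t),
        reverse=not ascending,
    )


def pick_replay_type(current_error_counts):
    """Pick the error type the model currently fails on most often.

    `current_error_counts` maps error type -> failures observed on a
    held-out check during training. This replay rule is hypothetical.
    """
    if not current_error_counts:
        return None
    # Iterate in sorted order so ties resolve to the alphabetically first type.
    return max(sorted(current_error_counts), key=lambda t: current_error_counts[t])


# Toy data: six failed generations tagged with the error they raised.
failures = [
    ("p1", "code_a", "SyntaxError"),
    ("p2", "code_b", "TypeError"),
    ("p3", "code_c", "TypeError"),
    ("p4", "code_d", "AssertionError"),
    ("p5", "code_e", "TypeError"),
    ("p6", "code_f", "SyntaxError"),
]

notebook = build_error_notebook(failures)
order = progressive_order(notebook)
replay = pick_replay_type({"TypeError": 1, "SyntaxError": 4})
```

In this sketch, each entry in `order` would drive one stage of preference optimization on that error type's samples, and `replay` would name the type to revisit after re-checking the model. The preference optimization step itself is omitted here.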

Related papers