Fugu-MT Paper Translation (Summary): An Iterative Test-and-Repair Framework for Competitive Code Generation

Paper summary: An Iterative Test-and-Repair Framework for Competitive Code Generation

arxiv url: http://arxiv.org/abs/2604.05560v1
Date: Tue, 07 Apr 2026 08:00:54 GMT
Status: Translation complete
Last updated in system: 2026-04-08 17:42:09.708188
Title: An Iterative Test-and-Repair Framework for Competitive Code Generation
Authors: Lingxiao Tang, Muyang Ye, Zhaoyang Chu, Xiaoxue Ren, Zhongxin Liu, Lingfeng Bao, He Ye
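Abstract summary: Large language models (LLMs) have made remarkable progress in code generation, but competitive programming remains a challenge. Recent training-based methods have improved code generation by using reinforcement learning (RL) with execution feedback. The more recent framework CURE further incorporates test generation into the training process, jointly training a Coder and a Tester within a single model.
Reference score (attention score computed independently by this site): 9.137158235106943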
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have made remarkable progress in code generation, but competitive programming remains a challenge. Recent training-based methods have improved code generation by using reinforcement learning (RL) with execution feedback. The more recent framework CURE further incorporates test generation into the training process, jointly training a Coder and a Tester within a single model. At inference time, the Coder generates many candidate programs, and the Tester generates tests from the problem description. The candidate that passes the most generated tests is selected as the final answer. However, CURE has two critical limitations. First, the Tester never reads any candidate code, so its tests often fail to expose implementation-specific bugs. Second, the Coder generates every candidate from scratch and never learns to fix a buggy program based on a failed test. To address these limitations, we propose FixAudit, which approaches competitive code generation from a new perspective: starting from a single initial candidate, it iteratively improves that candidate through a targeted test-and-repair debugging cycle. The framework trains one shared model, through four stages, to play two specialized roles: the Fixer, which repairs the current candidate based on a failing test, and the Auditor, which reads the candidate code to generate new tests that expose its remaining bugs. We evaluate FixAudit on three benchmarks: APPS, CodeContests, and xCodeEval. Applied to a 7B model, the framework surpasses the average performance of the larger 32B baseline from the same model family under the zero-shot setting. Compared to strong baselines built on the same 7B base model, FixAudit improves average Pass@1 by 35.1% to 36.8% and average AvgPassRatio by 7.1% to 24.5%.
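
The test-and-repair cycle described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `repair_loop`, `FailingTest`, `toy_auditor`, and `toy_fixer`, the `max_rounds` budget, and the stopping rule (stop once the Auditor finds no failing test) are all assumptions made for illustration. In FixAudit both roles are played by one trained model; here they are stubbed with plain functions so the control flow can be run as-is.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A candidate program is modelled as a plain callable (illustrative assumption).
Candidate = Callable[[int], int]


@dataclass(frozen=True)
class FailingTest:
    """A test that the current candidate fails (hypothetical structure)."""
    input: str
    expected: str
    actual: str


def repair_loop(
    problem: str,
    candidate: Candidate,
    auditor: Callable[[str, Candidate], Optional[FailingTest]],
    fixer: Callable[[str, Candidate, FailingTest], Candidate],
    max_rounds: int = 4,  # assumed budget, not taken from the paper
) -> Tuple[Candidate, int]:
    """Repair one candidate until no failing test is found or the budget ends."""
    rounds = 0
    for _ in range(max_rounds):
        # Auditor role: read the candidate and look for a test that breaks it.
        failing = auditor(problem, candidate)
        if failing is None:
            break
        # Fixer role: repair the candidate based on the failing test.
        candidate = fixer(problem, candidate, failing)
        rounds += 1
    return candidate, rounds


def toy_auditor(problem: str, candidate: Candidate) -> Optional[FailingTest]:
    """Stand-in for the model: probes a few inputs against a known oracle."""
    for x in (0, 3, -2):
        actual = candidate(x)
        if actual != abs(x):
            return FailingTest(str(x), str(abs(x)), str(actual))
    return None


def toy_fixer(problem: str, candidate: Candidate, failing: FailingTest) -> Candidate:
    """Stand-in for the model: returns a corrected program."""
    return lambda x: x if x >= 0 else -x


# The initial candidate is wrong for negative inputs.
final, rounds = repair_loop("Return |x|.", lambda x: x, toy_auditor, toy_fixer)
print(rounds, final(-2))
```

The contrast with the CURE-style selection described above is that only one candidate is carried forward, and each new test is produced after inspecting that candidate rather than from the problem description alone.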

Related papers