Fugu-MT paper translation (abstract): Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning

Paper abstract: Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning

arxiv url: http://arxiv.org/abs/2603.15611v1
Date: Mon, 16 Mar 2026 17:58:13 GMT
Status: translation complete
Last updated in system: 2026-03-17 18:28:58.723558
Title: Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning
Title (reference translation): Code-A1: Adversarial Evolution of a Code LLM and a Test LLM via Reinforcement Learning
Authors: Aozhe Wang, Yuchen Yan, Nan Zhou, Zhengxi Lu, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
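Abstract summary: Reinforcement learning for code generation relies on verifiable rewards from unit test pass rates. Recent self-play methods unify code and test generation in a single model. Code-A1 achieves code generation performance matching or exceeding that of models trained on human-annotated tests.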
Reference score (independently computed attention score): 54.95476453942411
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning for code generation relies on verifiable rewards from unit test pass rates. Yet high-quality test suites are scarce, existing datasets offer limited coverage, and static rewards fail to adapt as models improve. Recent self-play methods unify code and test generation in a single model, but face an inherent dilemma: white-box access leads to self-collusion where the model produces trivial tests for easy rewards, yet black-box restriction yields generic tests that miss implementation-specific bugs. We introduce Code-A1, an adversarial co-evolution framework that jointly optimizes a Code LLM and a Test LLM with opposing objectives. The Code LLM is rewarded for passing more tests, while the Test LLM is rewarded for exposing more defects. This architectural separation eliminates self-collusion risks and safely enables white-box test generation, where the Test LLM can inspect candidate code to craft targeted adversarial tests. We further introduce a Mistake Book mechanism for experience replay and a composite reward balancing test validity with adversarial difficulty. Experiments on Qwen2.5-Coder models demonstrate that Code-A1 achieves code generation performance matching or exceeding models trained on human-annotated tests, while significantly improving test generation capability.
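Abstract (reference translation): Reinforcement learning for code generation relies on verifiable rewards from unit test pass rates. However, high-quality test suites are scarce, existing datasets offer limited coverage, and static rewards cannot adapt as models improve. With white-box access, the model generates trivial tests for easy rewards, whereas a black-box restriction yields generic tests that miss implementation-specific bugs. We introduce Code-A1, an adversarial co-evolution framework that jointly optimizes a Code LLM and a Test LLM. The Code LLM is rewarded for passing more tests, and the Test LLM for exposing more defects. This architectural separation eliminates the risk of self-collusion and enables white-box test generation, in which the Test LLM inspects candidate code and crafts targeted adversarial tests. We further introduce a Mistake Book mechanism for experience replay and a composite reward that balances test validity with adversarial difficulty. Experiments on Qwen2.5-Coder models show that Code-A1 matches or exceeds the code generation performance of models trained on human-annotated tests while significantly improving test generation capability.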

Related papers