Fugu-MT paper translation (abstract): Learning to Generate Unit Test via Adversarial Reinforcement Learning

Paper overview: Learning to Generate Unit Test via Adversarial Reinforcement Learning

arxiv url: http://arxiv.org/abs/2508.21107v1
Date: Thu, 28 Aug 2025 14:32:44 GMT
Status: translation complete
Last updated in system: 2025-09-01 19:45:10.832321
Title: Learning to Generate Unit Test via Adversarial Reinforcement Learning
Title (reference translation): Learning to Generate Unit Tests via Adversarial Reinforcement Learning
Authors: Dongjun Lee, Changho Hwang, Kimin Lee
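Abstract summary: Unit testing is a core practice in programming, enabling systematic evaluation of programs produced by human developers or large language models (LLMs). The paper proposes UTRL, a novel reinforcement learning framework that trains an LLM to generate high-quality unit tests given a programming instruction. In the experiments, unit tests generated by Qwen3-4B trained via UTRL showed higher quality than unit tests generated by the same model trained via supervised fine-tuning.
Reference score (independently computed attention measure): 33.82915303652549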
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Unit testing is a core practice in programming, enabling systematic evaluation of programs produced by human developers or large language models (LLMs). Given the challenges in writing comprehensive unit tests, LLMs have been employed to automate test generation, yet methods for training LLMs to produce high-quality tests remain underexplored. In this work, we propose UTRL, a novel reinforcement learning framework that trains an LLM to generate high-quality unit tests given a programming instruction. Our key idea is to iteratively train two LLMs, the unit test generator and the code generator, in an adversarial manner via reinforcement learning. The unit test generator is trained to maximize a discrimination reward, which reflects its ability to produce tests that expose faults in the code generator's solutions, and the code generator is trained to maximize a code reward, which reflects its ability to produce solutions that pass the unit tests generated by the test generator. In our experiments, we demonstrate that unit tests generated by Qwen3-4B trained via UTRL show higher quality compared to unit tests generated by the same model trained via supervised fine-tuning on human-written ground-truth unit tests, yielding code evaluations that more closely align with those induced by the ground-truth tests. Moreover, Qwen3-4B trained with UTRL outperforms frontier models such as GPT-4.1 in generating high-quality unit tests, highlighting the effectiveness of UTRL in training LLMs for this task.
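Abstract (reference translation): Unit testing is a core practice in programming, enabling systematic evaluation of programs produced by human developers or large language models (LLMs). Given the challenges in writing comprehensive unit tests, LLMs have been used to automate test generation, but methods for training LLMs to produce high-quality tests remain underexplored. This work proposes UTRL, a novel reinforcement learning framework that trains an LLM to generate high-quality unit tests given a programming instruction. The key idea is to train two LLMs, the unit test generator and the code generator, adversarially via reinforcement learning. The unit test generator is trained to maximize a discrimination reward, which reflects its ability to produce tests that expose faults in the code generator's solutions, and the code generator is trained to maximize a code reward, which reflects its ability to produce solutions that pass the unit tests generated by the test generator. In the experiments, unit tests generated by Qwen3-4B trained via UTRL show higher quality than unit tests generated by the same model trained via supervised fine-tuning on human-written ground-truth unit tests. Furthermore, Qwen3-4B trained with UTRL generates higher-quality unit tests than frontier models such as GPT-4.1.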
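
The two reward signals described in the abstract can be illustrated with a toy sketch. The abstract does not give the reward formulas, so everything below is an assumption made for illustration: tests are modeled as (input, expected output) pairs, the code reward is taken to be the fraction of generated tests a solution passes, and the discrimination reward is taken to be the fraction of faulty candidate solutions that fail at least one test. The validity check against a reference solution and all function names are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple

# Toy modeling assumption: a unit test is an (input, expected_output) pair
# and a candidate solution is a plain Python callable.
TestCase = Tuple[int, int]
Solution = Callable[[int], int]


def passes(solution: Solution, test: TestCase) -> bool:
    """Return True if the solution returns the expected output for this test."""
    x, expected = test
    try:
        return solution(x) == expected
    except Exception:
        # A crashing solution counts as failing the test.
        return False


def code_reward(solution: Solution, tests: List[TestCase]) -> float:
    """Assumed form: fraction of the generated tests that the solution passes."""
    if not tests:
        return 0.0
    return sum(passes(solution, t) for t in tests) / len(tests)


def discrimination_reward(
    tests: List[TestCase],
    faulty_solutions: List[Solution],
    reference: Solution,
) -> float:
    """Assumed form: fraction of faulty solutions exposed by at least one test.

    Hypothetical validity check: tests that reject the reference solution
    are treated as invalid and earn zero reward.
    """
    if not tests or not faulty_solutions:
        return 0.0
    if not all(passes(reference, t) for t in tests):
        return 0.0
    exposed = sum(
        any(not passes(sol, t) for t in tests) for sol in faulty_solutions
    )
    return exposed / len(faulty_solutions)


# Toy task: "return the absolute value of x".
reference_solution: Solution = abs
faulty_candidates: List[Solution] = [
    lambda x: x,    # wrong for negative inputs
    lambda x: -x,   # wrong for positive inputs
]

weak_tests: List[TestCase] = [(0, 0), (3, 3)]
strong_tests: List[TestCase] = [(0, 0), (3, 3), (-4, 4)]
invalid_tests: List[TestCase] = [(1, 2)]  # contradicts the reference

weak_score = discrimination_reward(weak_tests, faulty_candidates, reference_solution)
strong_score = discrimination_reward(strong_tests, faulty_candidates, reference_solution)
invalid_score = discrimination_reward(invalid_tests, faulty_candidates, reference_solution)

print(weak_score)    # 0.5: only the negating candidate is exposed
print(strong_score)  # 1.0: both faulty candidates are exposed
print(invalid_score) # 0.0: the tests reject the reference solution
```

In this sketch the test generator is rewarded for adding the negative-input test, because that test separates the identity function from a correct solution, while a code generator trained against the stronger test set would need to handle negative inputs to keep its code reward high. An actual training loop would feed these scalar rewards into a reinforcement learning update for each model, which is omitted here.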


Related papers