Fugu-MT Paper Translation (Abstract): LLMs for Automated Unit Test Generation and Assessment in Java: The AgoneTest Framework

Paper summary: LLMs for Automated Unit Test Generation and Assessment in Java: The AgoneTest Framework

arxiv url: http://arxiv.org/abs/2511.20403v1
Date: Tue, 25 Nov 2025 15:33:00 GMT
Status: Translation complete
Last updated in system: 2025-11-26 17:37:04.524739
Title: LLMs for Automated Unit Test Generation and Assessment in Java: The AgoneTest Framework
Authors: Andrea Lops, Fedelucio Narducci, Azzurra Ragone, Michelantonio Trizio, Claudio Barto,
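Abstract summary: AgoneTest is an evaluation framework for unit tests generated by large language models (LLMs) in Java. For the subset of tests that compile, LLM-generated tests can match or exceed human-written tests in coverage and defect detection.
Reference score (attention score computed independently by this site): 2.501198441875755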
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Unit testing is an essential but resource-intensive step in software development, ensuring individual code units function correctly. This paper introduces AgoneTest, an automated evaluation framework for Large Language Model-generated (LLM) unit tests in Java. AgoneTest does not aim to propose a novel test generation algorithm; rather, it supports researchers and developers in comparing different LLMs and prompting strategies through a standardized end-to-end evaluation pipeline under realistic conditions. We introduce the Classes2Test dataset, which maps Java classes under test to their corresponding test classes, and a framework that integrates advanced evaluation metrics, such as mutation score and test smells, for a comprehensive assessment. Experimental results show that, for the subset of tests that compile, LLM-generated tests can match or exceed human-written tests in terms of coverage and defect detection. Our findings also demonstrate that enhanced prompting strategies contribute to test quality. AgoneTest clarifies the potential of LLMs in software testing and offers insights for future improvements in model design, prompt engineering, and testing practices.
