Fugu-MT Paper Translation (Abstract): Boosting LLMs for Mutation Generation

Paper abstract: Boosting LLMs for Mutation Generation

arxiv url: http://arxiv.org/abs/2603.24560v1
Date: Wed, 25 Mar 2026 17:42:17 GMT
Status: Translation complete
System update date: 2026-03-26 21:06:11.414204
Title: Boosting LLMs for Mutation Generation
Title (reference translation): Boosting LLMs for Mutation Generation
Authors: Bo Wang, Ming Deng, Mingda Chen, Chengran Yang, Youfang Lin, Mark Harman, Mike Papadakis, Jie M. Zhang
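Abstract summary: We introduce SMART (Semantic Mutation with Adaptive Retrieval and Tuning). We conducted an empirical study of SMART using 1,991 real-world Java bugs from the Defects4J and ConDefects datasets. The results show that SMART substantially improves mutation validity, effectiveness, and efficiency.
Reference score (independently computed attention score): 35.905252475438466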
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLM-based mutation testing is a promising testing technology, but existing approaches typically rely on a fixed set of mutations as few-shot examples, or on no examples at all. This can result in generic low-quality mutations, missed context-specific mutation patterns, substantial numbers of redundant and uncompilable mutants, and limited semantic similarity to real bugs. To overcome these limitations, we introduce SMART (Semantic Mutation with Adaptive Retrieval and Tuning). SMART integrates retrieval-augmented generation (RAG) on a vectorized dataset of real-world bugs, focused code chunking, and supervised fine-tuning using mutations coupled with real-world bugs. We conducted an extensive empirical study of SMART using 1,991 real-world Java bugs from the Defects4J and ConDefects datasets, comparing SMART to the state-of-the-art LLM-based approaches, LLMut and LLMorpheus. The results reveal that SMART substantially improves mutation validity, effectiveness, and efficiency (even enabling small 7B-scale models to match or surpass large models like GPT-4o). We also demonstrate that SMART significantly improves downstream software engineering applications, including test case prioritization and fault localization. More specifically, SMART improves validity (weighted average generation rate) from 42.89% to 65.6%. It raises the non-duplicate rate from 87.38% to 95.62%, and the compilable rate from 88.85% to 90.21%. In terms of effectiveness, it achieves a real bug detection rate of 92.61% (vs. 57.86% for LLMut) and improves the average Ochiai coefficient from 25.61% to 38.44%. For fault localization, SMART ranks 64 more bugs as Top-1 under MUSE and 57 more under Metallaxis.
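Abstract (reference translation): LLM-based mutation testing is a promising testing technique, but existing approaches typically rely on a fixed set of mutations as few-shot examples, or use no examples at all. This leads to generic, low-quality mutations, missed context-specific mutation patterns, a substantial number of redundant and uncompilable mutants, and limited semantic similarity to real bugs. We introduce SMART (Semantic Mutation with Adaptive Retrieval and Tuning). SMART integrates retrieval-augmented generation (RAG) over a vectorized dataset of real-world bugs, focused code chunking, and supervised fine-tuning on mutations paired with real-world bugs. We conducted an extensive empirical study of SMART using 1,991 real-world Java bugs from the Defects4J and ConDefects datasets, comparing it with the state-of-the-art LLM-based approaches LLMut and LLMorpheus. The results show that SMART substantially improves mutation validity, effectiveness, and efficiency (even small 7B-scale models can match or surpass large models such as GPT-4o). We also show that SMART significantly improves downstream software engineering applications, including test case prioritization and fault localization. Specifically, SMART improves validity (weighted average generation rate) from 42.89% to 65.6%. The non-duplicate rate rises from 87.38% to 95.62%, and the compilable rate from 88.85% to 90.21%. It achieves a real bug detection rate of 92.61% (vs. 57.86% for LLMut) and improves the average Ochiai coefficient from 25.61% to 38.44%. For fault localization, SMART ranks 64 more bugs as Top-1 under MUSE and 57 more under Metallaxis.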
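
Note on the Ochiai coefficient: the abstract reports an average Ochiai coefficient but does not define it. The sketch below assumes the form commonly used in mutation testing, where a mutant is compared to a real bug through the sets of tests that fail on each. The function name and the test names are hypothetical and are not taken from the paper, whose exact computation may differ.

```python
from math import sqrt


def ochiai(mutant_failing: set[str], bug_failing: set[str]) -> float:
    """Ochiai similarity between two sets of failing tests.

    Assumed definition: |M ∩ B| / sqrt(|M| * |B|), where M is the set of
    tests that fail on the mutant and B the set that fail on the real bug.
    """
    # If either set is empty there is no overlap to measure.
    if not mutant_failing or not bug_failing:
        return 0.0
    shared = len(mutant_failing & bug_failing)
    return shared / sqrt(len(mutant_failing) * len(bug_failing))


# Hypothetical test names, for illustration only.
bug = {"testParse", "testRound", "testEmpty", "testNull"}
mutant = {"testParse", "testRound", "testEmpty", "testOverflow"}

score = ochiai(mutant, bug)  # 3 shared tests / sqrt(4 * 4) = 0.75
```

Under this definition a score of 1.0 means the mutant fails exactly the same tests as the real bug, and 0.0 means the two share no failing tests.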


Related papers