Fugu-MT Paper Translation (Abstract): PITMuS: A Tool for Automated Bug Dataset Generation via Source-Level Mutant Reconstruction

Paper overview: PITMuS: A Tool for Automated Bug Dataset Generation via Source-Level Mutant Reconstruction

arxiv url: http://arxiv.org/abs/2605.21930v1
Date: Thu, 21 May 2026 02:59:19 GMT
Status: Translation complete
Last updated in system: 2026-05-22 20:14:18.513588
Title: PITMuS: A Tool for Automated Bug Dataset Generation via Source-Level Mutant Reconstruction
Authors: Tasfia Tasnim, Soneya Binta Hossain
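Abstract summary: PIT is a mutation testing tool for Java that performs mutation at the bytecode level. PITMuS combines PIT XML metadata with debug information from compiled Java class files to localize and reconstruct the source edit corresponding to each mutant. It produces structured datasets containing source-level buggy and fixed code pairs, documentation context, and metadata for downstream training and evaluation.
Reference score (independently computed attention score): 5.590965631053725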
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLM-based software engineering increasingly depends on executable, context-rich bug artifacts: paired correct and buggy code, methods under test (MUTs), documentation, and metadata. These artifacts support the training and evaluation of automated bug localization and repair techniques, testing and test oracle generation methods, and documentation-driven automation. Although curated benchmarks (e.g., Defects4J) remain valuable, they are static and increasingly vulnerable to contamination as code models are trained on large public corpora. A complementary strategy is to generate fresh, cutoff-aware datasets by selecting real system versions and injecting controlled bugs at the source level. Mutation testing is a natural basis for this strategy: it applies predefined mutation operators to programs and records whether the existing test suite detects each injected change. PIT is a state-of-the-practice mutation testing tool for Java that performs mutation at the bytecode level. This design makes mutation testing fast and practical, but PIT reports mutants primarily through XML, making them difficult to inspect, replay, or reuse as structured source-level dataset records. To address this gap, we present PITMuS, which combines PIT XML metadata with debug information from compiled Java class files to localize and reconstruct the source edit corresponding to each mutant. PITMuS then automatically produces structured datasets containing source-level buggy and fixed code pairs, documentation context, and metadata for downstream training and evaluation. Although we evaluate PITMuS on eight open-source Java systems, it can be applied to any Java system where PIT can be integrated.
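The abstract's core idea, turning a bytecode-level mutant report into a source-level buggy/fixed pair, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a report whose element names (`sourceFile`, `mutatedMethod`, `lineNumber`, `mutator`, `description`) follow the layout commonly seen in PIT's XML output, and it replaces the class-file debug information the paper describes with the line number already present in the report. The `REWRITES` table, the `reconstruct` function, and the sample `Calculator` source are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample report. Element names are assumed to mirror the layout
# commonly seen in PIT's XML output; they are not taken from the paper.
PIT_XML = """<mutations>
  <mutation detected="true" status="KILLED">
    <sourceFile>Calculator.java</sourceFile>
    <mutatedClass>com.example.Calculator</mutatedClass>
    <mutatedMethod>add</mutatedMethod>
    <lineNumber>3</lineNumber>
    <mutator>org.pitest.mutationtest.engine.gregor.mutators.MathMutator</mutator>
    <description>Replaced integer addition with subtraction</description>
  </mutation>
</mutations>"""

# Hypothetical original ("fixed") source file.
SOURCE = """public class Calculator {
    public int add(int a, int b) {
        return a + b;
    }
}"""

# Toy rewrite table (an assumption): mutant description -> (original token,
# mutated token). A real tool would need far more precise localization.
REWRITES = {
    "Replaced integer addition with subtraction": ("+", "-"),
    "Replaced integer subtraction with addition": ("-", "+"),
}


def reconstruct(xml_text, source_text):
    """Return one dataset record per mutant that the toy table can express."""
    lines = source_text.splitlines()
    records = []
    for mutation in ET.fromstring(xml_text).iter("mutation"):
        line_no = int(mutation.findtext("lineNumber"))
        rewrite = REWRITES.get(mutation.findtext("description"))
        fixed_line = lines[line_no - 1]
        if rewrite is None or rewrite[0] not in fixed_line:
            continue  # skip mutants this sketch cannot map back to source
        # Replace only the first occurrence of the token on the reported line.
        buggy_line = fixed_line.replace(rewrite[0], rewrite[1], 1)
        buggy_lines = lines[:line_no - 1] + [buggy_line] + lines[line_no:]
        records.append({
            "source_file": mutation.findtext("sourceFile"),
            "method": mutation.findtext("mutatedMethod"),
            "line": line_no,
            "status": mutation.get("status"),
            "fixed_line": fixed_line,
            "buggy_line": buggy_line,
            "fixed_code": source_text,
            "buggy_code": "\n".join(buggy_lines),
        })
    return records


if __name__ == "__main__":
    for record in reconstruct(PIT_XML, SOURCE):
        print(record["line"], record["status"], record["buggy_line"].strip())
```

Matching on the first occurrence of a token in a line is ambiguous as soon as a line contains several candidate operators, which is presumably why the paper pairs the XML metadata with debug information from the compiled class files rather than relying on line numbers alone.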


Related papers