Fugu-MT Paper Translation (Abstract): Detecting Flaky Tests in Quantum Software: A Dynamic Approach

Paper overview: Detecting Flaky Tests in Quantum Software: A Dynamic Approach

arxiv url: http://arxiv.org/abs/2512.18088v2
Date: Fri, 26 Dec 2025 16:02:19 GMT
Status: Translation complete
Last updated in system: 2025-12-29 13:23:29.777485
Title: Detecting Flaky Tests in Quantum Software: A Dynamic Approach
Authors: Dongchan Kim, Hamidreza Khoramrokh, Lei Zhang, Andriy Miranskyy
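Abstract summary: Flaky tests, which pass or fail nondeterministically without changes to code or environment, pose a serious threat to software reliability. This paper presents the first large-scale dynamic characterization of flaky tests in quantum software. The authors executed the Qiskit Terra test suite 10,000 times across 23 releases in controlled environments.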
Reference score (attention level computed independently by this site): 4.46640294257026
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Flaky tests, tests that pass or fail nondeterministically without changes to code or environment, pose a serious threat to software reliability. While classical software engineering has developed a rich body of dynamic and static techniques to study flakiness, corresponding evidence for quantum software remains limited. Prior work relies primarily on static analysis or small sets of manually reported incidents, leaving open questions about the prevalence, characteristics, and detectability of flaky tests. This paper presents the first large-scale dynamic characterization of flaky tests in quantum software. We executed the Qiskit Terra test suite 10,000 times across 23 releases in controlled environments. For each release, we measured test-outcome variability, identified flaky tests, estimated empirical failure probabilities, analyzed recurrence across versions, and used Wilson confidence intervals to quantify rerun budgets for reliable detection. We further mapped flaky tests to Terra subcomponents to assess component-level susceptibility. Across 27,026 test cases, we identified 290 distinct flaky tests. Although overall flakiness rates were low (0-0.4%), flakiness was highly episodic: nearly two-thirds of flaky tests appeared in only one release, while a small subset recurred intermittently or persistently. Many flaky tests failed with very small empirical probabilities ($\hat{p} \approx 10^{-4}$), implying that tens of thousands of executions may be required for confident detection. Flakiness was unevenly distributed across subcomponents, with 'transpiler' and 'quantum_info' accounting for the largest share. These results show that quantum test flakiness is rare but difficult to detect under typical continuous integration budgets. To support future research, we release a public dataset of per-test execution outcomes.
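
To make the rerun-budget reasoning in the abstract concrete, here is a minimal Python sketch. It is not the authors' code: the function names `wilson_interval` and `reruns_to_observe_failure` are hypothetical, and the second function uses a simple "at least one failure in n independent runs" argument, which is an assumption and may differ from how the paper derives budgets from Wilson intervals.

```python
import math
from statistics import NormalDist


def wilson_interval(failures: int, runs: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial failure probability."""
    if runs <= 0 or not 0 <= failures <= runs:
        raise ValueError("need runs > 0 and 0 <= failures <= runs")
    # Two-sided normal quantile, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = failures / runs
    denom = 1 + z * z / runs
    center = (p_hat + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / runs + z * z / (4 * runs * runs)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


def reruns_to_observe_failure(p_fail: float, confidence: float = 0.95) -> int:
    """Smallest n with 1 - (1 - p_fail)**n >= confidence.

    Assumption (not from the paper): runs are independent and share the same
    failure probability p_fail.
    """
    if not 0 < p_fail < 1 or not 0 < confidence < 1:
        raise ValueError("p_fail and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail))


if __name__ == "__main__":
    # One failure observed in 10,000 runs, i.e. p_hat = 1e-4 as in the abstract.
    print(wilson_interval(1, 10_000))
    # Reruns needed to see such a test fail at least once with 95% confidence.
    print(reruns_to_observe_failure(1e-4))
```

Under these assumptions, a test with a failure probability near $10^{-4}$ needs roughly 30,000 reruns to fail at least once with 95% confidence, which is consistent with the abstract's "tens of thousands of executions".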

Related papers