Fugu-MT paper translation (abstract): Test Behaviors, Not Methods! Detecting Tests Obsessed by Methods

Paper overview: Test Behaviors, Not Methods! Detecting Tests Obsessed by Methods

arxiv url: http://arxiv.org/abs/2602.00761v1
Date: Sat, 31 Jan 2026 14:58:39 GMT
Status: translation complete
Last updated in system: 2026-02-10 14:43:45.15733
Title: Test Behaviors, Not Methods! Detecting Tests Obsessed by Methods
Authors: Andre Hora, Andy Zaidman
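Abstract summary: Tests that verify multiple behaviors are harder to understand, lack focus, and are more coupled to the production code. This paper proposes Test Obsessed by Method, a test smell describing a test method that covers multiple paths of a single production method.
Reference score (attention score computed by this site): 3.6417668958891785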
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Best testing practices state that tests should verify a single functionality or behavior of the system. Tests that verify multiple behaviors are harder to understand, lack focus, and are more coupled to the production code. An attempt to identify this issue is the test smell Eager Test, which aims to capture tests that verify too much functionality based on the number of production method calls. Unfortunately, prior research suggests that counting production method calls is an inaccurate measure, as these calls do not reliably serve as a proxy for functionality. We envision a complementary solution based on runtime analysis: we hypothesize that some tests that verify multiple behaviors will likely cover multiple paths of the same production methods. Thus, we propose a novel test smell named Test Obsessed by Method, a test method that covers multiple paths of a single production method. We provide an initial empirical study to explore the presence of this smell in 2,054 tests provided by 12 test suites of the Python Standard Library. (1) We detect 44 Tests Obsessed by Methods in 11 of the 12 test suites. (2) Each smelly test verifies a median of two behaviors of the production method. (3) The 44 smelly tests could be split into 118 novel tests. (4) 23% of the smelly tests have code comments recognizing that distinct behaviors are being tested. We conclude by discussing benefits, limitations, and further research.
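The smell described in the abstract can be illustrated with a minimal Python sketch. The abstract does not describe the authors' detection tooling, so everything here is an assumption made for illustration: `parse_port` is a hypothetical production method, `SmellyTests` and `FocusedTests` are hypothetical test classes, and `count_paths` approximates a "path" as the sequence of lines executed during one call, recorded with the standard-library `sys.settrace` hook.

```python
import sys
import unittest


def parse_port(text):
    """Hypothetical production method with three distinct paths."""
    if not text.isdigit():
        raise ValueError("port must be numeric")
    port = int(text)
    if port > 65535:
        raise ValueError("port out of range")
    return port


def count_paths(test_callable, target):
    """Run test_callable and count distinct line sequences executed in target."""
    code = target.__code__
    paths = set()

    def global_tracer(frame, event, arg):
        if event != "call" or frame.f_code is not code:
            return None  # other frames are not traced line by line
        lines = []

        def local_tracer(frame, event, arg):
            if event == "line":
                lines.append(frame.f_lineno - code.co_firstlineno)
            elif event == "return":  # also fires when an exception propagates
                paths.add(tuple(lines))
            return local_tracer

        return local_tracer

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        test_callable()
    finally:
        sys.settrace(previous)  # restore any tracer that was active before
    return len(paths)


class SmellyTests(unittest.TestCase):
    def test_parse_port(self):
        # One test exercising three behaviors of the same production method.
        self.assertEqual(parse_port("80"), 80)
        with self.assertRaises(ValueError):
            parse_port("http")
        with self.assertRaises(ValueError):
            parse_port("70000")


class FocusedTests(unittest.TestCase):
    # The same checks split so that each test covers a single path.
    def test_valid_port_is_parsed(self):
        self.assertEqual(parse_port("80"), 80)

    def test_non_numeric_port_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_port("http")

    def test_out_of_range_port_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_port("70000")
```

Under these assumptions, `count_paths(SmellyTests("test_parse_port").test_parse_port, parse_port)` evaluates to 3, while each method of `FocusedTests` yields 1. A test with a count above 1 would be a candidate Test Obsessed by Method in this simplified reading.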

Related papers

Reduction of Test Re-runs by Prioritizing Potential Order Dependent Flaky Tests [0.5798758080057375]
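Flaky tests can undermine the reliability of automated software testing because of their unpredictable behavior. A common type of flaky test is the order-dependent (OD) test. This paper proposes a method for prioritizing potential OD tests.
Paper reference translation (metadata) (2025-10-30T06:17:30Z)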
Intention-Driven Generation of Project-Specific Test Cases [45.2380093475221]
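We propose IntentionTest, which generates project-specific tests from descriptions of validation intentions. We extensively evaluate IntentionTest against state-of-the-art baselines (DA, ChatTester, EvoSuite) on 4,146 test cases from 13 open-source projects.
Paper reference translation (metadata) (2025-07-28T08:35:04Z)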
Studying the Impact of Early Test Termination Due to Assertion Failure on Code Coverage and Spectrum-based Fault Localization [48.22524837906857]
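This is the first empirical study of early test termination due to assertion failure. We examined 207 versions of six open-source projects. The results suggest that early test termination harms both code coverage and the effectiveness of spectrum-based fault localization.
Paper reference translation (metadata) (2025-04-06T17:14:09Z)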
Detecting and Evaluating Order-Dependent Flaky Tests in JavaScript [3.6513675781808357]
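Flaky tests pose a serious problem for software testing. Prior research has identified test order dependency as one of the most common causes of flakiness. This paper investigates test order dependency in JavaScript tests.
Paper reference translation (metadata) (2025-01-22T06:52:11Z)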
Model Equality Testing: Which Model Is This API Serving? [59.005869726179455]
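API providers may quantize, watermark, or fine-tune the underlying model, changing its output distribution. Model Equality Testing is a two-sample testing problem. Tests built on a simple string kernel achieve a median power of 77.4% against a range of distortions.
Paper reference translation (metadata) (2024-10-26T18:34:53Z)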
Observation-based unit test generation at Meta [52.4716552057909]
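TestGen automatically generates unit tests built from serialized observations of complex objects observed during application execution. TestGen has landed 518 tests into production, which have been executed 9,617,349 times in continuous integration and found 5,702 faults. The evaluation shows that, from 4,361 reliable end-to-end tests, tests could be generated for at least 86% of the classes.
Paper reference translation (metadata) (2024-02-09T00:34:39Z)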
Evaluating the Robustness of Test Selection Methods for Deep Neural Networks [32.01355605506855]
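Testing deep learning-based systems is important but difficult because of the time and effort needed to label collected raw data. To reduce the labeling effort, several test selection methods that label only a subset of the test data have been proposed. This paper examines when and to what extent test selection methods fail.
Paper reference translation (metadata) (2023-07-29T19:17:49Z)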
Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision [85.07855130048951]
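This work studies a more practical task setting called test-agnostic long-tailed recognition. We propose a new method called TADE (Test-time Aggregating Diverse Experts), which trains diverse experts to handle different test distributions. We show theoretically that the proposed method can simulate unknown test class distributions.
Paper reference translation (metadata) (2021-07-20T04:10:31Z)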

The related papers list is generated automatically from the titles and abstracts of papers on this site.

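This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from its use.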