Fugu-MT paper translation (abstract): Risk-Aware Batch Testing for Performance Regression Detection

Paper overview: Risk-Aware Batch Testing for Performance Regression Detection

arxiv url: http://arxiv.org/abs/2604.00222v1
Date: Tue, 31 Mar 2026 20:39:46 GMT
Status: Translation complete
Last updated in system: 2026-04-02 16:44:31.713405
Title: Risk-Aware Batch Testing for Performance Regression Detection
Title (reference translation): Risk-Aware Batch Testing for Performance Regression Detection
Authors: Ali Sayedsalehi, Peter C. Rigby, Gregory Mierzwinski
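Abstract summary: We construct a production-derived dataset of human-confirmed regressions aligned with Autoland. We fine-tune ModernBERT, CodeBERT, and LLaMA-3.1 to estimate commit-level performance risk.
Reference score (independently computed attention score): 1.0705399532413615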
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Performance regression testing is essential in large-scale continuous-integration (CI) systems, yet executing full performance suites for every commit is prohibitively expensive. Prior work on performance regression prediction and batch testing has shown independent benefits, but each faces practical limitations: predictive models are rarely integrated into CI decision-making, and conventional batching strategies ignore commit-level heterogeneity. We unify these strands by introducing a risk-aware framework that integrates machine-learned commit risk with adaptive batching. Using Mozilla Firefox as a case study, we construct a production-derived dataset of human-confirmed regressions aligned chronologically with Autoland, and fine-tune ModernBERT, CodeBERT, and LLaMA-3.1 variants to estimate commit-level performance regression risk, achieving up to 0.694 ROC-AUC with CodeBERT. The risk scores drive a family of risk-aware batching strategies, including Risk-Aged Priority Batching and Risk-Adaptive Stream Batching, evaluated through realistic CI simulations. Across thousands of historical Firefox commits, our best overall configuration, Risk-Aged Priority Batching with linear aggregation (RAPB-la), yields a Pareto improvement over Mozilla's production-inspired baseline. RAPB-la reduces total test executions by 32.4%, decreases mean feedback time by 3.8%, maintains mean time-to-culprit at approximately the baseline level, reduces maximum time-to-culprit by 26.2%, and corresponds to an estimated annual infrastructure cost savings of approximately $491K under our cost model. These results demonstrate that risk-aware batch testing can reduce CI resource consumption while improving diagnostic timeliness. To support reproducibility and future research, we release a complete replication package containing all datasets, fine-tuning pipelines, and implementations of our batching algorithms.
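Abstract (reference translation): Performance regression testing is essential in large-scale continuous-integration (CI) systems, but running the full performance suite for every commit is prohibitively expensive. Predictive models are rarely integrated into CI decision-making, and conventional batching strategies ignore commit-level heterogeneity. We unify these strands by introducing a risk-aware framework that integrates machine-learned commit risk with adaptive batching. Using Mozilla Firefox as a case study, we construct a production-derived dataset of human-confirmed regressions aligned chronologically with Autoland and estimate commit-level performance regression risk, achieving up to 0.694 ROC-AUC with CodeBERT. The risk scores drive risk-aware batching strategies, including Risk-Aged Priority Batching and Risk-Adaptive Stream Batching, which are evaluated through realistic CI simulations. Across thousands of historical Firefox commits, the best configuration, Risk-Aged Priority Batching with linear aggregation (RAPB-la), yields a Pareto improvement over Mozilla's production-inspired baseline. RAPB-la reduces total test executions by 32.4%, decreases mean feedback time by 3.8%, maintains mean time-to-culprit at approximately the baseline level, reduces maximum time-to-culprit by 26.2%, and corresponds to an estimated annual infrastructure cost savings of approximately $491K under the authors' cost model. These results show that risk-aware batch testing can reduce CI resource consumption while improving diagnostic timeliness. To support reproducibility and future research, the authors release a complete replication package containing all datasets, fine-tuning pipelines, and implementations of the batching algorithms.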
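
The abstract names Risk-Aged Priority Batching but does not define it, so the following Python sketch is only an illustration of the general idea and not the authors' algorithm. It assumes a hypothetical priority equal to a model risk score plus a linear bonus for waiting time, and it treats a batch test as failing whenever any commit in the batch regresses. The names `Commit`, `priority`, `pick_batch`, `find_culprits`, and the `age_weight` parameter are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    cid: str
    risk: float      # hypothetical model score in [0, 1]
    arrived: float   # arrival time in hours
    regresses: bool  # ground truth, visible only to the simulator


def priority(commit, now, age_weight=0.05):
    # Assumed scoring rule: risk plus a linear bonus for time spent waiting.
    return commit.risk + age_weight * (now - commit.arrived)


def pick_batch(queue, now, size, age_weight=0.05):
    # Take the `size` commits with the highest risk-plus-age priority.
    ranked = sorted(queue, key=lambda c: priority(c, now, age_weight), reverse=True)
    return ranked[:size]


def find_culprits(batch):
    """Test the batch once; on failure, bisect. Returns (culprit ids, test runs)."""
    runs = 1  # one suite execution for this batch
    if not any(c.regresses for c in batch):
        return [], runs
    if len(batch) == 1:
        return [batch[0].cid], runs
    mid = len(batch) // 2
    left, left_runs = find_culprits(batch[:mid])
    right, right_runs = find_culprits(batch[mid:])
    return left + right, runs + left_runs + right_runs


queue = [
    Commit("a", risk=0.1, arrived=0.0, regresses=False),
    Commit("b", risk=0.9, arrived=8.0, regresses=True),
    Commit("c", risk=0.2, arrived=9.0, regresses=False),
    Commit("d", risk=0.3, arrived=10.0, regresses=False),
]
batch = pick_batch(queue, now=10.0, size=2)
print([c.cid for c in batch], find_culprits(batch))
```

In this toy queue the old, low-risk commit `a` is promoted ahead of the newer commit `d` because of its waiting time, which is the kind of starvation avoidance an aging term is meant to provide. A batch with no regression costs a single suite run, which is where batching saves executions; a batch that contains a culprit costs extra runs for bisection.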
