Fugu-MT Paper Translation (Summary): When Should I Run My Application Benchmark?: Studying Cloud Performance Variability for the Case of Stream Processing Applications


arxiv url: http://arxiv.org/abs/2504.11826v1
Date: Wed, 16 Apr 2025 07:22:44 GMT
Status: Translation complete
Last updated in system: 2025-04-24 21:22:13.818218
Title: When Should I Run My Application Benchmark?: Studying Cloud Performance Variability for the Case of Stream Processing Applications
Authors: Sören Henning, Adriano Vogel, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser
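Abstract summary: This paper empirically quantifies the impact of cloud performance variability on benchmarking results. With approximately 591 hours of experiments, 789 cluster deployments on AWS, and 2366 benchmark executions, it is likely the largest study of its kind.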
Reference score (attention score computed by this site): 1.3398445165628463
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Performance benchmarking is a common practice in software engineering, particularly when building large-scale, distributed, and data-intensive systems. While cloud environments offer several advantages for running benchmarks, it is often reported that benchmark results can vary significantly between repetitions -- making it difficult to draw reliable conclusions about real-world performance. In this paper, we empirically quantify the impact of cloud performance variability on benchmarking results, focusing on stream processing applications as a representative type of data-intensive, performance-critical system. In a longitudinal study spanning more than three months, we repeatedly executed an application benchmark used in research and development at Dynatrace. This allows us to assess various aspects of performance variability, particularly concerning temporal effects. With approximately 591 hours of experiments, deploying 789 Kubernetes clusters on AWS and executing 2366 benchmarks, this is likely the largest study of its kind and the only one addressing performance from an end-to-end, i.e., application benchmark perspective. Our study confirms that performance variability exists, but it is less pronounced than often assumed (coefficient of variation of < 3.7%). Unlike related studies, we find that performance does exhibit a daily and weekly pattern, although with only small variability (<= 2.5%). Re-using benchmarking infrastructure across multiple repetitions introduces only a slight reduction in result accuracy (<= 2.5 percentage points). These key observations hold consistently across different cloud regions and machine types with different processor architectures. We conclude that for engineers and researchers focused on detecting substantial performance differences (e.g., > 5%) in...
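The coefficient of variation (CV) cited in the abstract is the standard deviation of repeated measurements divided by their mean. The minimal Python sketch below is not the authors' analysis code; it computes the CV for hypothetical throughput measurements and compares the relative difference between two configurations against the 5% threshold the abstract gives as an example. The function names, the sample values, and the choice of the sample (n-1) standard deviation are assumptions made for illustration only.

```python
import statistics


def coefficient_of_variation(samples: list[float]) -> float:
    """Relative spread: sample standard deviation (n-1) divided by the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


def relative_difference(baseline: list[float], candidate: list[float]) -> float:
    """Relative change of the candidate mean with respect to the baseline mean."""
    base = statistics.mean(baseline)
    return (statistics.mean(candidate) - base) / base


# Hypothetical throughput values (records/s) from repeated benchmark runs.
# These are illustrative numbers, NOT data from the study.
baseline = [100_000.0, 102_000.0, 98_000.0, 101_000.0, 99_000.0]
candidate = [108_000.0, 110_000.0, 107_000.0, 109_000.0, 106_000.0]

cv = coefficient_of_variation(baseline)
diff = relative_difference(baseline, candidate)
print(f"CV of baseline: {cv:.2%}")        # prints "CV of baseline: 1.58%"
print(f"relative difference: {diff:.2%}")  # prints "relative difference: 8.00%"

# Assumed threshold, taken from the abstract's "e.g., > 5%" example.
THRESHOLD = 0.05
print("substantial" if abs(diff) > THRESHOLD else "within noise")  # prints "substantial"
```

Comparing a single relative difference against a fixed threshold is a simplification; a more careful comparison would use confidence intervals or a statistical test over many repetitions.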

Related papers

Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope? [39.78423267310698]
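Applying reinforcement learning to dynamic resource allocation in optical networks has been the focus of intense research activity in recent years. This paper reviews progress in the field and reveals major gaps in benchmarking practices and solutions.
Reference translation (metadata) (2025-02-18T12:09:42Z)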
SeBS-Flow: Benchmarking Serverless Cloud Function Workflows [51.4200085836966]
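This paper proposes SeBS-Flow, the first serverless workflow benchmark suite. SeBS-Flow includes six real-world application benchmarks and four microbenchmarks representing different computational patterns. We conduct a comprehensive evaluation on three major cloud platforms, covering performance, cost, scalability, and runtime deviations.
Reference translation (metadata) (2024-10-04T14:52:18Z)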
Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures [56.200335252600354]
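Deploying pre-trained models to environments other than their native development environment is a common practice. This has led to the introduction of interchange formats such as ONNX, which includes its own infrastructure and serves as a standard format.
Reference translation (metadata) (2024-02-21T09:18:44Z)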
Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark [28.818423712485504]
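The Multi-dOmain Few-Shot Object Detection (MoFSOD) benchmark consists of 10 datasets from a wide range of domains. We analyze the impact of freezing layers, different architectures, and different pre-training datasets on FSOD performance.
Reference translation (metadata) (2022-07-22T16:13:22Z)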
Benchopt: Reproducible, efficient and collaborative optimization benchmarks [67.29240500171532]
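Benchopt is a framework for automating, reproducing, and publishing optimization benchmarks in machine learning. Benchopt simplifies benchmarking for the community by providing off-the-shelf tools for running, sharing, and extending experiments.
Reference translation (metadata) (2022-06-27T16:19:24Z)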
Analyzing the Impact of Undersampling on the Benchmarking and Configuration of Evolutionary Algorithms [3.967483941966979]
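We show that care is required when making decisions based on limited data. We present examples of performance losses of more than 20%, even when statistical racing is used to adjust the number of runs dynamically.
Reference translation (metadata) (2022-04-20T09:53:59Z)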
ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention [48.697458429460184]
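Two factors, information bottleneck sensitivity and inconsistency between different attention topologies, can affect the performance of sparse Transformers. This paper proposes a model called ERNIE-Sparse with two distinctive parts: (i) a Hierarchical Sparse Transformer (HST) that sequentially unifies local and global information, and (ii) Self-Attention Regularization (SAR) that minimizes the distance between Transformers with different attention topologies.
Reference translation (metadata) (2022-03-23T08:47:01Z)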
Multi-Domain Joint Training for Person Re-Identification [51.73921349603597]
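Deep learning-based person re-identification (ReID) often requires a large amount of training data to achieve good performance. Collecting more training data from diverse environments tends to improve ReID performance. This paper proposes an approach called Domain-Camera-Sample Dynamic Network (DCSD), whose parameters can adapt to various factors.
Reference translation (metadata) (2022-01-06T09:20:59Z)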
DAPPER: Label-Free Performance Estimation after Personalization for Heterogeneous Mobile Sensing [95.18236298557721]
We propose DAPPER (Domain AdaPtation Performance EstimatoR). An evaluation on four real-world sensing datasets, compared against six baselines, shows that DAPPER improves accuracy by 39.8%.
Reference translation (metadata) (2021-11-22T08:49:33Z)
Benchmarking and Performance Modelling of MapReduce Communication Pattern [0.0]
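The model can be used to infer the performance of unseen applications and to approximate performance when an arbitrary dataset is used as input. We validated the effectiveness of the approach through empirical experiments in two setups.
Reference translation (metadata) (2020-05-23T21:52:29Z)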

The related papers list is generated automatically from the titles and abstracts of papers on this site.

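The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.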