Fugu-MT Paper Translation (Summary): Machine Learning-Driven Predictive Resource Management in Complex Science Workflows

Paper summary: Machine Learning-Driven Predictive Resource Management in Complex Science Workflows

arxiv url: http://arxiv.org/abs/2509.11512v1
Date: Mon, 15 Sep 2025 01:53:30 GMT
Status: Translation complete
Last updated in system: 2025-09-16 17:26:23.119522
Title: Machine Learning-Driven Predictive Resource Management in Complex Science Workflows
Title (reference translation): Predictive resource management using machine learning in complex science workflows
Authors: Tasnuva Chowdhury, Tadashi Maeno, Fatih Furkan Akman, Joseph Boudreau, Sankha Dutta, Shengyu Feng, Adolfy Hoisie, Kuan-Chieh Hsu, Raees Khan, Jaehyung Kim, Ozgur O. Kilic, Scott Klasky, Alexei Klimentov, Tatiana Korchuganova, Verena Ingrid Martinez Outschoorn, Paul Nilsson, David K. Park, Norbert Podhorszki, Yihui Ren, John Rembrandt Steele, Frédéric Suter, Sairam Sri Vatsavai, Torre Wenaus, Wei Yang, Yiming Yang, Shinjae Yoo
Abstract summary: This study introduces a new pipeline of machine learning models within a comprehensive workflow management system. These models use advanced machine learning techniques to predict key resource requirements.
Reference score (in-house attention score): 34.67259555158463
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The collaborative efforts of large communities in science experiments, often comprising thousands of global members, reflect a monumental commitment to exploration and discovery. Recently, advanced and complex data processing has gained increasing importance in science experiments. Data processing workflows typically consist of multiple intricate steps, and the precise specification of resource requirements is crucial for each step to allocate optimal resources for effective processing. Estimating resource requirements in advance is challenging due to a wide range of analysis scenarios, varying skill levels among community members, and the continuously increasing spectrum of computing options. One practical approach to mitigate these challenges involves initially processing a subset of each step to measure precise resource utilization from actual processing profiles before completing the entire step. While this two-staged approach enables processing on optimal resources for most of the workflow, it has drawbacks such as initial inaccuracies leading to potential failures and suboptimal resource usage, along with overhead from waiting for initial processing completion, which is critical for fast-turnaround analyses. In this context, our study introduces a novel pipeline of machine learning models within a comprehensive workflow management system, the Production and Distributed Analysis (PanDA) system. These models employ advanced machine learning techniques to predict key resource requirements, overcoming challenges posed by limited upfront knowledge of characteristics at each step. Accurate forecasts of resource requirements enable informed and proactive decision-making in workflow management, enhancing the efficiency of handling diverse, complex workflows across heterogeneous resources.
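Abstract (reference translation): The collaborative activities of large communities in science experiments, often made up of thousands of members worldwide, reflect a major commitment to exploration and discovery. In recent years, advanced and complex data processing has become increasingly important in science experiments. Data processing workflows typically consist of multiple complex steps, and a precise specification of the resource requirements of each step is essential for allocating optimal resources for efficient processing. Estimating resource requirements in advance is difficult because of the wide variety of analysis scenarios, the differing skill levels of community members, and the continuously expanding range of computing options. One practical approach to mitigating these challenges is to first process a subset of each step, measure accurate resource usage from the actual processing profiles, and only then complete the whole step. This two-stage approach enables processing on optimal resources for most of the workflow, but it has drawbacks: initial inaccuracies that can lead to failures, suboptimal resource usage, and the overhead of waiting for the initial processing to finish, which is critical for fast-turnaround analyses. This study introduces a new pipeline of machine learning models in the Production and Distributed Analysis (PanDA) system, a comprehensive workflow management system. These models use advanced machine learning techniques to predict key resource requirements, overcoming the challenges caused by limited prior knowledge of the characteristics of each step. Accurate predictions of resource requirements enable informed, proactive decision-making in workflow management and improve the efficiency of processing diverse, complex workflows across heterogeneous resources.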
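
The two-stage approach described in the abstract, and the alternative of predicting requirements before any processing, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PanDA implementation: the names ScoutProfile, estimate_from_scouts and fit_linear, the 1.2 safety margin, the term "scout" for the jobs that process the initial subset, and the use of input size as a predictive feature are all hypothetical. The ordinary least-squares fit only stands in for the paper's machine learning models, whose details the abstract does not give.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScoutProfile:
    """Hypothetical measured profile of one job that processed a subset of a step."""
    n_events: int          # events processed by this scout job
    peak_memory_mb: float  # peak memory observed during processing
    walltime_s: float      # elapsed wall-clock time


def estimate_from_scouts(scouts, events_per_job, margin=1.2):
    """Two-stage approach (illustrative): derive requirements for the
    remaining jobs from scout measurements, padded by an assumed margin."""
    if not scouts:
        raise ValueError("need at least one scout profile")
    memory_mb = max(s.peak_memory_mb for s in scouts) * margin
    # Take the slowest observed per-event time so the limit covers the slowest scout.
    sec_per_event = max(s.walltime_s / s.n_events for s in scouts)
    walltime_s = sec_per_event * events_per_job * margin
    return memory_mb, walltime_s


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x: a deliberately simple stand-in
    for a learned model that predicts a requirement from historical tasks,
    so no scout jobs have to finish first."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct feature values")
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b  # (intercept, slope)
```

For example, two scout jobs of 100 events each that peak at 1800 MB and 2000 MB and take 250 s and 300 s would give, for 1000-event jobs, a 2400 MB memory request and a 3600 s walltime limit, but only after the scouts complete. The predictive path removes that wait, at the cost of depending entirely on how well the historical data reflects the new step.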

Related papers