Fugu-MT Paper Translation (Abstract): Guiding Application Users via Estimation of Computational Resources for Massively Parallel Chemistry Computations

Paper summary: Guiding Application Users via Estimation of Computational Resources for Massively Parallel Chemistry Computations

arxiv url: http://arxiv.org/abs/2509.20667v1
Date: Thu, 25 Sep 2025 02:00:36 GMT
Status: Translation complete
System update date: 2025-09-26 20:58:12.653175
Title: Guiding Application Users via Estimation of Computational Resources for Massively Parallel Chemistry Computations
Title (reference translation): Guiding Application Users via Computational Resource Estimation for Massively Parallel Chemistry Computations
Authors: Tanzila Tabassum, Omer Subasi, Ajay Panyala, Epiya Ebiapia, Gerald Baumgartner, Erdal Mutlu, P. Sadayappan, Karol Kowalski
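Abstract summary: We develop machine learning strategies to guide application users before they commit to running expensive experiments on a supercomputer. By predicting application execution time, we determine optimal runtime parameter values such as the number of nodes and tile sizes.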
Reference score (independently computed attention score): 0.39728489102666065
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: In this work, we develop machine learning (ML) based strategies to predict resources (costs) required for massively parallel chemistry computations, such as coupled-cluster methods, to guide application users before they commit to running expensive experiments on a supercomputer. By predicting application execution time, we determine the optimal runtime parameter values such as the number of nodes and tile sizes. Two key questions of interest to users are addressed. The first is the shortest-time question, where the user is interested in knowing the parameter configurations (number of nodes and tile sizes) to achieve the shortest execution time for a given problem size and a target supercomputer. The second is the cheapest-run question, in which the user is interested in minimizing resource usage, i.e., finding the number of nodes and tile size that minimize the number of node-hours for a given problem size. We evaluate a rich family of ML models and strategies, developed based on the collections of runtime parameter values for the CCSD (Coupled Cluster with Singles and Doubles) application executed on the Department of Energy (DOE) Frontier and Aurora supercomputers. Our experiments show that when predicting the total execution time of a CCSD iteration, a Gradient Boosting (GB) ML model achieves a Mean Absolute Percentage Error (MAPE) of 0.023 and 0.073 for Aurora and Frontier, respectively. In the case where it is expensive to run experiments just to collect data points, we show that active learning can achieve a MAPE of about 0.2 with just around 450 experiments collected from Aurora and Frontier.
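Abstract (reference translation): In this work, we develop machine learning (ML) based strategies that predict the resources (costs) required for massively parallel computations, such as coupled-cluster methods, in order to guide application users before they run expensive experiments on a supercomputer. By predicting application execution time, we determine optimal runtime parameter values such as the number of nodes and tile sizes. We address two key questions of interest to users. The first is the shortest-time question, in which the user wants to know the parameter configuration (number of nodes and tile size) that achieves the shortest execution time for a given problem size and target supercomputer. The second is the cheapest-run question, in which the user wants to minimize resource usage. We evaluate a rich family of ML models and strategies based on collections of runtime parameter values for the CCSD (Coupled Cluster with Singles and Doubles) application executed on the Department of Energy Frontier and Aurora supercomputers. Our experiments show that, when predicting the total execution time of a CCSD iteration, a Gradient Boosting (GB) ML model achieves a Mean Absolute Percentage Error (MAPE) of 0.023 on Aurora and 0.073 on Frontier. When running experiments merely to collect data points is expensive, we show that active learning can achieve a MAPE of about 0.2 with about 450 experiments collected from Aurora and Frontier.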
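
The two user questions in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration only: `toy_predict_seconds` is a made-up analytic formula standing in for a trained runtime model (the paper's Gradient Boosting model is not reproduced), and the candidate node counts and tile sizes are hypothetical. The sketch searches a small grid for the configuration that minimizes predicted time (shortest-time question) or predicted node-hours (cheapest-run question), and shows how MAPE is computed as a fraction.

```python
from itertools import product


def toy_predict_seconds(nodes: int, tile: int) -> float:
    """Hypothetical stand-in for a trained runtime predictor (not from the paper)."""
    parallel_work = 3600.0 / nodes          # assumed perfectly divisible work
    comm_overhead = 2.0 * nodes             # assumed cost that grows with node count
    tile_penalty = 0.05 * (tile - 80) ** 2  # assumed sweet spot at tile size 80
    return parallel_work + comm_overhead + tile_penalty


def predicted_time(seconds: float, nodes: int) -> float:
    """Objective for the shortest-time question: predicted wall-clock seconds."""
    return seconds


def predicted_node_hours(seconds: float, nodes: int) -> float:
    """Objective for the cheapest-run question: nodes times hours."""
    return seconds * nodes / 3600.0


def best_config(predict, node_counts, tile_sizes, objective):
    """Return the (nodes, tile) pair that minimizes the objective on the grid."""
    return min(
        product(node_counts, tile_sizes),
        key=lambda cfg: objective(predict(*cfg), cfg[0]),
    )


def mape(actual, predicted):
    """Mean Absolute Percentage Error as a fraction (0.1 means 10%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical search grid.
NODE_COUNTS = [10, 20, 40, 80]
TILE_SIZES = [40, 80, 120]

fastest = best_config(toy_predict_seconds, NODE_COUNTS, TILE_SIZES, predicted_time)
cheapest = best_config(toy_predict_seconds, NODE_COUNTS, TILE_SIZES, predicted_node_hours)
print("shortest-time config (nodes, tile):", fastest)
print("cheapest-run config (nodes, tile):", cheapest)
print("MAPE example:", mape([100.0, 200.0], [110.0, 180.0]))
```

Under this toy model the two questions give different answers: adding nodes shortens the run up to a point, but the fewest nodes consume the fewest node-hours. With a real predictor, the same grid search applies once `toy_predict_seconds` is replaced by the trained model's prediction function.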


Related papers