Fugu-MT Paper Translation (Abstract): Can Large Language Models Predict Parallel Code Performance?

Paper summary: Can Large Language Models Predict Parallel Code Performance?

arxiv url: http://arxiv.org/abs/2505.03988v1
Date: Tue, 06 May 2025 21:41:20 GMT
Status: Translation complete
Last updated in system: 2025-05-08 19:07:35.921958
Title: Can Large Language Models Predict Parallel Code Performance?
Title (reference translation): Can large language models predict the performance of parallel code?
Authors: Gregory Bolet, Giorgis Georgakoudis, Harshitha Menon, Konstantinos Parasyris, Niranjan Hasabnis, Hayden Estes, Kirk W. Cameron, Gal Oren
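Abstract summary: This paper explores whether Large Language Models (LLMs) can offer an alternative approach to GPU performance prediction without relying on hardware. State-of-the-art LLMs show a strong understanding of the Roofline model, achieving 100% classification accuracy when provided with explicit profiling data. The results suggest that, with better datasets and prompt strategies, LLMs could become practical tools for HPC roofline analysis and performance portability.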
Reference score (independently computed attention score): 1.5221392705893568
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate determination of the performance of parallel GPU code typically requires execution-time profiling on target hardware -- an increasingly prohibitive step due to limited access to high-end GPUs. This paper explores whether Large Language Models (LLMs) can offer an alternative approach for GPU performance prediction without relying on hardware. We frame the problem as a roofline classification task: given the source code of a GPU kernel and the hardware specifications of a target GPU, can an LLM predict whether the GPU kernel is compute-bound or bandwidth-bound? For this study, we build a balanced dataset of 340 GPU kernels, obtained from HeCBench benchmark and written in CUDA and OpenMP, along with their ground-truth labels obtained via empirical GPU profiling. We evaluate LLMs across four scenarios: (1) with access to profiling data of the kernel source, (2) zero-shot with source code only, (3) few-shot with code and label pairs, and (4) fine-tuned on a small custom dataset. Our results show that state-of-the-art LLMs have a strong understanding of the Roofline model, achieving 100% classification accuracy when provided with explicit profiling data. We also find that reasoning-capable LLMs significantly outperform standard LLMs in zero- and few-shot settings, achieving up to 64% accuracy on GPU source codes, without profiling information. Lastly, we find that LLM fine-tuning will require much more data than what we currently have available. This work is among the first to use LLMs for source-level roofline performance prediction via classification, and illustrates their potential to guide optimization efforts when runtime profiling is infeasible. Our findings suggest that with better datasets and prompt strategies, LLMs could become practical tools for HPC performance analysis and performance portability.
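Abstract (reference translation): Accurately determining the performance of parallel GPU code typically requires execution-time profiling on the target hardware. This paper explores whether Large Language Models (LLMs) can offer an alternative approach to GPU performance prediction without relying on hardware. Given the source code of a GPU kernel and the hardware specifications of a target GPU, can an LLM predict whether the kernel is compute-bound or bandwidth-bound? For this study, we build a balanced dataset of 340 GPU kernels, obtained from the HeCBench benchmark and written in CUDA and OpenMP, along with ground-truth labels obtained via empirical GPU profiling. We evaluate LLMs across four scenarios: (1) with access to profiling data of the kernel source, (2) zero-shot with source code only, (3) few-shot with code and label pairs, and (4) fine-tuned on a small custom dataset. The results show that state-of-the-art LLMs have a strong understanding of the Roofline model, achieving 100% classification accuracy when given explicit profiling data. Reasoning-capable LLMs also significantly outperform standard LLMs in zero- and few-shot settings, achieving up to 64% accuracy on GPU source code without profiling information. Finally, LLM fine-tuning will require much more data than is currently available. This work is among the first to use LLMs for source-level roofline performance prediction via classification, and shows their potential to guide optimization efforts when runtime profiling is infeasible. The findings suggest that, with better datasets and prompt strategies, LLMs could become practical tools for HPC performance analysis and performance portability.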
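
The compute-bound versus bandwidth-bound labels mentioned in the abstract can be illustrated with the textbook Roofline rule. The sketch below is only that: the abstract does not describe the authors' exact labeling procedure, so the names `GpuSpec`, `classify_kernel`, and `attainable_flops`, the example GPU numbers, and the choice to label a kernel sitting exactly on the ridge point as compute-bound are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Hypothetical hardware description (not taken from the paper)."""
    peak_flops: float      # peak compute throughput in FLOP/s
    peak_bandwidth: float  # peak memory bandwidth in bytes/s


def ridge_point(gpu: GpuSpec) -> float:
    """Arithmetic intensity (FLOP/byte) where the two roofs meet."""
    return gpu.peak_flops / gpu.peak_bandwidth


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte of memory traffic for one kernel run."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def classify_kernel(flops: float, bytes_moved: float, gpu: GpuSpec) -> str:
    """Textbook Roofline rule: compare kernel intensity to the ridge point.

    Assumption: a kernel exactly at the ridge point is labeled compute-bound.
    """
    ai = arithmetic_intensity(flops, bytes_moved)
    return "compute-bound" if ai >= ridge_point(gpu) else "bandwidth-bound"


def attainable_flops(flops: float, bytes_moved: float, gpu: GpuSpec) -> float:
    """Upper bound on achievable FLOP/s under the Roofline model."""
    ai = arithmetic_intensity(flops, bytes_moved)
    return min(gpu.peak_flops, ai * gpu.peak_bandwidth)


if __name__ == "__main__":
    # Made-up GPU: 10 TFLOP/s peak compute, 1 TB/s peak bandwidth.
    gpu = GpuSpec(peak_flops=10e12, peak_bandwidth=1e12)

    # Made-up profiling counters for two kernels: (FLOPs, bytes moved).
    kernels = {"stream_like": (2e9, 8e9), "gemm_like": (4e11, 1e10)}
    for name, (flops, bytes_moved) in kernels.items():
        print(name, classify_kernel(flops, bytes_moved, gpu))
```

With counters like these in hand the decision is a single comparison, which is consistent with the abstract's report that LLMs classify perfectly when given explicit profiling data; the harder setting is estimating the FLOP and byte counts from source code alone.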

Related papers