Fugu-MT Paper Translation (Abstract): OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization

Paper abstract: OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization

arxiv url: http://arxiv.org/abs/2602.12305v1
Date: Thu, 12 Feb 2026 04:50:19 GMT
Status: Translation complete
Last updated in system: 2026-02-16 23:37:53.701224
Title: OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization
Authors: Arijit Bhattacharjee, Heng Ping, Son Vu Le, Paul Bogdan, Nesreen K. Ahmed, Ali Jannesari
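Abstract summary: We present OptiML, an end-to-end framework that maps natural-language intent or input code to performance-optimized kernels. A search-based optimizer (OptiML-X) refines synthesized or user-provided kernels using LLM-driven Monte Carlo Tree Search, guided by a hardware-aware reward derived from profiler feedback.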
Reference score (independently computed attention score): 21.882017397032964
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generating high-performance CUDA kernels remains challenging due to the need to navigate a combinatorial space of low-level transformations under noisy and expensive hardware feedback. Although large language models can synthesize functionally correct CUDA code, achieving competitive performance requires systematic exploration and verification of optimization choices. We present OptiML, an end-to-end framework that maps either natural-language intent or input CUDA code to performance-optimized CUDA kernels by formulating kernel optimization as search under verification. OptiML consists of two decoupled stages. When the input is natural language, a Mixture-of-Thoughts generator (OptiML-G) acts as a proposal policy over kernel implementation strategies, producing an initial executable program. A search-based optimizer (OptiML-X) then refines either synthesized or user-provided kernels using Monte Carlo Tree Search over LLM-driven edits, guided by a hardware-aware reward derived from profiler feedback. Each candidate transformation is compiled, verified, and profiled with Nsight Compute, and evaluated by a composite objective that combines runtime with hardware bottleneck proxies and guardrails against regressions. We evaluate OptiML in both synthesis-and-optimize and optimization-only settings on a diverse suite of CUDA kernels. Results show that OptiML consistently discovers verified performance improvements over strong LLM baselines and produces interpretable optimization trajectories grounded in profiler evidence.
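The search-under-verification loop described in the abstract can be illustrated with a small toy. This is only a sketch under stated assumptions, not the authors' implementation: the edit names, the `evaluate` stub (standing in for compile, verify, and Nsight Compute profiling), the reward weights, and the stall-fraction proxy are all hypothetical, and a fixed edit list replaces the LLM proposal step.

```python
import math

# Hypothetical edit vocabulary; in the paper, edits are proposed by an LLM.
EDITS = ["tile_shared_mem", "unroll_loop", "coalesce_loads", "unsafe_reorder"]

# Toy stand-in for compile + verify + profile: per-edit runtime multipliers
# and a set of edits that break correctness. All numbers are made up.
EFFECT = {"tile_shared_mem": 0.6, "unroll_loop": 0.9,
          "coalesce_loads": 0.7, "unsafe_reorder": 0.5}
BROKEN = {"unsafe_reorder"}
BASE_RUNTIME_MS = 10.0


def evaluate(edits):
    """Return (verified, runtime_ms, stall_fraction) for an edit sequence."""
    verified = not any(e in BROKEN for e in edits)
    runtime = BASE_RUNTIME_MS
    for e in edits:
        runtime *= EFFECT[e]
    stall = 0.3 if "coalesce_loads" in edits else 0.8
    return verified, runtime, stall


def reward(edits, w_time=0.7, w_stall=0.3):
    """Assumed composite objective: runtime term + bottleneck proxy, with guardrails."""
    verified, runtime, stall = evaluate(edits)
    # Guardrail: failed verification or a runtime regression earns nothing.
    if not verified or runtime > BASE_RUNTIME_MS:
        return 0.0
    return w_time * (1.0 - runtime / BASE_RUNTIME_MS) + w_stall * (1.0 - stall)


class Node:
    def __init__(self, edits, parent=None):
        self.edits, self.parent = edits, parent
        self.children, self.visits, self.total = [], 0, 0.0

    def untried(self):
        tried = {c.edits[-1] for c in self.children}
        return [e for e in EDITS if e not in self.edits and e not in tried]


def uct(child, parent_visits, c=1.4):
    """Standard UCT score: mean reward plus an exploration bonus."""
    return child.total / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(iterations=200, max_depth=3):
    root = Node(())
    best = (reward(()), ())
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes by UCT.
        while not node.untried() and node.children and len(node.edits) < max_depth:
            node = max(node.children, key=lambda ch: uct(ch, node.visits))
        # Expansion: apply one untried edit if depth allows.
        if node.untried() and len(node.edits) < max_depth:
            child = Node(node.edits + (node.untried()[0],), parent=node)
            node.children.append(child)
            node = child
        # Evaluation: score the candidate directly (no random rollout here).
        r = reward(node.edits)
        best = max(best, (r, node.edits))
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += r
            node = node.parent
    return best


if __name__ == "__main__":
    score, edits = mcts()
    print(score, edits)
```

In this toy, the guardrail zeroes out any sequence containing the correctness-breaking edit, so the search settles on the verified edits even though the broken one has the largest raw runtime gain. A real system would replace `evaluate` with actual compilation, correctness checks, and profiler measurements.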

Related papers