Fugu-MT Paper Translation (Abstract): From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph

Paper summary: From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph

arxiv url: http://arxiv.org/abs/2510.19873v1
Date: Wed, 22 Oct 2025 08:33:44 GMT
Status: Translation complete
System update date: 2025-10-25 03:08:16.431371
Title: From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
Title (reference translation): From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
Authors: Junfeng Gong, Zhiyi Wei, Junying Chen, Cheng Liu, Huawei Li
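Abstract summary: Large language models (LLMs) show strong potential in generating optimized code from sequential code. Cloud-based APIs pose risks of code leakage, and local deployment is often computationally expensive and inefficient. These drawbacks have spurred interest in small language models (SLMs), which are more lightweight and privacy-friendly.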
Reference score (independently computed attention score): 12.73098983668479
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite significant evolution of CUDA programming and domain-specific libraries, effectively utilizing GPUs with massively parallel engines remains difficult. Large language models (LLMs) show strong potential in generating optimized CUDA code from sequential code. However, using LLMs in practice faces two major challenges: cloud-based APIs pose risks of code leakage, and local deployment is often computationally expensive and inefficient. These drawbacks have spurred interest in small language models (SLMs), which are more lightweight and privacy-friendly. Encouragingly, recent studies show that SLMs can achieve performance comparable to LLMs on specific tasks. While SLMs can match LLMs on domain-specific tasks, their limited reasoning abilities lead to suboptimal performance in complex CUDA generation according to our experiments. To bridge this gap, we propose ReGraphT, a training-free, retrieval-augmented generation framework that transfers LLM-level reasoning to smaller models. ReGraphT organizes CUDA optimization trajectories into a structured reasoning graph, modeling the combined CUDA optimizations as state transitions, and leverages Monte Carlo Graph Search (MCGS) for efficient exploration. We also present a CUDA-specific benchmark with difficulty tiers defined by reasoning complexity to evaluate models more comprehensively. Experiments show that ReGraphT outperforms HPC-specific fine-tuned models and other retrieval-augmented approaches, achieving an average 2.33X speedup on CUDAEval and ParEval. When paired with DeepSeek-Coder-V2-Lite-Instruct and Qwen2.5-Coder-7B-Instruct, ReGraphT enables SLMs to approach LLM-level performance without the associated privacy risks or excessive computing overhead.
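Abstract (reference translation): Despite significant evolution of CUDA programming and domain-specific libraries, effectively utilizing GPUs with massively parallel engines remains difficult. Large language models (LLMs) show strong potential in generating optimized CUDA code from sequential code. Cloud-based APIs pose risks of code leakage, and local deployment is often computationally expensive and inefficient. These drawbacks have spurred interest in small language models (SLMs), which are more lightweight and privacy-friendly. Recent studies show that SLMs can achieve performance comparable to LLMs on specific tasks. Although SLMs match LLMs on domain-specific tasks, their limited reasoning abilities lead to suboptimal performance in complex CUDA generation. To bridge this gap, we propose ReGraphT, a training-free, retrieval-augmented generation framework that transfers LLM-level reasoning to smaller models. ReGraphT organizes CUDA optimization trajectories into a structured reasoning graph, models the combined CUDA optimizations as state transitions, and uses Monte Carlo Graph Search (MCGS) for efficient exploration. We also present a CUDA-specific benchmark with difficulty tiers defined by reasoning complexity, in order to evaluate models more comprehensively. Experiments show that ReGraphT outperforms HPC-specific fine-tuned models and other retrieval-augmented approaches, achieving an average 2.33x speedup on CUDAEval and ParEval. When paired with DeepSeek-Coder-V2-Lite-Instruct and Qwen2.5-Coder-7B-Instruct, ReGraphT enables SLMs to approach LLM-level performance.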
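
The abstract describes modeling combined CUDA optimizations as state transitions in a graph and exploring that graph with Monte Carlo search. The following is a toy Python sketch of that general idea only, not the ReGraphT implementation: the optimization names, the `TOY_GAIN` factors, the `toy_speedup` reward, and the UCB1 selection rule are all assumptions made for illustration. A state is the set of optimizations applied so far, and statistics are keyed by state, so two different orderings that reach the same state share one node (the property that distinguishes a graph search from a tree search).

```python
import math
import random
from collections import defaultdict

# Hypothetical optimization steps; the names are illustrative, not from the paper.
OPTIMIZATIONS = ("coalesce_memory", "tile_shared_memory", "unroll_loops")
# Hypothetical speedup factors standing in for real kernel timings.
TOY_GAIN = {"coalesce_memory": 1.5, "tile_shared_memory": 2.0, "unroll_loops": 1.1}


def successors(state):
    """Return (optimization, next_state) pairs; a state is a frozenset of applied steps."""
    return [(opt, state | {opt}) for opt in OPTIMIZATIONS if opt not in state]


def toy_speedup(state):
    """Stand-in reward: product of gains. A real system would compile and time a kernel."""
    speedup = 1.0
    for opt in state:
        speedup *= TOY_GAIN[opt]
    return speedup


def rollout(state, max_depth, rng):
    """Apply random remaining optimizations until the depth budget is used up."""
    while len(state) < max_depth:
        options = successors(state)
        if not options:
            break
        state = rng.choice(options)[1]
    return toy_speedup(state)


def search(max_depth=2, iterations=200, c=1.4, seed=0):
    rng = random.Random(seed)
    visits = defaultdict(int)    # keyed by state, shared across all paths into it
    total = defaultdict(float)   # summed rewards per state
    root = frozenset()

    def mean(s):
        return total[s] / visits[s] if visits[s] else float("-inf")

    for _ in range(iterations):
        state, path = root, [root]
        # Selection: try unvisited successors first, otherwise pick by UCB1.
        while len(state) < max_depth:
            children = [s for _, s in successors(state)]
            if not children:
                break
            fresh = [s for s in children if visits[s] == 0]
            if fresh:
                state = rng.choice(fresh)
                path.append(state)
                break
            log_n = math.log(visits[state])
            state = max(children, key=lambda s: mean(s) + c * math.sqrt(log_n / visits[s]))
            path.append(state)
        # Simulation and backpropagation along the visited path.
        reward = rollout(state, max_depth, rng)
        for s in path:
            visits[s] += 1
            total[s] += reward

    # Read out a plan by following the highest mean reward from the root.
    state, plan = root, []
    while len(state) < max_depth:
        options = successors(state)
        if not options:
            break
        opt, state = max(options, key=lambda pair: mean(pair[1]))
        plan.append(opt)
    return plan, toy_speedup(state)


if __name__ == "__main__":
    plan, speedup = search()
    print(plan, speedup)
```

In this toy setting the reward is a fixed table, so the search only has to discover which pair of steps multiplies to the largest factor. In a real pipeline the reward would presumably come from compiling and benchmarking the generated kernel, which is far more expensive and noisy.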

