Fugu-MT Paper Translation (Summary): AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

Paper summary: AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

arxiv url: http://arxiv.org/abs/2507.05687v1
Date: Tue, 08 Jul 2025 05:38:24 GMT
Status: Translation complete
Last updated (in system): 2025-07-09 16:34:37.634804
Title: AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
Authors: Shangzhan Li, Zefan Wang, Ye He, Yuxuan Li, Qi Shi, Jianling Li, Yonggang Hu, Wanxiang Che, Xu Han, Zhiyuan Liu, Maosong Sun
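Abstract summary: We introduce AutoTriton, the first model dedicated to Triton programming powered by reinforcement learning (RL). AutoTriton performs supervised fine-tuning (SFT) on data from a high-quality data-gathering pipeline to acquire essential Triton programming expertise. Experiments across five evaluation channels of TritonBench and KernelBench show that our 8B model AutoTriton achieves performance comparable to mainstream large models.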
Reference score (in-house attention score): 87.8306870967343
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Kernel development in deep learning requires optimizing computational units across hardware while balancing memory management, parallelism, and hardware-specific optimizations through extensive empirical tuning. Although domain-specific languages like Triton simplify GPU programming by abstracting low-level details, developers must still manually tune critical parameters such as tile sizes and memory access patterns through iterative experimentation, creating substantial barriers to optimal performance and wider adoption. In this work, we introduce AutoTriton, the first model dedicated to Triton programming powered by reinforcement learning (RL). AutoTriton performs supervised fine-tuning (SFT) to be equipped with essential Triton programming expertise using a high-quality data gathering pipeline, and conducts RL with Group Relative Policy Optimization (GRPO) algorithm, combining a rule-based reward and an execution-based reward to further improve Triton programming ability, sequentially. Experiments across five evaluation channels of TritonBench and KernelBench illustrate that our 8B model AutoTriton achieves performance comparable to mainstream large models, including Claude-4-Sonnet and DeepSeek-R1-0528. Further experimental analysis demonstrates the crucial role of each module within AutoTriton, including the SFT stage, the RL stage, and the reward design strategy. These findings underscore the promise of RL for automatically generating high-performance kernels, and since high-performance kernels are core components of AI systems, this breakthrough establishes an important foundation for building more efficient AI systems. The model and code will be available at https://github.com/AI9Stars/AutoTriton.
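The abstract says only that the RL stage uses GRPO and combines a rule-based reward with an execution-based reward. The Python sketch below illustrates how such a reward and GRPO's group-relative advantage could fit together. All function names are hypothetical, and two details are our own assumptions rather than the paper's: the rule check is reduced to a search for the `@triton.jit` decorator, and the rule reward gates the execution reward by multiplication. The advantage step follows the usual GRPO recipe of standardizing each sampled completion's reward against the mean and standard deviation of its own group.

```python
import statistics
from typing import List, Sequence


def rule_based_reward(code: str) -> float:
    """Hypothetical rule check: 1.0 if the completion defines a Triton kernel.

    Assumption: testing for the "@triton.jit" decorator string is a crude
    stand-in for whatever rules the paper actually applies.
    """
    return 1.0 if "@triton.jit" in code else 0.0


def execution_reward(passed_tests: bool) -> float:
    """Hypothetical execution reward: 1.0 if the generated kernel ran and
    matched the reference output, else 0.0 (execution itself is not shown)."""
    return 1.0 if passed_tests else 0.0


def combined_reward(code: str, passed_tests: bool) -> float:
    """Assumed combination: the rule reward gates the execution reward, so a
    correct answer that never defines a Triton kernel still scores 0.0."""
    return rule_based_reward(code) * execution_reward(passed_tests)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward within its sampling group.

    Uses the population standard deviation; eps avoids division by zero when
    every completion in the group received the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: four sampled completions for one prompt, paired with test outcomes.
kernel_src = "@triton.jit\ndef add_kernel(x_ptr, y_ptr, out_ptr, n): ..."
group = [
    (kernel_src, True),                          # Triton kernel, tests pass
    (kernel_src, False),                         # Triton kernel, tests fail
    ("def add(x, y):\n    return x + y", True),  # correct, but no Triton kernel
    (kernel_src, True),                          # Triton kernel, tests pass
]
rewards = [combined_reward(code, ok) for code, ok in group]
advs = group_relative_advantages(rewards)
print(rewards)                      # [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in advs])  # [1.0, -1.0, -1.0, 1.0]
```

Under this assumed gating, a completion that reproduces the reference output with plain Python or PyTorch calls earns nothing, which is one plausible reason to pair a rule-based check with execution feedback. If every completion in a group earns the same reward, the standard deviation is zero and all advantages collapse to zero, so that group contributes no learning signal.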

List of related papers