Fugu-MT Paper Translation (Abstract): MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization

Paper abstract: MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization

arxiv url: http://arxiv.org/abs/2601.05475v1
Date: Fri, 09 Jan 2026 02:21:28 GMT
Status: Translation complete
Last updated in system: 2026-01-12 17:41:49.813441
Title: MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization
Authors: Jiefu Ou, Sapana Chaudhary, Kaj Bostrom, Nathaniel Weir, Shuai Zhang, Huzefa Rangwala, George Karypis
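Abstract summary: Large language models (LLMs) show strong capabilities in general coding tasks but face two key challenges in code optimization. This work explores inference-time search algorithms that help LLMs find better solutions. The method, called MaxCode, unifies existing search methods under a max-reward reinforcement learning framework.
Reference score (independently calculated attention score): 44.27213441671799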
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) demonstrate strong capabilities in general coding tasks but encounter two key challenges when optimizing code: (i) writing optimized code (such as performant CUDA kernels and competition-level CPU code) requires expertise in systems, algorithms, and specific languages, and (ii) it requires interpreting performance metrics such as timing and device utilization, beyond binary correctness. In this work, we explore inference-time search algorithms that guide the LLM to discover better solutions through iterative refinement based on execution feedback. Our approach, called MaxCode, unifies existing search methods under a max-reward reinforcement learning framework, making the observation and action-value functions modular and easy to modify. To enhance the observation space, we integrate a natural language critique model that converts raw execution feedback into diagnostic insights about errors and performance bottlenecks, along with the best discounted reward seen so far. Together, these provide richer input to the code proposal function. To improve exploration during search, we train a generative reward-to-go model using action values from rollouts to rerank potential solutions. Testing on the KernelBench (CUDA) and PIE (C++) optimization benchmarks shows that MaxCode improves optimized code performance compared to baselines, achieving 20.3% and 10.1% relative improvements in absolute speedup value and relative speedup ranking, respectively.
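The search loop described in the abstract can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the function names (`refine_search`, `propose`, `evaluate`, `critique`, `score`), the observation layout, and the "best discounted reward" definition (the maximum of `gamma**t * r_t` over steps seen so far) are all assumptions made for this example. The toy stubs stand in for the LLM code proposer, the execution harness, the critique model, and the learned reward-to-go model.

```python
from typing import Callable, Tuple

Observation = Tuple[str, str, float]  # (current code, critique text, best discounted reward)


def refine_search(
    initial_code: str,
    propose: Callable[[Observation, int], str],   # hypothetical: LLM code proposal
    evaluate: Callable[[str], Tuple[float, str]], # hypothetical: run code, return (reward, raw feedback)
    critique: Callable[[str], str],               # hypothetical: raw feedback -> diagnostic text
    score: Callable[[Observation, str], float],   # hypothetical: reward-to-go estimate for reranking
    steps: int = 3,
    width: int = 2,
    gamma: float = 1.0,
) -> Tuple[str, float]:
    """Iterative refinement that keeps the candidate with the best discounted reward."""
    reward, feedback = evaluate(initial_code)
    best_code, best_discounted = initial_code, reward  # step 0: gamma**0 * r_0
    current, hint = initial_code, critique(feedback)
    for t in range(1, steps + 1):
        # The observation carries the critique and the best discounted reward so far.
        observation = (current, hint, best_discounted)
        candidates = [propose(observation, k) for k in range(width)]
        # Rerank the proposals with the value estimate and execute only the top one.
        chosen = max(candidates, key=lambda c: score(observation, c))
        reward, feedback = evaluate(chosen)
        discounted = (gamma ** t) * reward
        if discounted > best_discounted:
            best_code, best_discounted = chosen, discounted
        current, hint = chosen, critique(feedback)
    return best_code, best_discounted


# Toy stand-ins: "code" is a string such as "tile=2"; reward is a made-up speedup.
def _tile(code: str) -> int:
    return int(code.split("=")[1])


def toy_evaluate(code: str) -> Tuple[float, str]:
    n = _tile(code)
    if n > 3:
        return 0.0, "error: tile too large"  # incorrect code earns no reward
    return 1.0 + 0.5 * n, "ok"


def toy_critique(feedback: str) -> str:
    return "shrink the tile" if feedback.startswith("error") else "try a larger tile"


def toy_propose(observation: Observation, k: int) -> str:
    code, hint, _best = observation
    n = _tile(code)
    if hint == "try a larger tile":
        return f"tile={n + k + 1}"
    return f"tile={max(n - k - 1, 0)}"


def toy_score(observation: Observation, candidate: str) -> float:
    return -abs(_tile(candidate) - 3)  # stand-in for a learned estimate


if __name__ == "__main__":
    print(refine_search("tile=0", toy_propose, toy_evaluate, toy_critique, toy_score))
```

In this toy setting the loop keeps the best candidate even when a later step produces incorrect code, which is the point of tracking a maximum rather than a sum of rewards. With a discount below 1, later improvements must be large enough to outweigh the discount before they replace the earlier best.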

Related papers