Fugu-MT paper translation (abstract): Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis

Paper summary: Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis

arxiv url: http://arxiv.org/abs/2603.10846v1
Date: Wed, 11 Mar 2026 14:57:06 GMT
Status: Translation complete
Last updated in system: 2026-03-12 16:22:33.010769
Title: Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis
Title (reference translation): Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach to NPU Kernel Synthesis
Authors: Yujie Zheng, Zhuo Li, Shengtao Zhang, Hanjing Wang, Junjie Sheng, Jiaqian Wang, Junchi Yan, Weinan Zhang, Ying Wen, Bo Tang, Muning Wen
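Abstract summary: EvoKernel is a self-evolving agentic framework that automates the kernel synthesis lifecycle. It learns stage-specific Q-values and prioritizes experiences by their contribution to the current objective. It improves frontier models' correctness from 11.0% to 83.0% and achieves a median speedup of 3.60x over initial drafts.
Reference score (independently computed attention score): 68.7701048879757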
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deploying Large Language Models to data-scarce programming domains poses significant challenges, particularly for kernel synthesis on emerging Domain-Specific Architectures where a "Data Wall" limits available training data. While models excel on data-rich platforms like CUDA, they suffer catastrophic performance drops on data-scarce ecosystems such as NPU programming. To overcome this cold-start barrier without expensive fine-tuning, we introduce EvoKernel, a self-evolving agentic framework that automates the lifecycle of kernel synthesis from initial drafting to continual refining. EvoKernel addresses this by formulating the synthesis process as a memory-based reinforcement learning task. Through a novel value-driven retrieval mechanism, it learns stage-specific Q-values that prioritize experiences based on their contribution to the current objective, whether bootstrapping a feasible draft or iteratively refining latency. Furthermore, by enabling cross-task memory sharing, the agent generalizes insights from simple to complex operators. By building an NPU variant of KernelBench and evaluating on it, EvoKernel improves frontier models' correctness from 11.0% to 83.0% and achieves a median speedup of 3.60x over initial drafts through iterative refinement. This demonstrates that value-guided experience accumulation allows general-purpose models to master the kernel synthesis task on niche hardware ecosystems. Our official page is available at https://evokernel.zhuo.li.
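Abstract (reference translation): Deploying large language models to data-scarce programming domains is a major challenge, particularly for kernel synthesis on emerging Domain-Specific Architectures, where a "Data Wall" limits the available training data. Models excel on data-rich platforms such as CUDA but suffer catastrophic performance drops in data-scarce ecosystems such as NPU programming. To overcome this cold-start barrier without expensive fine-tuning, we introduce EvoKernel, a self-evolving agentic framework that automates the kernel synthesis lifecycle from initial drafting to continual refining. EvoKernel addresses the problem by formulating the synthesis process as a memory-based reinforcement learning task. Through a novel value-driven retrieval mechanism, it learns stage-specific Q-values that prioritize experiences by their contribution to the current objective, whether that is bootstrapping a feasible draft or iteratively refining latency. In addition, by enabling cross-task memory sharing, the agent generalizes insights from simple operators to complex ones. Evaluated on a newly built NPU variant of KernelBench, EvoKernel improves frontier models' correctness from 11.0% to 83.0% and achieves a median speedup of 3.60x over initial drafts through iterative refinement. This demonstrates that value-guided experience accumulation allows general-purpose models to master the kernel synthesis task on niche hardware ecosystems. The official page is available at https://evokernel.zhuo.li.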
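
The value-driven retrieval idea in the abstract can be illustrated with a small sketch. This is not EvoKernel's implementation: the abstract does not state the update rule, the reward definition, or the memory layout. The stage names ("draft", "refine"), the class and method names, the running-average update with step size alpha, and the example notes and rewards below are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical stage names; the abstract only says Q-values are stage-specific.
STAGES = ("draft", "refine")


@dataclass
class Experience:
    """One stored experience with a separate value estimate per stage."""
    note: str
    q: dict = field(default_factory=lambda: {s: 0.0 for s in STAGES})


class ValueMemory:
    """Toy memory that ranks experiences by a per-stage Q-value."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # assumed step size for the running-average update
        self.items = []

    def add(self, note):
        exp = Experience(note)
        self.items.append(exp)
        return exp

    def retrieve(self, stage, k=2):
        # Return the k experiences with the highest Q-value for this stage.
        return sorted(self.items, key=lambda e: e.q[stage], reverse=True)[:k]

    def update(self, exp, stage, reward):
        # Assumed rule: move Q toward the observed reward by a fraction alpha.
        exp.q[stage] += self.alpha * (reward - exp.q[stage])


memory = ValueMemory(alpha=0.5)
tiling = memory.add("tile the loop to fit the on-chip buffer")
skeleton = memory.add("minimal kernel skeleton that compiles")

# Assumed rewards: 1.0 when an experience helped reach the stage objective.
memory.update(skeleton, "draft", 1.0)   # helped produce a feasible draft
memory.update(tiling, "refine", 1.0)    # helped reduce latency

top_for_draft = memory.retrieve("draft", k=1)[0]
top_for_refine = memory.retrieve("refine", k=1)[0]
```

Keeping one Q-value per stage lets the same memory rank the same experience differently depending on whether the current objective is a feasible first draft or lower latency, which is the behaviour the abstract describes at a high level.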


Related papers