Fugu-MT Paper Translation (Abstract): From Human Guidance to Autonomy: Agent Skill System for End-to-End LLM Deployment on Spatial NPUs

Paper Overview: From Human Guidance to Autonomy: Agent Skill System for End-to-End LLM Deployment on Spatial NPUs

arxiv url: http://arxiv.org/abs/2606.07586v2
Date: Tue, 09 Jun 2026 13:49:01 GMT
Status: Translation complete
System update date: 2026-06-15 07:09:36.747574
Title: From Human Guidance to Autonomy: Agent Skill System for End-to-End LLM Deployment on Spatial NPUs
Authors: Jiajie Li, Erwei Wang, Zhiru Zhang, Samuel Bayliss
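Abstract summary: We present a two-stage methodology, instantiated on the AMD XDNA 2 NPU. In the first stage, we develop a reference deployment of Llama-3.2-1B through human-guided agent assistance. The resulting implementation achieves a speedup of 2.2x on prefill and 4.0x on decode over the hand-optimized baseline. Using our agent skill system, we autonomously deploy eight additional decoder-only LLMs.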
Reference score (independently computed attention score): 8.565916665783307
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Spatial neural processing units (NPUs) provide an energy-efficient platform for edge LLM inference, but efficiently deploying an LLM end-to-end on such hardware remains labor-intensive. Although AI coding agents have begun to lower this cost, existing studies have largely focused on single-kernel optimization rather than end-to-end LLM deployment on resource-constrained spatial NPUs. We present a two-stage methodology, instantiated on the AMD XDNA 2 NPU, that progresses from human-guided development to agent autonomy. In the first stage, we develop a reference deployment of Llama-3.2-1B through human-guided agent assistance. The resulting implementation achieves a speedup of 2.2x on prefill and 4.0x on decode over the hand-optimized baseline, with the optimization trajectory and its lessons recorded as structured documentation throughout. In the second stage, we distill the documentation into an agent skill system consisting of eight phases, orchestrating the optimization and debugging skill sets, with numerical correctness strictly enforced at each phase. Using our agent skill system, we autonomously deploy eight additional decoder-only LLMs (Llama-3.2-3B, SmolLM2-1.7B, Qwen2.5-{0.5B, 1.5B, 3B}, Qwen3-{0.6B, 1.7B, 4B}) end-to-end on the AMD XDNA 2 NPU using the open-source compiler stack. To our knowledge, these models have not previously been deployed on AMD NPUs via any open-source software stack. Each deployment completes in 0.5-4 hours of agent wall time with almost no human guidance, and passes the numerical-correctness gates, demonstrating functional generalization to previously unencountered LLMs. Three of the eight match or exceed the sustained performance of our Llama-3.2-1B reference deployment, suggesting that the resulting implementations can be competitive without additional model-specific human engineering.
