Fugu-MT Paper Translation (Summary): Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

Paper summary: Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

arxiv url: http://arxiv.org/abs/2604.21829v2
Date: Mon, 27 Apr 2026 15:06:33 GMT
Status: translation complete
Last updated in system: 2026-04-28 17:12:06.9403
Title: Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study
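Authors: Zihan Wang, Rui Zhang, Yu Liu, Chi Liu, Qingchuan Zhao, Hongwei Li, Guowen Xu
Abstract summary: Large language model (LLM) agents rely on skills that package reusable capabilities through instructions, tools, and resources. High-quality skills embed expert knowledge, curation, and execution constraints into agents. Adversaries can interact with public agent interfaces to extract hidden proprietary skill content.
Reference score (independently computed attention score): 32.698841771877824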
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM) agents increasingly rely on skills to package reusable capabilities through instructions, tools, and resources. High-quality skills embed expert knowledge, curated workflows, and execution constraints into agents, fueling a growing skill economy through their value and scalability. Yet this ecosystem also creates a new attack surface, as adversaries can interact with public agent interfaces to extract hidden proprietary skill content. We present the first systematic study of black-box skill stealing against LLM agent systems. Compared with conventional system prompt stealing, skill stealing targets modular and structured capability packages whose leakage is directly actionable for copying, redistribution, and monetization, making the resulting harm potentially greater. To study this threat, we derive an attack taxonomy from prior prompt-stealing methods and build an automated stealing prompt generation agent. Starting from model-generated seed prompts, the framework expands attacks through scenario rationalization and structure injection while enforcing diversity via embedding-based filtering, yielding a reproducible pipeline for evaluating proprietary agent systems. We evaluate these attacks across commercial agent platforms and representative LLMs. Our results show that agent skills can often be extracted easily, posing a serious copyright risk. To mitigate this threat, we design defenses across the agent pipeline, focusing on the input, inference, and output phases. Although these defenses substantially reduce leakage, the attack remains inexpensive and repeatable, and a single successful attempt is sufficient to compromise the protected skill. Overall, our findings suggest that these copyright risks remain largely overlooked across proprietary agent ecosystems, motivating stronger protection mechanisms.
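The abstract mentions enforcing diversity via embedding-based filtering but gives no details. As a rough illustration of what such a step could look like in general, the sketch below keeps a candidate text only if its cosine similarity to every previously kept text stays at or below a threshold. The `embed` function here is a toy bag-of-words stand-in for a real sentence encoder, and the names `diversity_filter` and `max_similarity`, the threshold value, and the placeholder strings are all assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumed stand-in for a real sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def diversity_filter(candidates, max_similarity=0.8):
    """Greedily keep candidates that are not too similar to any kept one.

    max_similarity is a hypothetical threshold, not a value from the paper.
    """
    kept, kept_vecs = [], []
    for text in candidates:
        vec = embed(text)
        if all(cosine(vec, other) <= max_similarity for other in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept


# Neutral placeholder strings; the second is a near-duplicate of the first.
candidates = [
    "summarize the quarterly report",
    "summarize the quarterly report please",
    "translate this sentence into French",
]
filtered = diversity_filter(candidates)
```

With these placeholders, the near-duplicate second string is dropped and the other two are kept. A greedy pass like this is order-dependent; a real pipeline might instead cluster embeddings or use a learned encoder, which the abstract does not specify.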
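The abstract's point that the attack is inexpensive and repeatable, and that a single successful attempt suffices, can be made concrete with a small calculation. The sketch below assumes each attempt succeeds independently with the same probability `p`, which is a simplifying assumption and not a claim from the paper; the numbers used are purely hypothetical and are not measured results. Under that assumption, the chance of at least one leak in `n` attempts is 1 - (1 - p)^n.

```python
import math


def leak_probability(p, n):
    """Probability of at least one success in n independent attempts.

    Assumes a fixed per-attempt success probability p (hypothetical).
    """
    return 1.0 - (1.0 - p) ** n


def attempts_needed(p, target):
    """Smallest n such that leak_probability(p, n) >= target, for 0 < p < 1."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Hypothetical example: a defense that cuts per-attempt success to 5%.
p_after_defense = 0.05
print(leak_probability(p_after_defense, 100))
print(attempts_needed(p_after_defense, 0.99))
```

Even a defense that makes each individual attempt unlikely to succeed leaves a high cumulative leak probability once attempts are cheap to repeat, which is why rate limiting and per-account monitoring are natural complements to per-request filtering.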


Related papers