Fugu-MT Paper Translation (Abstract): Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

Paper overview: Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

arxiv url: http://arxiv.org/abs/2604.21829v1
Date: Thu, 23 Apr 2026 16:18:47 GMT
Status: Translation complete
System update date: 2026-04-24 14:40:06.761098
Title: Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study
Title (reference translation): Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study
Authors: Zihan Wang, Rui Zhang, Yu Liu, Chi Liu, Qingchuan Zhao, Hongwei Li, Guowen Xu,
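Abstract summary: We conduct an empirical study of black-box skill stealing against LLM agent systems. Our results show that agent skills can be extracted with only 3 interactions. We design defenses across three stages of the agent pipeline: input, inference, and output.
Reference score (independently calculated attention score): 32.698841771877824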
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLM agents increasingly rely on skills to encapsulate reusable capabilities via progressively disclosed instructions. High-quality skills inject expert knowledge into general-purpose models, improving performance on specialized tasks. This quality and ease of dissemination drive the emergence of a skill economy: free skill marketplaces already report 90368 published skills, while paid marketplaces report more than 2000 listings and over $100,000 in creator earnings. Yet this growing marketplace also creates a new attack surface, as adversaries can interact with public agents to extract hidden proprietary skill content. We present the first empirical study of black-box skill stealing against LLM agent systems. To study this threat, we first derive an attack taxonomy from prior prompt-stealing methods and build an automated stealing prompt generation agent. This agent starts from model-generated seed prompts, expands them through scenario rationalization and structure injection, and enforces diversity via embedding filtering. This process yields a reproducible pipeline for evaluating agent systems. We evaluate such attacks across 3 commercial agent architectures and 5 LLMs. Our results show that agent skills can be extracted with only 3 interactions, posing a serious copyright risk. To mitigate this threat, we design defenses across three stages of the agent pipeline: input, inference, and output. Although these defenses achieve strong results, the attack remains inexpensive and readily automatable, allowing an adversary to launch repeated attempts with different variants; only one successful attempt is sufficient to compromise the protected skill. Overall, our findings suggest that these copyright risks are largely overlooked across proprietary agent ecosystems. We therefore advocate for more robust defense strategies that provide stronger protection guarantees.
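The abstract names an output-stage defense but does not describe how it works. As a purely illustrative sketch (not the authors' method), one simple output-stage check compares each response against the protected skill text and withholds responses that reproduce too much of it. The function names, the 5-word n-gram size, and the 0.2 threshold below are all assumptions chosen for the example.

```python
import re


def word_ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text`."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def leak_ratio(skill_text: str, response: str, n: int = 5) -> float:
    """Fraction of the skill's n-grams that also appear in the response."""
    skill = word_ngrams(skill_text, n)
    if not skill:
        # Skill text shorter than n words: nothing to compare against.
        return 0.0
    return len(skill & word_ngrams(response, n)) / len(skill)


def guard_output(skill_text: str, response: str,
                 threshold: float = 0.2, n: int = 5) -> str:
    """Withhold the response if it overlaps the skill text too heavily.

    `threshold` and `n` are hypothetical tuning knobs, not values
    taken from the paper.
    """
    if leak_ratio(skill_text, response, n) >= threshold:
        return "[response withheld: possible skill disclosure]"
    return response
```

A verbatim-overlap check like this would only catch direct copying; a paraphrased or translated leak would pass through, which is consistent with the abstract's caution that a single successful variant is enough to compromise a skill.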
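Abstract (reference translation): LLM agents increasingly rely on skills to encapsulate reusable capabilities via progressively disclosed instructions. High-quality skills inject expert knowledge into general-purpose models, improving performance on specialized tasks. Free skill marketplaces already report 90368 published skills, while paid marketplaces report more than 2000 listings and over $100,000 in creator earnings. Adversaries can interact with public agents to extract hidden proprietary skill content. We present the first empirical study of black-box skill stealing against LLM agent systems. To study this threat, we first derive an attack taxonomy from prior prompt-stealing methods and build an automated stealing prompt generation agent. This agent starts from model-generated seed prompts, expands them through scenario rationalization and structure injection, and enforces diversity via embedding filtering. This process yields a reproducible pipeline for evaluating agent systems. We evaluate such attacks across 3 commercial agent architectures and 5 LLMs. Our results show that agent skills can be extracted with only 3 interactions. To mitigate this threat, we design defenses across three stages of the agent pipeline: input, inference, and output. Although these defenses achieve strong results, the attack is inexpensive and readily automatable, allowing an adversary to launch repeated attempts with different variants. Overall, our findings suggest that these copyright risks are overlooked across proprietary agent ecosystems. We therefore advocate for more robust defense strategies that provide stronger protection guarantees.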
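The point about repeated attempts can be made concrete with a small calculation. The sketch below assumes, hypothetically, that each attempt succeeds independently with the same probability; the abstract gives no such figure, and the 5% value used here is an illustrative placeholder rather than a result from the paper.

```python
def p_any_success(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds.

    Assumes every attempt succeeds independently with probability
    `p_single`, which is a simplifying assumption for illustration.
    """
    return 1.0 - (1.0 - p_single) ** attempts


if __name__ == "__main__":
    # Hypothetical per-attempt success rate against a defended agent.
    p_single = 0.05
    for attempts in (1, 10, 50, 100):
        print(attempts, round(p_any_success(p_single, attempts), 3))
```

Under these assumptions the cumulative success probability rises toward 1 as the number of attempts grows, which is why a defense that blocks most individual attempts may still offer weak protection against a cheap, automated attacker.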
