Fugu-MT Paper Translation (Summary): Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

Paper summary: Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

arxiv url: http://arxiv.org/abs/2603.19974v1
Date: Fri, 20 Mar 2026 14:17:56 GMT
Status: Translation complete
Last updated in system: 2026-03-23 19:48:39.176941
Title: Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance
Title (reference translation): Trojan's Whisper: Stealthy Manipulation of OpenClaw via Injected Bootstrap Guidance
Authors: Fazhong Liu, Zhuoyan Chen, Tu Lan, Haozhen Tan, Zhenyu Xu, Xiang Li, Guoxing Chen, Yan Meng, Haojin Zhu
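Abstract summary: Guidance injection is a stealthy attack vector that embeds adversarial operational narratives into bootstrap guidance files. The authors construct 26 malicious skills spanning 13 attack categories, including credential exfiltration, workspace destruction, privilege escalation, and persistent backdoor installation. The attacks achieve success rates from 16.0% to 64.2%, with the majority of malicious actions executed autonomously without user confirmation.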
Reference score (independently computed attention metric): 23.059379933610163
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autonomous coding agents are increasingly integrated into software development workflows, offering capabilities that extend beyond code suggestion to active system interaction and environment management. OpenClaw, a representative platform in this emerging paradigm, introduces an extensible skill ecosystem that allows third-party developers to inject behavioral guidance through lifecycle hooks during agent initialization. While this design enhances automation and customization, it also opens a novel and unexplored attack surface. In this paper, we identify and systematically characterize guidance injection, a stealthy attack vector that embeds adversarial operational narratives into bootstrap guidance files. Unlike traditional prompt injection, which relies on explicit malicious instructions, guidance injection manipulates the agent's reasoning context by framing harmful actions as routine best practices. These narratives are automatically incorporated into the agent's interpretive framework and influence future task execution without raising suspicion. We construct 26 malicious skills spanning 13 attack categories including credential exfiltration, workspace destruction, privilege escalation, and persistent backdoor installation. We evaluate them using ORE-Bench, a realistic developer workspace benchmark we developed. Across 52 natural user prompts and six state-of-the-art LLM backends, our attacks achieve success rates from 16.0% to 64.2%, with the majority of malicious actions executed autonomously without user confirmation. Furthermore, 94% of our malicious skills evade detection by existing static and LLM-based scanners. Our findings reveal fundamental tensions in the design of autonomous agent ecosystems and underscore the urgent need for defenses based on capability isolation, runtime policy enforcement, and transparent guidance provenance.
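Abstract (reference translation): Autonomous coding agents are increasingly being integrated into software development workflows, providing capabilities that extend from code suggestion to active system interaction and environment management. OpenClaw, a representative platform of this emerging paradigm, introduces an extensible skill ecosystem in which third-party developers can inject behavioral guidance through lifecycle hooks at agent initialization. This design promotes automation and customization, but it also opens a new and unexplored attack surface. In this paper, we identify and systematically characterize guidance injection, a stealthy attack vector that embeds adversarial operational narratives in bootstrap guidance files. Unlike conventional prompt injection, which depends on explicit malicious instructions, guidance injection manipulates the agent's reasoning context by framing harmful actions as routine best practices. These narratives are automatically incorporated into the agent's interpretive framework and influence future task execution without arousing suspicion. We constructed 26 malicious skills spanning 13 attack categories, including credential exfiltration, workspace destruction, privilege escalation, and persistent backdoor installation. We evaluated them with ORE-Bench, a realistic developer workspace benchmark that we developed. Across 52 natural user prompts and six state-of-the-art LLM backends, our attacks succeed at rates from 16.0% to 64.2%, and most malicious actions are executed autonomously without user confirmation. Furthermore, 94% of our malicious skills evade detection by existing static and LLM-based scanners. This study reveals fundamental tensions in the design of autonomous agent ecosystems and underscores the urgent need for defenses based on capability isolation, runtime policy enforcement, and transparent guidance provenance.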

Related papers