Fugu-MT Paper Translation (Abstract): Don't Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw

Paper overview: Don't Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw

arxiv url: http://arxiv.org/abs/2603.10387v1
Date: Wed, 11 Mar 2026 04:09:05 GMT
Status: Translation complete
Last updated in system: 2026-03-12 16:22:32.772221
Title: Don't Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw
Title (reference translation): Security Analysis and Defense Framework for OpenClaw
Authors: Zhengyang Shan, Jiayun Xin, Yue Zhang, Minghui Xu,
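Abstract summary: Code agents powered by large language models can execute shell commands on behalf of users, introducing severe security vulnerabilities. This paper presents a two-phase security analysis of the OpenClaw platform. The authors propose and implement a novel Human-in-the-Loop (HITL) defense layer.
Reference score (independently calculated attention score): 11.260903238043129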
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Code agents powered by large language models can execute shell commands on behalf of users, introducing severe security vulnerabilities. This paper presents a two-phase security analysis of the OpenClaw platform. As an open-source AI agent framework that operates locally, OpenClaw can be integrated with various commercial large language models. Because its native architecture lacks built-in security constraints, it serves as an ideal subject for evaluating baseline agent vulnerabilities. First, we systematically evaluate OpenClaw's native resilience against malicious instructions. By testing 47 adversarial scenarios across six major attack categories derived from the MITRE ATLAS and ATT&CK frameworks, we have demonstrated that OpenClaw exhibits significant inherent security issues. It primarily relies on the security capabilities of the backend LLM and is highly susceptible to sandbox escape attacks, with an average defense rate of only 17%. To mitigate these critical security gaps, we propose and implement a novel Human-in-the-Loop (HITL) defense layer. We utilize a dual-mode testing framework to evaluate the system with and without our proposed intervention. Our findings show that the introduced HITL layer significantly hardens the system, successfully intercepting up to 8 severe attacks that completely bypassed OpenClaw's native defenses. By combining native capabilities with our HITL approach, the overall defense rate improves to a range of 19% to 92%. Our study not only exposes the intrinsic limitations of current code agents but also demonstrates the effectiveness of human-agent collaborative defense strategies.
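Abstract (reference translation): Code agents powered by large language models can execute shell commands on behalf of users, which can introduce serious security vulnerabilities. This paper describes a two-phase security analysis of the OpenClaw platform. As an open-source AI agent framework that runs locally, OpenClaw can be integrated with a variety of commercial large language models. Because its native architecture has no built-in security constraints, it serves as an ideal subject for evaluating baseline agent vulnerabilities. First, the authors systematically evaluate OpenClaw's native resilience against malicious instructions. By testing 47 adversarial scenarios across six major attack categories derived from the MITRE ATLAS and ATT&CK frameworks, they show that OpenClaw exhibits significant security issues. It relies mainly on the security capabilities of the backend LLM and is susceptible to sandbox escape attacks. To mitigate these critical security gaps, the authors propose and implement a novel Human-in-the-Loop (HITL) defense layer. They use a dual-mode testing framework to evaluate the system with and without the proposed intervention. The results show that the HITL layer significantly hardens the system, successfully intercepting up to 8 attacks that completely bypassed OpenClaw's native defenses. Combining native capabilities with the HITL approach improves the overall defense rate to a range of 19% to 92%. The study not only reveals the intrinsic limitations of current code agents but also demonstrates the effectiveness of human-agent collaborative defense strategies.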
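The abstract does not say how the HITL layer is implemented or how the defense rate is computed. As a rough illustration only, the sketch below shows one way a human-approval gate in front of shell commands could look, with the defense rate taken as the fraction of malicious commands that end up blocked. Every name here (`RISKY_PATTERNS`, `needs_review`, `hitl_gate`, `defense_rate`) and every pattern is a hypothetical placeholder, not part of OpenClaw or the paper.

```python
import re
from typing import Callable, Iterable

# Hypothetical patterns for illustration; the paper's actual rules are not
# given in the abstract.
RISKY_PATTERNS = [
    r"\brm\s+-rf\b",                 # recursive forced delete
    r"\bcurl\b.*\|\s*(sh|bash)\b",   # piping a download into a shell
    r"\bchmod\s+777\b",              # world-writable permissions
]


def needs_review(command: str) -> bool:
    """Return True if the command matches any illustrative risky pattern."""
    return any(re.search(pattern, command) for pattern in RISKY_PATTERNS)


def hitl_gate(command: str, approve: Callable[[str], bool]) -> bool:
    """Return True if the command may run.

    Risky-looking commands are handed to `approve`, a stand-in for a human
    prompt, and run only if it returns True. Other commands pass through.
    """
    if needs_review(command):
        return approve(command)
    return True


def defense_rate(commands: Iterable[str], approve: Callable[[str], bool]) -> float:
    """Fraction of the given (assumed malicious) commands that the gate blocks."""
    commands = list(commands)
    if not commands:
        raise ValueError("no commands to evaluate")
    blocked = sum(1 for c in commands if not hitl_gate(c, approve))
    return blocked / len(commands)
```

In this sketch a reviewer who rejects every flagged command is modeled as `approve = lambda command: False`. Commands that match no pattern still run, which mirrors the general point that a pattern-based gate only helps for the attacks it can recognize.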

Related papers