Fugu-MT Paper Translation (Abstract): AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions

Paper summary: AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions

arxiv url: http://arxiv.org/abs/2605.25707v1
Date: Mon, 25 May 2026 11:09:22 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:19.826737
Title: AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions
Title (reference translation): AgentHijack: Benchmarking the Robustness of Computer-Use Agents Against Common Environment Corruptions
Authors: Jingwei Sun, Jianing Zhu, Yuanyi Li, Tongliang Liu, Xia Hu, Bo Han
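Abstract summary: We introduce AgentHijack, a benchmark designed to evaluate the robustness of computer-use agents under common corruptions. We evaluate MLLM-based agents on a variety of desktop tasks and find that even minor instances of corruption can cause substantial performance degradation. We propose AgentHijack-Agent, a framework that integrates an action generator with enhanced grounding capabilities and an onlooker responsible for behavior summarization and environment checking.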
Reference score (independently computed attention level): 78.49000936275773
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autonomous computer-use agents powered by multimodal large language models (MLLMs) are emerging as capable assistants for completing complex digital workflows. However, real-world execution environments are far from ideal: pop-ups, resolution changes, and competing applications frequently interfere with agent perception and control. We introduce AgentHijack, a benchmark designed to evaluate the robustness of computer-use agents under common corruptions, where uncertainties in dynamic environments disrupt the execution flow without direct adversarial intent. Specifically, AgentHijack introduces 9 configurable common corruptions to replicate realistic imperfect scenarios. We evaluate MLLM-based agents on a variety of desktop tasks and find that even minor instances of corruption can result in substantial performance degradation, which emphasizes the fragility of agents and underscores the necessity of robustness evaluation. We then propose AgentHijack-Agent, a framework that integrates an action generator with enhanced grounding capabilities and an onlooker responsible for behavior summarization and environment checking. Extensive experiments validate its effectiveness. Our code, environment, baseline models, and data are publicly available at: https://AgentHijack.github.io.
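Abstract (reference translation): Autonomous computer-use agents built on multimodal large language models (MLLMs) are emerging as capable assistants for completing complex digital workflows. Real execution environments, however, are not ideal: pop-ups, resolution changes, and competing applications frequently interfere with the agent's perception and control. This paper introduces AgentHijack, a benchmark for evaluating the robustness of computer-use agents under common corruptions, in which the uncertainties of a dynamic environment disrupt the execution flow without direct adversarial intent. Specifically, AgentHijack introduces nine configurable common corruptions to reproduce realistic imperfect scenarios. The authors evaluate MLLM-based agents on a variety of desktop tasks and find that even minor corruptions can cause substantial performance degradation, which highlights the fragility of agents and the need for robustness evaluation. They then propose AgentHijack-Agent, a framework that combines an action generator with enhanced grounding capabilities and an onlooker responsible for behavior summarization and environment checking. Extensive experiments validate its effectiveness. The code, environment, baseline models, and data are available at https://AgentHijack.github.io.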

Related papers