RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
- URL: http://arxiv.org/abs/2505.21936v2
- Date: Sun, 01 Jun 2025 00:04:07 GMT
- Title: RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
- Authors: Zeyi Liao, Jaylen Jones, Linxi Jiang, Eric Fosler-Lussier, Yu Su, Zhiqiang Lin, Huan Sun,
- Abstract summary: Computer-use agents (CUAs) promise to automate complex tasks across operating systems (OS) and the web, but remain vulnerable to indirect prompt injection.<n>We propose RedTeamCUA, an adversarial testing framework featuring a novel hybrid sandbox that integrates a VM-based OS environment with Docker-based web platforms.<n>RedTeamCUA provides an essential framework for advancing realistic, controlled, and systematic analysis of CUA vulnerabilities.
- Score: 40.354694210052095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computer-use agents (CUAs) promise to automate complex tasks across operating systems (OS) and the web, but remain vulnerable to indirect prompt injection. Current evaluations of this threat either lack support realistic but controlled environments or ignore hybrid web-OS attack scenarios involving both interfaces. To address this, we propose RedTeamCUA, an adversarial testing framework featuring a novel hybrid sandbox that integrates a VM-based OS environment with Docker-based web platforms. Our sandbox supports key features tailored for red teaming, such as flexible adversarial scenario configuration, and a setting that decouples adversarial evaluation from navigational limitations of CUAs by initializing tests directly at the point of an adversarial injection. Using RedTeamCUA, we develop RTC-Bench, a comprehensive benchmark with 864 examples that investigate realistic, hybrid web-OS attack scenarios and fundamental security vulnerabilities. Benchmarking current frontier CUAs identifies significant vulnerabilities: Claude 3.7 Sonnet | CUA demonstrates an ASR of 42.9%, while Operator, the most secure CUA evaluated, still exhibits an ASR of 7.6%. Notably, CUAs often attempt to execute adversarial tasks with an Attempt Rate as high as 92.5%, although failing to complete them due to capability limitations. Nevertheless, we observe concerning ASRs of up to 50% in realistic end-to-end settings, with the recently released frontier Claude 4 Opus | CUA showing an alarming ASR of 48%, demonstrating that indirect prompt injection presents tangible risks for even advanced CUAs despite their capabilities and safeguards. Overall, RedTeamCUA provides an essential framework for advancing realistic, controlled, and systematic analysis of CUA vulnerabilities, highlighting the urgent need for robust defenses to indirect prompt injection prior to real-world deployment.
Related papers
- OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories.<n>Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms.<n>It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv Detail & Related papers (2025-07-08T16:18:54Z) - A Systematization of Security Vulnerabilities in Computer Use Agents [1.3560089220432787]
We conduct a systematic threat analysis and testing of real-world CUAs under adversarial conditions.<n>We identify seven classes of risks unique to the CUA paradigm, and analyze three concrete exploit scenarios in depth.<n>These case studies reveal deeper architectural flaws across current CUA implementations.
arXiv Detail & Related papers (2025-07-07T19:50:21Z) - CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale [46.76144797837242]
Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously.<n>Existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope.<n>We introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities.
arXiv Detail & Related papers (2025-06-03T07:35:14Z) - VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents [74.6761188527948]
Computer-Use Agents (CUAs) with full system access pose significant security and privacy risks.<n>We investigate Visual Prompt Injection (VPI) attacks, where malicious instructions are visually embedded within rendered user interfaces.<n>Our empirical study shows that current CUAs and BUAs can be deceived at rates of up to 51% and 100%, respectively, on certain platforms.
arXiv Detail & Related papers (2025-06-03T05:21:50Z) - CANTXSec: A Deterministic Intrusion Detection and Prevention System for CAN Bus Monitoring ECU Activations [53.036288487863786]
We propose CANTXSec, the first deterministic Intrusion Detection and Prevention system based on physical ECU activations.<n>It detects and prevents classical attacks in the CAN bus, while detecting advanced attacks that have been less investigated in the literature.<n>We prove the effectiveness of our solution on a physical testbed, where we achieve 100% detection accuracy in both classes of attacks while preventing 100% of FIAs.
arXiv Detail & Related papers (2025-05-14T13:37:07Z) - RedTeamLLM: an Agentic AI framework for offensive security [0.0]
We propose and evaluate RedTeamLLM, an integrated architecture with a comprehensive security model for automatization of pentest tasks.<n>RedTeamLLM follows three key steps: summarizing, reasoning and act, which embed its operational capacity.<n> Evaluation is performed through the automated resolution of a range of entry-level, but not trivial, CTF challenges.
arXiv Detail & Related papers (2025-05-11T09:19:10Z) - AGENTFUZZER: Generic Black-Box Fuzzing for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentXploit, to automatically discover and exploit indirect prompt injection vulnerabilities.<n>We evaluate AgentXploit on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o.<n>We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z) - The Cost of Performance: Breaking ThreadX with Kernel Object Masquerading Attacks [16.54210795506388]
We show that popular real-time operating systems (RTOSs) lack essential security protections.<n>We identify a performance optimization practice in ThreadX that introduces security vulnerabilities, allowing for the circumvention of parameter sanitization processes.<n>We introduce an automated approach involving under-constrained symbolic execution to identify the Kernel Object Masquerading (KOM) Attack.
arXiv Detail & Related papers (2025-04-28T05:01:35Z) - AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration [40.350632196772466]
This paper introduces AutoRedTeamer, a novel framework for fully automated, end-to-end red teaming against large language models (LLMs)<n>AutoRedTeamer combines a multi-agent architecture with a memory-guided attack selection mechanism to enable continuous discovery and integration of new attack vectors.<n>We demonstrate AutoRedTeamer's effectiveness across diverse evaluation settings, achieving 20% higher attack success rates on HarmBench against Llama-3.1-70B.
arXiv Detail & Related papers (2025-03-20T00:13:04Z) - Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents [3.5248694676821484]
We evaluate eight different defenses and bypass all of them using adaptive attacks, consistently achieving an attack success rate of over 50%.<n>Our research underscores the need for adaptive attack evaluation when designing defenses to ensure robustness and reliability.
arXiv Detail & Related papers (2025-02-27T04:04:50Z) - AdvAgent: Controllable Blackbox Red-teaming on Web Agents [22.682464365220916]
AdvAgent is a black-box red-teaming framework for attacking web agents.<n>It employs a reinforcement learning-based pipeline to train an adversarial prompter model.<n>With careful attack design, these prompts effectively exploit agent weaknesses while maintaining stealthiness and controllability.
arXiv Detail & Related papers (2024-10-22T20:18:26Z) - Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.<n>We find that we can successfully break latest agents that use black-box frontier LMs, including those that perform reflection and tree search.<n>We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z) - Adversarial EXEmples: A Survey and Experimental Evaluation of Practical
Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes.
We develop a unifying framework that does not only encompass and generalize previous attacks against machine-learning models, but also includes three novel attacks.
These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.