Evaluating the Robustness of Multimodal Agents Against Active Environmental Injection Attacks
- URL: http://arxiv.org/abs/2502.13053v2
- Date: Tue, 22 Apr 2025 02:58:19 GMT
- Title: Evaluating the Robustness of Multimodal Agents Against Active Environmental Injection Attacks
- Authors: Yurun Chen, Xavier Hu, Keting Yin, Juncheng Li, Shengyu Zhang,
- Abstract summary: This study investigates the ability of AI agents to detect "impostors" within their environment.<n>A significant threat-attackers can disguise malicious attacks as environmental elements, injecting active disturbances into the agents' execution processes.<n>We propose AEIA-MN, an attack scheme that exploits interaction vulnerabilities in mobile operating systems to assess the robustness of MLLM-based agents.
- Score: 7.480236701595425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As researchers continue to optimize AI agents for more effective task execution within operating systems, they often overlook a critical security concern: the ability of these agents to detect "impostors" within their environment. Through an analysis of the agents' operational context, we identify a significant threat-attackers can disguise malicious attacks as environmental elements, injecting active disturbances into the agents' execution processes to manipulate their decision-making. We define this novel threat as the Active Environment Injection Attack (AEIA). Focusing on the interaction mechanisms of the Android OS, we conduct a risk assessment of AEIA and identify two critical security vulnerabilities: (1) Adversarial content injection in multimodal interaction interfaces, where attackers embed adversarial instructions within environmental elements to mislead agent decision-making; and (2) Reasoning gap vulnerabilities in the agent's task execution process, which increase susceptibility to AEIA attacks during reasoning. To evaluate the impact of these vulnerabilities, we propose AEIA-MN, an attack scheme that exploits interaction vulnerabilities in mobile operating systems to assess the robustness of MLLM-based agents. Experimental results show that even advanced MLLMs are highly vulnerable to this attack, achieving a maximum attack success rate of 93% on the AndroidWorld benchmark by combining two vulnerabilities.
Related papers
- Real AI Agents with Fake Memories: Fatal Context Manipulation Attacks on Web3 Agents [36.49717045080722]
This paper investigates the vulnerabilities of AI agents within blockchain-based financial ecosystems when exposed to adversarial threats in real-world scenarios.
We introduce the concept of context manipulation, a comprehensive attack vector that exploits unprotected context surfaces.
To quantify these vulnerabilities, we design CrAIBench, a Web3 domain-specific benchmark that evaluates the robustness of AI agents against context manipulation attacks.
arXiv Detail & Related papers (2025-03-20T15:44:31Z) - Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems [1.2564343689544843]
We develop simulations of AI agents collaborating on shared objectives to study security risks and trade-offs.
We observe infectious malicious prompts - the multi-hop spreading of malicious instructions.
Our findings illustrate potential trade-off between security and collaborative efficiency in multi-agent systems.
arXiv Detail & Related papers (2025-02-26T14:00:35Z) - Multi-Agent Risks from Advanced AI [90.74347101431474]
Multi-agent systems of advanced AI pose novel and under-explored risks.
We identify three key failure modes based on agents' incentives, as well as seven key risk factors.
We highlight several important instances of each risk, as well as promising directions to help mitigate them.
arXiv Detail & Related papers (2025-02-19T23:03:21Z) - MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison [60.30753230776882]
LLM agents are vulnerable to indirect prompt injection (IPI) attacks.<n>We present MELON, a novel IPI defense.<n>We show that MELON outperforms SOTA defenses in both attack prevention and utility preservation.
arXiv Detail & Related papers (2025-02-07T18:57:49Z) - Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving [65.61999354218628]
We take the first step toward designing black-box adversarial attacks specifically targeting vision-language models (VLMs) in autonomous driving systems.
We propose Cascading Adversarial Disruption (CAD), which targets low-level reasoning breakdown by generating and injecting semantics.
We present Risky Scene Induction, which addresses dynamic adaptation by leveraging a surrogate VLM to understand and construct high-level risky scenarios.
arXiv Detail & Related papers (2025-01-23T11:10:02Z) - Hijacking Vision-and-Language Navigation Agents with Adversarial Environmental Attacks [12.96291706848273]
Vision-and-Language Navigation (VLN) task as a representative setting.<n>Whitebox adversarial attack developed to induce desired behaviors in pretrained VLN agents.<n>We find our attacks can induce early-termination behaviors or divert an agent along an attacker-defined multi-step trajectory.
arXiv Detail & Related papers (2024-12-03T19:54:32Z) - Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In [5.65782619470663]
We examine how ReAct agents can be exploited using a straightforward yet effective method we refer to as the foot-in-the-door attack.
Our experiments show that indirect prompt injection attacks can significantly increase the likelihood of the agent performing subsequent malicious actions.
To mitigate this vulnerability, we propose implementing a simple reflection mechanism that prompts the agent to reassess the safety of its actions during execution.
arXiv Detail & Related papers (2024-10-22T12:24:41Z) - HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions [76.42274173122328]
We present HAICOSYSTEM, a framework examining AI agent safety within diverse and complex social interactions.
We run 1840 simulations based on 92 scenarios across seven domains (e.g., healthcare, finance, education)
Our experiments show that state-of-the-art LLMs, both proprietary and open-sourced, exhibit safety risks in over 50% cases.
arXiv Detail & Related papers (2024-09-24T19:47:21Z) - Safeguarding AI Agents: Developing and Analyzing Safety Architectures [0.0]
This paper addresses the need for safety measures in AI systems that collaborate with human teams.
We propose and evaluate three frameworks to enhance safety protocols in AI agent systems.
We conclude that these frameworks can significantly strengthen the safety and security of AI agent systems.
arXiv Detail & Related papers (2024-09-03T10:14:51Z) - EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.
Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.
However, the deployment of these agents in physical environments presents significant safety challenges.
This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z) - Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z) - Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification [35.16099878559559]
Large language models (LLMs) have experienced significant development and are being deployed in real-world applications.
We introduce a new type of attack that causes malfunctions by misleading the agent into executing repetitive or irrelevant actions.
Our experiments reveal that these attacks can induce failure rates exceeding 80% in multiple scenarios.
arXiv Detail & Related papers (2024-07-30T14:35:31Z) - Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems [27.316115171846953]
Large Language Models (LLMs) have shown significant promise in real-world decision-making tasks for embodied AI.
LLMs are fine-tuned to leverage their inherent common sense and reasoning abilities while being tailored to specific applications.
This fine-tuning process introduces considerable safety and security vulnerabilities, especially in safety-critical cyber-physical systems.
arXiv Detail & Related papers (2024-05-27T17:59:43Z) - Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics [54.57914943017522]
We highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications.
arXiv Detail & Related papers (2024-02-15T22:01:45Z) - Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models [79.0183835295533]
We introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to assess the risk of such vulnerabilities.<n>Our analysis identifies two key factors contributing to their success: LLMs' inability to distinguish between informational context and actionable instructions, and their lack of awareness in avoiding the execution of instructions within external content.<n>We propose two novel defense mechanisms-boundary awareness and explicit reminder-to address these vulnerabilities in both black-box and white-box settings.
arXiv Detail & Related papers (2023-12-21T01:08:39Z) - Raij\=u: Reinforcement Learning-Guided Post-Exploitation for Automating
Security Assessment of Network Systems [0.0]
Raij=u framework is a Reinforcement Learning-driven automation approach.
We implement two RL algorithms to train specialized agents capable of making intelligent actions.
Agents achieve over 84% of successful attacks with under 55 attack steps given.
arXiv Detail & Related papers (2023-09-27T09:36:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.