Dissecting Adversarial Robustness of Multimodal LM Agents
- URL: http://arxiv.org/abs/2406.12814v3
- Date: Tue, 04 Feb 2025 20:02:17 GMT
- Title: Dissecting Adversarial Robustness of Multimodal LM Agents
- Authors: Chen Henry Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan,
- Abstract summary: We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.
We find that we can successfully break latest agents that use black-box frontier LMs, including those that perform reflection and tree search.
We also use ARE to rigorously evaluate how the robustness changes as new components are added.
- Score: 70.2077308846307
- License:
- Abstract: As language models (LMs) are used to build autonomous agents in real environments, ensuring their adversarial robustness becomes a critical challenge. Unlike chatbots, agents are compound systems with multiple components taking actions, which existing LMs safety evaluations do not adequately address. To bridge this gap, we manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena, a real environment for web agents. To systematically examine the robustness of agents, we propose the Agent Robustness Evaluation (ARE) framework. ARE views the agent as a graph showing the flow of intermediate outputs between components and decomposes robustness as the flow of adversarial information on the graph. We find that we can successfully break latest agents that use black-box frontier LMs, including those that perform reflection and tree search. With imperceptible perturbations to a single image (less than 5% of total web page pixels), an attacker can hijack these agents to execute targeted adversarial goals with success rates up to 67%. We also use ARE to rigorously evaluate how the robustness changes as new components are added. We find that inference-time compute that typically improves benign performance can open up new vulnerabilities and harm robustness. An attacker can compromise the evaluator used by the reflexion agent and the value function of the tree search agent, which increases the attack success relatively by 15% and 20%. Our data and code for attacks, defenses, and evaluation are at https://github.com/ChenWu98/agent-attack
Related papers
- Imprompter: Tricking LLM Agents into Improper Tool Use [35.255462653237885]
Large Language Model (LLM) Agents are an emerging computing paradigm that blends generative machine learning with tools such as code interpreters, web browsing, email, and more generally, external resources.
We contribute to the security foundations of agent-based systems and surface a new class of automatically computed obfuscated adversarial prompt attacks.
arXiv Detail & Related papers (2024-10-19T01:00:57Z) - Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z) - Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z) - On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents [58.79302663733703]
Large language model-based multi-agent systems have shown great abilities across various tasks due to the collaboration of expert agents.
However, the impact of clumsy or even malicious agents, on the overall performance of the system remains underexplored.
This paper investigates what is the resilience of various system structures under faulty agents.
arXiv Detail & Related papers (2024-08-02T03:25:20Z) - AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases [73.04652687616286]
We propose AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base.
Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning.
On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance.
arXiv Detail & Related papers (2024-07-17T17:59:47Z) - AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents [27.701301913159067]
We introduce AgentDojo, an evaluation framework for agents that execute tools over untrusted data.
AgentDojo is not a static test suite, but rather an environment for designing and evaluating new agent tasks, defenses, and adaptive attacks.
We populate AgentDojo with 97 realistic tasks, 629 security test cases, and various attack and defense paradigms from the literature.
arXiv Detail & Related papers (2024-06-19T08:55:56Z) - Evil Geniuses: Delving into the Safety of LLM-based Agents [35.49857256840015]
Large language models (LLMs) have revitalized in large language models (LLMs)
This paper delves into the safety of LLM-based agents from three perspectives: agent quantity, role definition, and attack level.
arXiv Detail & Related papers (2023-11-20T15:50:09Z) - Malicious Agent Detection for Robust Multi-Agent Collaborative Perception [52.261231738242266]
Multi-agent collaborative (MAC) perception is more vulnerable to adversarial attacks than single-agent perception.
We propose Malicious Agent Detection (MADE), a reactive defense specific to MAC perception.
We conduct comprehensive evaluations on a benchmark 3D dataset V2X-sim and a real-road dataset DAIR-V2X.
arXiv Detail & Related papers (2023-10-18T11:36:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.