EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
- URL: http://arxiv.org/abs/2409.11295v3
- Date: Fri, 4 Oct 2024 02:08:57 GMT
- Title: EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
- Authors: Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun,
- Abstract summary: We conduct the first study on the privacy risks of generalist web agents in adversarial environments.
First, we present a realistic threat model for attacks on the website, where we consider two adversarial targets: stealing users' specific PII or the entire user request.
We collect 177 action steps that involve diverse PII categories on realistic websites from the Mind2Web, and conduct experiments using one of the most capable generalist web agent frameworks to date.
- Score: 40.82238259404402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalist web agents have demonstrated remarkable potential in autonomously completing a wide range of tasks on real websites, significantly boosting human productivity. However, web tasks, such as booking flights, usually involve users' PII, which may be exposed to potential privacy risks if web agents accidentally interact with compromised websites, a scenario that remains largely unexplored in the literature. In this work, we narrow this gap by conducting the first study on the privacy risks of generalist web agents in adversarial environments. First, we present a realistic threat model for attacks on the website, where we consider two adversarial targets: stealing users' specific PII or the entire user request. Then, we propose a novel attack method, termed Environmental Injection Attack (EIA). EIA injects malicious content designed to adapt well to environments where the agents operate and our work instantiates EIA specifically for privacy scenarios in web environments. We collect 177 action steps that involve diverse PII categories on realistic websites from the Mind2Web, and conduct experiments using one of the most capable generalist web agent frameworks to date. The results demonstrate that EIA achieves up to 70% ASR in stealing specific PII and 16% ASR for full user request. Additionally, by accessing the stealthiness and experimenting with a defensive system prompt, we indicate that EIA is hard to detect and mitigate. Notably, attacks that are not well adapted for a webpage can be detected via human inspection, leading to our discussion about the trade-off between security and autonomy. However, extra attackers' efforts can make EIA seamlessly adapted, rendering such supervision ineffective. Thus, we further discuss the defenses at the pre- and post-deployment stages of the websites without relying on human supervision and call for more advanced defense strategies.
Related papers
- Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents [64.75036903373712]
Proposer-Agent-Evaluator is a learning system that enables foundation model agents to autonomously discover and practice skills in the wild.
At the heart of PAE is a context-aware task proposer that autonomously proposes tasks for the agent to practice with context information.
The success evaluation serves as the reward signal for the agent to refine its policies through RL.
arXiv Detail & Related papers (2024-12-17T18:59:50Z) - Hijacking Vision-and-Language Navigation Agents with Adversarial Environmental Attacks [12.96291706848273]
Vision-and-Language Navigation (VLN) task as a representative setting.
Whitebox adversarial attack developed to induce desired behaviors in pretrained VLN agents.
We find our attacks can induce early-termination behaviors or divert an agent along an attacker-defined multi-step trajectory.
arXiv Detail & Related papers (2024-12-03T19:54:32Z) - Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents [68.22496852535937]
We introduce Auto-Intent, a method to adapt a pre-trained large language model (LLM) as an agent for a target domain without direct fine-tuning.
Our approach first discovers the underlying intents from target domain demonstrations unsupervisedly.
We train our intent predictor to predict the next intent given the agent's past observations and actions.
arXiv Detail & Related papers (2024-10-29T21:37:04Z) - AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents [22.682464365220916]
AdvWeb is a novel black-box attack framework designed against web agents.
We train and optimize the adversarial prompter model using DPO.
Unlike prior approaches, our adversarial string injection maintains stealth and control.
arXiv Detail & Related papers (2024-10-22T20:18:26Z) - Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.
We find that we can successfully break latest agents that use black-box frontier LMs, including those that perform reflection and tree search.
We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z) - Offensive AI: Enhancing Directory Brute-forcing Attack with the Use of Language Models [16.89878267176532]
Offensive AI is a paradigm that integrates AI-based technologies in cyber attacks.
In this work, we explore whether AI can enhance the directory enumeration process and propose a novel Language Model-based framework.
Our experiments -- conducted in a testbed consisting of 1 million URLs from different web application domains -- demonstrate the superiority of the LM-based attack, with an average performance increase of 969%.
arXiv Detail & Related papers (2024-04-22T12:40:38Z) - Towards Practical Deployment-Stage Backdoor Attack on Deep Neural
Networks [5.231607386266116]
We study the realistic threat of deployment-stage backdoor attacks on deep learning models.
We propose the first gray-box and physically realizable weights attack algorithm for backdoor injection.
Our results suggest the effectiveness and practicality of the proposed attack algorithm.
arXiv Detail & Related papers (2021-11-25T08:25:27Z) - Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the
Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z) - Evaluating Attacker Risk Behavior in an Internet of Things Ecosystem [2.958532752589616]
In cybersecurity, attackers range from brash, unsophisticated script kiddies to stealthy, patient advanced persistent threats.
This work explores how an attacker's risk seeking or risk averse behavior affects their operations against detection-optimizing defenders.
arXiv Detail & Related papers (2021-09-23T18:53:41Z) - Measurement-driven Security Analysis of Imperceptible Impersonation
Attacks [54.727945432381716]
We study the exploitability of Deep Neural Network-based Face Recognition systems.
We show that factors such as skin color, gender, and age, impact the ability to carry out an attack on a specific target victim.
We also study the feasibility of constructing universal attacks that are robust to different poses or views of the attacker's face.
arXiv Detail & Related papers (2020-08-26T19:27:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.