EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
- URL: http://arxiv.org/abs/2409.11295v3
- Date: Fri, 4 Oct 2024 02:08:57 GMT
- Title: EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
- Authors: Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun,
- Abstract summary: We conduct the first study on the privacy risks of generalist web agents in adversarial environments.
First, we present a realistic threat model for attacks on the website, where we consider two adversarial targets: stealing users' specific PII or the entire user request.
We collect 177 action steps that involve diverse PII categories on realistic websites from the Mind2Web, and conduct experiments using one of the most capable generalist web agent frameworks to date.
- Score: 40.82238259404402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalist web agents have demonstrated remarkable potential in autonomously completing a wide range of tasks on real websites, significantly boosting human productivity. However, web tasks, such as booking flights, usually involve users' PII, which may be exposed to potential privacy risks if web agents accidentally interact with compromised websites, a scenario that remains largely unexplored in the literature. In this work, we narrow this gap by conducting the first study on the privacy risks of generalist web agents in adversarial environments. First, we present a realistic threat model for attacks on the website, where we consider two adversarial targets: stealing users' specific PII or the entire user request. Then, we propose a novel attack method, termed Environmental Injection Attack (EIA). EIA injects malicious content designed to adapt well to environments where the agents operate and our work instantiates EIA specifically for privacy scenarios in web environments. We collect 177 action steps that involve diverse PII categories on realistic websites from the Mind2Web, and conduct experiments using one of the most capable generalist web agent frameworks to date. The results demonstrate that EIA achieves up to 70% ASR in stealing specific PII and 16% ASR for full user request. Additionally, by accessing the stealthiness and experimenting with a defensive system prompt, we indicate that EIA is hard to detect and mitigate. Notably, attacks that are not well adapted for a webpage can be detected via human inspection, leading to our discussion about the trade-off between security and autonomy. However, extra attackers' efforts can make EIA seamlessly adapted, rendering such supervision ineffective. Thus, we further discuss the defenses at the pre- and post-deployment stages of the websites without relying on human supervision and call for more advanced defense strategies.
Related papers
- Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents [68.22496852535937]
We introduce Auto-Intent, a method to adapt a pre-trained large language model (LLM) as an agent for a target domain without direct fine-tuning.
Our approach first discovers the underlying intents from target domain demonstrations unsupervisedly.
We train our intent predictor to predict the next intent given the agent's past observations and actions.
arXiv Detail & Related papers (2024-10-29T21:37:04Z) - Countering Autonomous Cyber Threats [40.00865970939829]
Foundation Models present dual-use concerns broadly and within the cyber domain specifically.
Recent research has shown the potential for these advanced models to inform or independently execute offensive cyberspace operations.
This work evaluates several state-of-the-art FMs on their ability to compromise machines in an isolated network and investigates defensive mechanisms to defeat such AI-powered attacks.
arXiv Detail & Related papers (2024-10-23T22:46:44Z) - AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents [22.682464365220916]
AdvWeb is a novel black-box attack framework designed against web agents.
We train and optimize the adversarial prompter model using DPO.
Unlike prior approaches, our adversarial string injection maintains stealth and control.
arXiv Detail & Related papers (2024-10-22T20:18:26Z) - Offensive AI: Enhancing Directory Brute-forcing Attack with the Use of Language Models [16.89878267176532]
Offensive AI is a paradigm that integrates AI-based technologies in cyber attacks.
In this work, we explore whether AI can enhance the directory enumeration process and propose a novel Language Model-based framework.
Our experiments -- conducted in a testbed consisting of 1 million URLs from different web application domains -- demonstrate the superiority of the LM-based attack, with an average performance increase of 969%.
arXiv Detail & Related papers (2024-04-22T12:40:38Z) - Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
We introduce epsilon-illusory, a novel form of adversarial attack on sequential decision-makers.
Compared to existing attacks, we empirically find epsilon-illusory to be significantly harder to detect with automated methods.
Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses.
arXiv Detail & Related papers (2022-07-20T19:49:09Z) - Towards Practical Deployment-Stage Backdoor Attack on Deep Neural
Networks [5.231607386266116]
We study the realistic threat of deployment-stage backdoor attacks on deep learning models.
We propose the first gray-box and physically realizable weights attack algorithm for backdoor injection.
Our results suggest the effectiveness and practicality of the proposed attack algorithm.
arXiv Detail & Related papers (2021-11-25T08:25:27Z) - Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the
Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z) - Automating Privilege Escalation with Deep Reinforcement Learning [71.87228372303453]
In this work, we exemplify the potential threat of malicious actors using deep reinforcement learning to train automated agents.
We present an agent that uses a state-of-the-art reinforcement learning algorithm to perform local privilege escalation.
Our agent is usable for generating realistic attack sensor data for training and evaluating intrusion detection systems.
arXiv Detail & Related papers (2021-10-04T12:20:46Z) - Evaluating Attacker Risk Behavior in an Internet of Things Ecosystem [2.958532752589616]
In cybersecurity, attackers range from brash, unsophisticated script kiddies to stealthy, patient advanced persistent threats.
This work explores how an attacker's risk seeking or risk averse behavior affects their operations against detection-optimizing defenders.
arXiv Detail & Related papers (2021-09-23T18:53:41Z) - Measurement-driven Security Analysis of Imperceptible Impersonation
Attacks [54.727945432381716]
We study the exploitability of Deep Neural Network-based Face Recognition systems.
We show that factors such as skin color, gender, and age, impact the ability to carry out an attack on a specific target victim.
We also study the feasibility of constructing universal attacks that are robust to different poses or views of the attacker's face.
arXiv Detail & Related papers (2020-08-26T19:27:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.