AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents
- URL: http://arxiv.org/abs/2410.17401v2
- Date: Tue, 29 Oct 2024 23:52:46 GMT
- Title: AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents
- Authors: Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li,
- Abstract summary: AdvWeb is a novel black-box attack framework designed against web agents.
We train and optimize the adversarial prompter model using DPO.
Unlike prior approaches, our adversarial string injection maintains stealth and control.
- Score: 22.682464365220916
- License:
- Abstract: Vision Language Models (VLMs) have revolutionized the creation of generalist web agents, empowering them to autonomously complete diverse tasks on real-world websites, thereby boosting human efficiency and productivity. However, despite their remarkable capabilities, the safety and security of these agents against malicious attacks remain critically underexplored, raising significant concerns about their safe deployment. To uncover and exploit such vulnerabilities in web agents, we provide AdvWeb, a novel black-box attack framework designed against web agents. AdvWeb trains an adversarial prompter model that generates and injects adversarial prompts into web pages, misleading web agents into executing targeted adversarial actions such as inappropriate stock purchases or incorrect bank transactions, actions that could lead to severe real-world consequences. With only black-box access to the web agent, we train and optimize the adversarial prompter model using DPO, leveraging both successful and failed attack strings against the target agent. Unlike prior approaches, our adversarial string injection maintains stealth and control: (1) the appearance of the website remains unchanged before and after the attack, making it nearly impossible for users to detect tampering, and (2) attackers can modify specific substrings within the generated adversarial string to seamlessly change the attack objective (e.g., purchasing stocks from a different company), enhancing attack flexibility and efficiency. We conduct extensive evaluations, demonstrating that AdvWeb achieves high success rates in attacking SOTA GPT-4V-based VLM agent across various web tasks. Our findings expose critical vulnerabilities in current LLM/VLM-based agents, emphasizing the urgent need for developing more reliable web agents and effective defenses. Our code and data are available at https://ai-secure.github.io/AdvWeb/ .
Related papers
- Attacking Vision-Language Computer Agents via Pop-ups [61.744008541021124]
We show that VLM agents can be easily attacked by a set of carefully designed adversarial pop-ups.
This distraction leads agents to click these pop-ups instead of performing the tasks as usual.
arXiv Detail & Related papers (2024-11-04T18:56:42Z) - The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks [2.6528263069045126]
Large language models (LLMs) could soon become integral to autonomous cyber agents.
We introduce novel defense strategies that exploit the inherent vulnerabilities of attacking LLMs.
Our results show defense success rates of up to 90%, demonstrating the effectiveness of turning LLM vulnerabilities into defensive strategies.
arXiv Detail & Related papers (2024-10-20T14:07:24Z) - EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage [40.82238259404402]
We conduct the first study on the privacy risks of generalist web agents in adversarial environments.
First, we present a realistic threat model for attacks on the website, where we consider two adversarial targets: stealing users' specific PII or the entire user request.
We collect 177 action steps that involve diverse PII categories on realistic websites from the Mind2Web, and conduct experiments using one of the most capable generalist web agent frameworks to date.
arXiv Detail & Related papers (2024-09-17T15:49:44Z) - Mitigating Label Flipping Attacks in Malicious URL Detectors Using
Ensemble Trees [16.16333915007336]
Malicious URLs provide adversarial opportunities across various industries, including transportation, healthcare, energy, and banking.
backdoor attacks involve manipulating a small percentage of training data labels, such as Label Flipping (LF), which changes benign labels to malicious ones and vice versa.
We propose an innovative alarm system that detects the presence of poisoned labels and a defense mechanism designed to uncover the original class labels.
arXiv Detail & Related papers (2024-03-05T14:21:57Z) - WIPI: A New Web Threat for LLM-Driven Web Agents [28.651763099760664]
We introduce a novel threat, WIPI, that indirectly controls Web Agent to execute malicious instructions embedded in publicly accessible webpages.
To launch a successful WIPI works in a black-box environment.
Our methodology achieves an average attack success rate (ASR) exceeding 90% even in pure black-box scenarios.
arXiv Detail & Related papers (2024-02-26T19:01:54Z) - SENet: Visual Detection of Online Social Engineering Attack Campaigns [3.858859576352153]
Social engineering (SE) aims at deceiving users into performing actions that may compromise their security and privacy.
SEShield is a framework for in-browser detection of social engineering attacks.
arXiv Detail & Related papers (2024-01-10T22:25:44Z) - Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive
Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Baseline Defenses for Adversarial Attacks Against Aligned Language
Models [109.75753454188705]
Recent work shows that text moderations can produce jailbreaking prompts that bypass defenses.
We look at three types of defenses: detection (perplexity based), input preprocessing (paraphrase and retokenization), and adversarial training.
We find that the weakness of existing discretes for text, combined with the relatively high costs of optimization, makes standard adaptive attacks more challenging for LLMs.
arXiv Detail & Related papers (2023-09-01T17:59:44Z) - On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z) - Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the
Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.