DECEPTICON: How Dark Patterns Manipulate Web Agents
- URL: http://arxiv.org/abs/2512.22894v1
- Date: Sun, 28 Dec 2025 11:55:20 GMT
- Title: DECEPTICON: How Dark Patterns Manipulate Web Agents
- Authors: Phil Cuvin, Hao Zhu, Diyi Yang
- Abstract summary: We show that dark patterns are highly effective in steering agent trajectories. We introduce DECEPTICON, an environment for testing individual dark patterns in isolation. We find dark patterns successfully steer agent trajectories towards malicious outcomes in over 70% of tested generated and real-world tasks.
- Score: 50.92538792133007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deceptive UI designs, widely instantiated across the web and commonly known as dark patterns, manipulate users into performing actions misaligned with their goals. In this paper, we show that dark patterns are highly effective in steering agent trajectories, posing a significant risk to agent robustness. To quantify this risk, we introduce DECEPTICON, an environment for testing individual dark patterns in isolation. DECEPTICON includes 700 web navigation tasks with dark patterns -- 600 generated tasks and 100 real-world tasks, designed to measure instruction-following success and dark pattern effectiveness. Across state-of-the-art agents, we find dark patterns successfully steer agent trajectories towards malicious outcomes in over 70% of tested generated and real-world tasks -- compared to a human average of 31%. Moreover, we find that dark pattern effectiveness correlates positively with model size and test-time reasoning, making larger, more capable models more susceptible. Leading countermeasures against adversarial attacks, including in-context prompting and guardrail models, fail to consistently reduce the success rate of dark pattern interventions. Our findings reveal dark patterns as a latent and unmitigated risk to web agents, highlighting the urgent need for robust defenses against manipulative designs.
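The abstract's two headline numbers suggest a simple evaluation loop: run an agent on each task, then report the fraction of runs that satisfy the user's instruction and the fraction steered toward the dark pattern's malicious outcome. The sketch below illustrates that protocol; the Task fields, the checker functions, and the agent callable are assumptions for illustration, not the released DECEPTICON interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Trajectory = List[str]  # sequence of actions the agent takes on the mock site


@dataclass
class Task:
    instruction: str                                 # the user's stated goal
    dark_pattern: str                                # pattern injected into the page (hypothetical field)
    goal_reached: Callable[[Trajectory], bool]       # did the agent satisfy the instruction?
    malicious_outcome: Callable[[Trajectory], bool]  # did the agent take the manipulated action?


def evaluate(agent: Callable[[Task], Trajectory], tasks: List[Task]) -> Dict[str, float]:
    """Run the agent once per task and aggregate both rates."""
    success = steered = 0
    for task in tasks:
        trajectory = agent(task)
        success += task.goal_reached(trajectory)
        steered += task.malicious_outcome(trajectory)
    n = len(tasks)
    return {
        "instruction_following_rate": success / n,
        # abstract: >70% of tasks steered for leading agents vs. a 31% human average
        "dark_pattern_effectiveness": steered / n,
    }
```

An in-context prompting countermeasure of the kind the paper reports would simply wrap the agent with a warning prepended to its prompt and be scored by the same function.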
Related papers
- Dynamic Mask-Based Backdoor Attack Against Vision AI Models: A Case Study on Mushroom Detection [0.0]
This paper presents a novel dynamic mask-based backdoor attack method, specifically designed for object detection models. We exploit a dataset poisoning technique to embed a malicious trigger, rendering any model trained on this compromised dataset vulnerable to our backdoor attack. Our approach surpasses traditional methods for backdoor injection, which are based on static and consistent patterns.
arXiv Detail & Related papers (2026-01-26T12:25:16Z) - Investigating the Impact of Dark Patterns on LLM-Based Web Agents [16.297159088186888]
We present the first study that investigates the impact of dark patterns on the decision-making process of LLM-based generalist web agents. We introduce LiteAgent, a lightweight framework that automatically prompts agents to execute tasks. We also present TrickyArena, a controlled environment comprising web applications from domains such as e-commerce, streaming services, and news platforms.
arXiv Detail & Related papers (2025-10-20T21:26:26Z) - Dark Patterns Meet GUI Agents: LLM Agent Susceptibility to Manipulative Interfaces and the Role of Human Oversight [51.53020962098759]
This study examines how agents, human participants, and human-AI teams respond to 16 types of dark patterns across diverse scenarios. Phase 1 highlights that agents often fail to recognize dark patterns and, even when aware, prioritize task completion over protective action. Phase 2 reveals divergent failure modes: humans succumb due to cognitive shortcuts and habitual compliance, while agents falter from procedural blind spots.
arXiv Detail & Related papers (2025-09-12T22:26:31Z) - A Knowledge-guided Adversarial Defense for Resisting Malicious Visual Manipulation [93.28532038721816]
Malicious applications of visual manipulation have raised serious threats to the security and reputation of users in many fields. We propose a knowledge-guided adversarial defense (KGAD) to actively force malicious manipulation models to output semantically confusing samples.
arXiv Detail & Related papers (2025-04-11T10:18:13Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Getting Trapped in Amazon's "Iliad Flow": A Foundation for the Temporal Analysis of Dark Patterns [17.59481743387609]
We present a case study of Amazon Prime's "Iliad Flow" to illustrate the interplay of dark patterns across a user journey. We use this case study to lay the groundwork for a methodology of Temporal Analysis of Dark Patterns (TADP).
arXiv Detail & Related papers (2023-09-18T10:12:52Z) - AidUI: Toward Automated Recognition of Dark Patterns in User Interfaces [6.922187804798161]
UI dark patterns can lead end-users to unknowingly take actions that they may not have intended.
We introduce AidUI, a novel approach that uses computer vision and natural language processing techniques to recognize ten unique UI dark patterns.
AidUI achieves an overall precision of 0.66, recall of 0.67, and F1-score of 0.65 in detecting dark pattern instances, and is able to localize detected patterns with an IoU score of 0.84.
arXiv Detail & Related papers (2023-03-12T23:46:04Z) - Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon [79.33449311057088]
We study a new type of optical adversarial examples, in which the perturbations are generated by a very common natural phenomenon: shadows.
We extensively evaluate the effectiveness of this new attack in both simulated and real-world environments.
arXiv Detail & Related papers (2022-03-08T02:40:18Z) - Detection Defense Against Adversarial Attacks with Saliency Map [7.736844355705379]
It is well established that neural networks are vulnerable to adversarial examples, which are almost imperceptible to human vision.
Existing defenses tend to harden the robustness of models against adversarial attacks.
We propose a novel method that adds noise and uses an inconsistency strategy to detect adversarial examples.
arXiv Detail & Related papers (2020-09-06T13:57:17Z) - Orthogonal Deep Models As Defense Against Black-Box Attacks [71.23669614195195]
We study the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model.
We introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to another.
We verify the effectiveness of our technique on a variety of large-scale models.
arXiv Detail & Related papers (2020-06-26T08:29:05Z)
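As an illustration of the gradient-regularization idea in the last entry above (encouraging one model's internal representation to be orthogonal to another's), here is a minimal sketch in a PyTorch-style setup; it is an interpretation of the stated goal, not the paper's exact scheme, and the feature tensors and weighting term are hypothetical.

```python
import torch
import torch.nn.functional as F


def orthogonality_penalty(feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
    """feats_a, feats_b: (batch, dim) internal representations from the two models.
    Returns the mean squared cosine similarity, which is 0 when the features are orthogonal."""
    cos = F.cosine_similarity(feats_a, feats_b, dim=1)
    return (cos ** 2).mean()


# Hypothetical use inside a training step for the defended model:
#   loss = task_loss + lambda_ortho * orthogonality_penalty(feats_defended, feats_reference.detach())
# so that attacks crafted against a similar surrogate model transfer poorly to the defended one.
```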
This list is automatically generated from the titles and abstracts of the papers in this site.