Related papers: "Are You Sure?": An Empirical Study of Human Perception Vulnerability in LLM-Driven Agentic Systems

"Are You Sure?": An Empirical Study of Human Perception Vulnerability in LLM-Driven Agentic Systems

URL: http://arxiv.org/abs/2602.21127v1
Date: Tue, 24 Feb 2026 17:23:11 GMT
Title: "Are You Sure?": An Empirical Study of Human Perception Vulnerability in LLM-Driven Agentic Systems
Authors: Xinfeng Li, Shenyu Dai, Kelong Zheng, Yue Xiao, Gelei Deng, Wei Dong, Xiaofeng Wang,
Abstract summary: We present the first large-scale empirical study with 303 participants to measure human susceptibility to AMD.<n>Our 10 key findings reveal significant vulnerabilities and provide future defense perspectives.<n>With experiential learning based on HAT-Lab, over 90% of users who perceive risks report increased caution against AMD.
Score: 21.769264539684333
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM) agents are rapidly becoming trusted copilots in high-stakes domains like software development and healthcare. However, this deepening trust introduces a novel attack surface: Agent-Mediated Deception (AMD), where compromised agents are weaponized against their human users. While extensive research focuses on agent-centric threats, human susceptibility to deception by a compromised agent remains unexplored. We present the first large-scale empirical study with 303 participants to measure human susceptibility to AMD. This is based on HAT-Lab (Human-Agent Trust Laboratory), a high-fidelity research platform we develop, featuring nine carefully crafted scenarios spanning everyday and professional domains (e.g., healthcare, software development, human resources). Our 10 key findings reveal significant vulnerabilities and provide future defense perspectives. Specifically, only 8.6% of participants perceive AMD attacks, while domain experts show increased susceptibility in certain scenarios. We identify six cognitive failure modes in users and find that their risk awareness often fails to translate to protective behavior. The defense analysis reveals that effective warnings should interrupt workflows with low verification costs. With experiential learning based on HAT-Lab, over 90% of users who perceive risks report increased caution against AMD. This work provides empirical evidence and a platform for human-centric agent security research.

Related papers

OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage [59.3826294523924]
We investigate the security vulnerabilities of a popular multi-agent pattern known as the orchestrator setup.<n>We report the susceptibility of frontier models to different categories of attacks, finding that both reasoning and non-reasoning models are vulnerable.
arXiv Detail & Related papers (2026-02-13T21:32:32Z)
Shadows in the Code: Exploring the Risks and Defenses of LLM-based Multi-Agent Software Development Systems [15.276177828252829]
We identify two risky scenarios: Malicious User with Benign Agents (MU-BA) and Benign User with Malicious Agents (BU-MA)<n>We introduce the Implicit Malicious Behavior Injection Attack (IMBIA) demonstrating how multi-agent systems can be manipulated to generate software with concealed malicious capabilities beneath seemingly benign applications.<n>Our findings highlight the urgent need for robust security measures in multi-agent software development systems.
arXiv Detail & Related papers (2025-11-23T14:26:35Z)
SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents [63.70653857721785]
We conduct two in-the-wild experiments to demonstrate the prevalence of low-quality search results and their potential to misguide agent behaviors.<n>To counter this threat, we introduce an automated red-teaming framework that is systematic, scalable, and cost-efficient.
arXiv Detail & Related papers (2025-09-28T07:05:17Z)
The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover [0.0]
Large Language Model (LLM) agents and multi-agent systems introduce security vulnerabilities that extend beyond traditional content generation to system-level compromises.<n>This paper presents a comprehensive evaluation of the LLMs security used as reasoning engines within autonomous agents.<n>We show how different attack surfaces and trust boundaries can be leveraged to orchestrate such takeovers.
arXiv Detail & Related papers (2025-07-09T13:54:58Z)
Who's the Mole? Modeling and Detecting Intention-Hiding Malicious Agents in LLM-Based Multi-Agent Systems [25.6233463223145]
We study intention-hiding threats in multi-agent systems powered by Large Language Models (LLM-MAS)<n>We design four representative attack paradigms that subtly disrupt task completion while maintaining a high degree of stealth.<n>To counter these threats, we propose AgentXposed, a psychology-inspired detection framework.
arXiv Detail & Related papers (2025-07-07T07:34:34Z)
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents [60.78202583483591]
We introduce OS-Harm, a new benchmark for measuring safety of computer use agents.<n> OS-Harm is built on top of the OSWorld environment and aims to test models across three categories of harm: deliberate user misuse, prompt injection attacks, and model misbehavior.<n>We evaluate computer use agents based on a range of frontier models and provide insights into their safety.
arXiv Detail & Related papers (2025-06-17T17:59:31Z)
Among Us: A Sandbox for Measuring and Detecting Agentic Deception [1.1893676124374688]
We introduce $textitAmong Us$, a social deception game where language-based agents exhibit long-term, open-ended deception.<n>We find that models trained with RL are comparatively much better at producing deception than detecting it.<n>We also find two SAE features that work well at deception detection but are unable to steer the model to lie less.
arXiv Detail & Related papers (2025-04-05T06:09:32Z)
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety [70.84902425123406]
Multi-agent systems, when enhanced with Large Language Models (LLMs), exhibit profound capabilities in collective intelligence. However, the potential misuse of this intelligence for malicious purposes presents significant risks. We propose a framework (PsySafe) grounded in agent psychology, focusing on identifying how dark personality traits in agents can lead to risky behaviors. Our experiments reveal several intriguing phenomena, such as the collective dangerous behaviors among agents, agents' self-reflection when engaging in dangerous behavior, and the correlation between agents' psychological assessments and dangerous behaviors.
arXiv Detail & Related papers (2024-01-22T12:11:55Z)
Malicious Agent Detection for Robust Multi-Agent Collaborative Perception [52.261231738242266]
Multi-agent collaborative (MAC) perception is more vulnerable to adversarial attacks than single-agent perception. We propose Malicious Agent Detection (MADE), a reactive defense specific to MAC perception. We conduct comprehensive evaluations on a benchmark 3D dataset V2X-sim and a real-road dataset DAIR-V2X.
arXiv Detail & Related papers (2023-10-18T11:36:42Z)
Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
We introduce epsilon-illusory, a novel form of adversarial attack on sequential decision-makers. Compared to existing attacks, we empirically find epsilon-illusory to be significantly harder to detect with automated methods. Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses.
arXiv Detail & Related papers (2022-07-20T19:49:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.