Related papers: DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios

DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios

URL: http://arxiv.org/abs/2510.15501v1
Date: Fri, 17 Oct 2025 10:14:26 GMT
Title: DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios
Authors: Yao Huang, Yitong Sun, Yichi Zhang, Ruochen Zhang, Yinpeng Dong, Xingxing Wei,
Abstract summary: characterization of deception across realistic real-world scenarios remains underexplored.<n>We establish DeceptionBench, the first benchmark that systematically evaluates how deceptive tendencies manifest across different domains.<n>On the intrinsic dimension, we explore whether models exhibit self-interested egoistic tendencies or sycophantic behaviors that prioritize user appeasement.<n>We incorporate sustained multi-turn interaction loops to construct a more realistic simulation of real-world feedback dynamics.
Score: 57.327907850766785
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Despite the remarkable advances of Large Language Models (LLMs) across diverse cognitive tasks, the rapid enhancement of these capabilities also introduces emergent deceptive behaviors that may induce severe risks in high-stakes deployments. More critically, the characterization of deception across realistic real-world scenarios remains underexplored. To bridge this gap, we establish DeceptionBench, the first benchmark that systematically evaluates how deceptive tendencies manifest across different societal domains, what their intrinsic behavioral patterns are, and how extrinsic factors affect them. Specifically, on the static count, the benchmark encompasses 150 meticulously designed scenarios in five domains, i.e., Economy, Healthcare, Education, Social Interaction, and Entertainment, with over 1,000 samples, providing sufficient empirical foundations for deception analysis. On the intrinsic dimension, we explore whether models exhibit self-interested egoistic tendencies or sycophantic behaviors that prioritize user appeasement. On the extrinsic dimension, we investigate how contextual factors modulate deceptive outputs under neutral conditions, reward-based incentivization, and coercive pressures. Moreover, we incorporate sustained multi-turn interaction loops to construct a more realistic simulation of real-world feedback dynamics. Extensive experiments across LLMs and Large Reasoning Models (LRMs) reveal critical vulnerabilities, particularly amplified deception under reinforcement dynamics, demonstrating that current models lack robust resistance to manipulative contextual cues and the urgent need for advanced safeguards against various deception behaviors. Code and resources are publicly available at https://github.com/Aries-iai/DeceptionBench.

Related papers

From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models [77.04403907729738]
This survey charts the evolution of uncertainty from a passive diagnostic metric to an active control signal guiding real-time model behavior.<n>We demonstrate how uncertainty is leveraged as an active control signal across three frontiers.<n>This survey argues that mastering the new trend of uncertainty is essential for building the next generation of scalable, reliable, and trustworthy AI.
arXiv Detail & Related papers (2026-01-22T06:21:31Z)
CogRail: Benchmarking VLMs in Cognitive Intrusion Perception for Intelligent Railway Transportation Systems [29.385460126069386]
We introduce a novel benchmark, CogRail, which integrates curated datasets with cognitively driven question-answer annotations.<n>Building upon this benchmark, we conduct a systematic evaluation of state-of-the-art visual-language models.<n>We propose a joint fine-tuning framework that integrates three core tasks, position perception, movement prediction, and threat analysis.
arXiv Detail & Related papers (2026-01-14T16:36:26Z)
Cognitive Inception: Agentic Reasoning against Visual Deceptions by Injecting Skepticism [81.39177645864757]
We propose textbfInception, a fully reasoning-based agentic reasoning framework to conduct authenticity verification by injecting skepticism.<n>To the best of our knowledge, this is the first fully reasoning-based framework against AIGC visual deceptions.
arXiv Detail & Related papers (2025-11-21T05:13:30Z)
Simulating and Understanding Deceptive Behaviors in Long-Horizon Interactions [18.182800471968132]
We introduce the first simulation framework for probing and evaluating deception in large language models.<n>We conduct experiments across 11 frontier models, spanning both closed and open-source systems.<n>We find that deception is model-dependent, increases with event pressure, and consistently erodes supervisor trust.
arXiv Detail & Related papers (2025-10-05T02:18:23Z)
The Traitors: Deception and Trust in Multi-Agent Language Model Simulations [0.0]
We introduce The Traitors, a multi-agent simulation framework inspired by social deduction games.<n>We develop a suite of evaluation metrics capturing deception success, trust dynamics, and collective inference quality.<n>Our initial experiments across DeepSeek-V3, GPT-4o-mini, and GPT-4o (10 runs per model) reveal a notable asymmetry.
arXiv Detail & Related papers (2025-05-19T10:01:35Z)
TRACE: A Self-Improving Framework for Robot Behavior Forecasting with Vision-Language Models [1.3408365072149797]
Predicting the near-term behavior of a reactive agent is crucial in many robotic scenarios.<n>We present TRACE, an inference framework that couples tree-of-thought generation with domain-aware feedback.<n>We validate TRACE on both ground-vehicle simulations and real-world marine autonomous surface vehicles.
arXiv Detail & Related papers (2025-03-02T06:58:02Z)
ViRAC: A Vision-Reasoning Agent Head Movement Control Framework in Arbitrary Virtual Environments [0.13654846342364302]
We propose ViRAC, which exploits the common-sense knowledge and reasoning capabilities of large-scale models.<n>ViRAC produces more natural and context-aware head rotations than recent state-of-the-art techniques.
arXiv Detail & Related papers (2025-02-14T09:46:43Z)
Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage.<n>Models may behave unreliably due to poorly explored failure modes.<n> causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [68.62012304574012]
multimodal generative models have sparked critical discussions on their reliability, fairness and potential for misuse.<n>We propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space.<n>Our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework. Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations. We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z)
Clustering Effect of (Linearized) Adversarial Robust Models [60.25668525218051]
We propose a novel understanding of adversarial robustness and apply it on more tasks including domain adaption and robustness boosting. Experimental evaluations demonstrate the rationality and superiority of our proposed clustering strategy.
arXiv Detail & Related papers (2021-11-25T05:51:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.