AI Deception: Risks, Dynamics, and Controls
- URL: http://arxiv.org/abs/2511.22619v2
- Date: Wed, 03 Dec 2025 15:35:14 GMT
- Title: AI Deception: Risks, Dynamics, and Controls
- Authors: Boyuan Chen, Sitong Fang, Jiaming Ji, Yanxu Zhu, Pengcheng Wen, Jinzhou Wu, Yingshui Tan, Boren Zheng, Mengying Yuan, Wenqi Chen, Donghai Hong, Alex Qiu, Xin Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Borong Zhang, Tianzhuo Yang, Saad Siddiqui, Isabella Duan, Yawen Duan, Brian Tse, Jen-Tse, Huang, Kun Wang, Baihui Zheng, Jiaheng Liu, Jian Yang, Yiming Li, Wenting Chen, Dongrui Liu, Lukas Vierling, Zhiheng Xi, Haobo Fu, Wenxuan Wang, Jitao Sang, Zhengyan Shi, Chi-Min Chan, Eugenie Shi, Simin Li, Juncheng Li, Jian Yang, Wei Ji, Dong Li, Jinglin Yang, Jun Song, Yinpeng Dong, Jie Fu, Bo Zheng, Min Yang, Yike Guo, Philip Torr, Robert Trager, Yi Zeng, Zhongyuan Wang, Yaodong Yang, Tiejun Huang, Ya-Qin Zhang, Hongjiang Zhang, Andrew Yao,
- Abstract summary: This project provides a comprehensive and up-to-date overview of the AI deception field.<n>We identify a formal definition of AI deception, grounded in signaling theory from studies of animal deception.<n>We organize the landscape of AI deception research as a deception cycle, consisting of two key components: deception emergence and deception treatment.
- Score: 153.71048309527225
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As intelligence increases, so does its shadow. AI deception, in which systems induce false beliefs to secure self-beneficial outcomes, has evolved from a speculative concern to an empirically demonstrated risk across language models, AI agents, and emerging frontier systems. This project provides a comprehensive and up-to-date overview of the AI deception field, covering its core concepts, methodologies, genesis, and potential mitigations. First, we identify a formal definition of AI deception, grounded in signaling theory from studies of animal deception. We then review existing empirical studies and associated risks, highlighting deception as a sociotechnical safety challenge. We organize the landscape of AI deception research as a deception cycle, consisting of two key components: deception emergence and deception treatment. Deception emergence reveals the mechanisms underlying AI deception: systems with sufficient capability and incentive potential inevitably engage in deceptive behaviors when triggered by external conditions. Deception treatment, in turn, focuses on detecting and addressing such behaviors. On deception emergence, we analyze incentive foundations across three hierarchical levels and identify three essential capability preconditions required for deception. We further examine contextual triggers, including supervision gaps, distributional shifts, and environmental pressures. On deception treatment, we conclude detection methods covering benchmarks and evaluation protocols in static and interactive settings. Building on the three core factors of deception emergence, we outline potential mitigation strategies and propose auditing approaches that integrate technical, community, and governance efforts to address sociotechnical challenges and future AI risks. To support ongoing work in this area, we release a living resource at www.deceptionsurvey.com.
Related papers
- The Path Ahead for Agentic AI: Challenges and Opportunities [4.52683540940001]
This chapter examines the emergence of agentic AI systems that operate autonomously in complex environments.<n>We trace the architectural progression from statistical models to transformer-based systems, identifying capabilities that enable agentic behavior.<n>Unlike existing surveys, we focus on the architectural transition from language understanding to autonomous action, emphasizing the technical gaps that must be resolved before deployment.
arXiv Detail & Related papers (2026-01-06T06:31:42Z) - DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios [57.327907850766785]
characterization of deception across realistic real-world scenarios remains underexplored.<n>We establish DeceptionBench, the first benchmark that systematically evaluates how deceptive tendencies manifest across different domains.<n>On the intrinsic dimension, we explore whether models exhibit self-interested egoistic tendencies or sycophantic behaviors that prioritize user appeasement.<n>We incorporate sustained multi-turn interaction loops to construct a more realistic simulation of real-world feedback dynamics.
arXiv Detail & Related papers (2025-10-17T10:14:26Z) - Epistemic Trade-Off: An Analysis of the Operational Breakdown and Ontological Limits of "Certainty-Scope" in AI [0.0]
Recently published "certainty-scope" conjecture offers a compelling insight into the inherent trade-off present within artificial intelligence (AI) systems.<n>This paper argues that the conjecture's objective to furnish insights for engineering design and regulatory decision-making is limited by two fundamental factors.
arXiv Detail & Related papers (2025-08-26T05:47:21Z) - Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance [211.5823259429128]
We propose a comprehensive framework integrating technical and societal dimensions, structured around three interconnected pillars: Intrinsic Security, Derivative Security, and Social Ethics.<n>We identify three core challenges: (1) the generalization gap, where defenses fail against evolving threats; (2) inadequate evaluation protocols that overlook real-world risks; and (3) fragmented regulations leading to inconsistent oversight.<n>Our framework offers actionable guidance for researchers, engineers, and policymakers to develop AI systems that are not only robust and secure but also ethically aligned and publicly trustworthy.
arXiv Detail & Related papers (2025-08-12T09:42:56Z) - Beyond Vulnerabilities: A Survey of Adversarial Attacks as Both Threats and Defenses in Computer Vision Systems [5.787505062263962]
Adversarial attacks against computer vision systems have emerged as a critical research area that challenges the fundamental assumptions about neural network robustness and security.<n>This comprehensive survey examines the evolving landscape of adversarial techniques, revealing their dual nature as both sophisticated security threats and valuable defensive tools.
arXiv Detail & Related papers (2025-08-03T17:02:05Z) - Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act)
Uses insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence.
As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z) - Predictable Artificial Intelligence [77.1127726638209]
This paper introduces the ideas and challenges of Predictable AI.<n>It explores the ways in which we can anticipate key validity indicators of present and future AI ecosystems.<n>We argue that achieving predictability is crucial for fostering trust, liability, control, alignment and safety of AI ecosystems.
arXiv Detail & Related papers (2023-10-09T21:36:21Z) - AI Deception: A Survey of Examples, Risks, and Potential Solutions [20.84424818447696]
This paper argues that a range of current AI systems have learned how to deceive humans.
We define deception as the systematic inducement of false beliefs in the pursuit of some outcome other than the truth.
arXiv Detail & Related papers (2023-08-28T17:59:35Z) - An interdisciplinary conceptual study of Artificial Intelligence (AI)
for helping benefit-risk assessment practices: Towards a comprehensive
qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.