Related papers: AI Deception: A Survey of Examples, Risks, and Potential Solutions

AI Deception: A Survey of Examples, Risks, and Potential Solutions

URL: http://arxiv.org/abs/2308.14752v1
Date: Mon, 28 Aug 2023 17:59:35 GMT
Title: AI Deception: A Survey of Examples, Risks, and Potential Solutions
Authors: Peter S. Park, Simon Goldstein, Aidan O'Gara, Michael Chen, Dan Hendrycks
Abstract summary: This paper argues that a range of current AI systems have learned how to deceive humans. We define deception as the systematic inducement of false beliefs in the pursuit of some outcome other than the truth.
Score: 20.84424818447696
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper argues that a range of current AI systems have learned how to deceive humans. We define deception as the systematic inducement of false beliefs in the pursuit of some outcome other than the truth. We first survey empirical examples of AI deception, discussing both special-use AI systems (including Meta's CICERO) built for specific competitive situations, and general-purpose AI systems (such as large language models). Next, we detail several risks from AI deception, such as fraud, election tampering, and losing control of AI systems. Finally, we outline several potential solutions to the problems posed by AI deception: first, regulatory frameworks should subject AI systems that are capable of deception to robust risk-assessment requirements; second, policymakers should implement bot-or-not laws; and finally, policymakers should prioritize the funding of relevant research, including tools to detect AI deception and to make AI systems less deceptive. Policymakers, researchers, and the broader public should work proactively to prevent AI deception from destabilizing the shared foundations of our society.

Related papers

Military AI Needs Technically-Informed Regulation to Safeguard AI Research and its Applications [1.4347154947542533]
We focus on a subset of lethal autonomous weapon systems (LAWS) that use AI for targeting or battlefield decisions.<n>These risks threaten both military effectiveness and the openness of AI research.<n>We propose a behavior-based definition of AI-LAWS as a foundation for technically grounded regulation.
arXiv Detail & Related papers (2025-05-23T20:58:38Z)
AI threats to national security can be countered through an incident regime [55.2480439325792]
We propose a legally mandated post-deployment AI incident regime that aims to counter potential national security threats from AI systems. Our proposed AI incident regime is split into three phases. The first phase revolves around a novel operationalization of what counts as an 'AI incident' The second and third phases spell out that AI providers should notify a government agency about incidents, and that the government agency should be involved in amending AI providers' security and safety procedures.
arXiv Detail & Related papers (2025-03-25T17:51:50Z)
Position: Mind the Gap-the Growing Disconnect Between Established Vulnerability Disclosure and AI Security [56.219994752894294]
We argue that adapting existing processes for AI security reporting is doomed to fail due to fundamental shortcomings for the distinctive characteristics of AI systems.<n>Based on our proposal to address these shortcomings, we discuss an approach to AI security reporting and how the new AI paradigm, AI agents, will further reinforce the need for specialized AI security incident reporting advancements.
arXiv Detail & Related papers (2024-12-19T13:50:26Z)
Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization. We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv Detail & Related papers (2024-10-25T07:53:32Z)
Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act) Uses insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence. As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z)
Societal Adaptation to Advanced AI [1.2607853680700076]
Existing strategies for managing risks from advanced AI systems often focus on affecting what AI systems are developed and how they diffuse. We urge a complementary approach: increasing societal adaptation to advanced AI. We introduce a conceptual framework which helps identify adaptive interventions that avoid, defend against and remedy potentially harmful uses of AI systems.
arXiv Detail & Related papers (2024-05-16T17:52:12Z)
Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits [54.648819983899614]
General purpose AI seems to have lowered the barriers for the public to use AI and harness its power. We introduce PARTICIP-AI, a framework for laypeople to speculate and assess AI use cases and their impacts.
arXiv Detail & Related papers (2024-03-21T19:12:37Z)
Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems. There is a lack of consensus about how exactly such risks arise, and how to manage them. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z)
Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety [2.3572498744567127]
We argue that alignment to human intent is insufficient for safe AI systems. We argue that preservation of long-term agency of humans may be a more robust standard.
arXiv Detail & Related papers (2023-05-30T17:14:01Z)
Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time. We discuss how biased models can lead to more negative real-world outcomes for certain groups. If the issues persist, they could be reinforced by interactions with other risks and have severe implications on society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z)
Aligning Artificial Intelligence with Humans through Public Policy [0.0]
This essay outlines research on AI that learn structures in policy data that can be leveraged for downstream tasks. We believe this represents the "comprehension" phase of AI and policy, but leveraging policy as a key source of human values to align AI requires "understanding" policy.
arXiv Detail & Related papers (2022-06-25T21:31:14Z)
Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations. It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z)
AI Failures: A Review of Underlying Issues [0.0]
We focus on AI failures on account of flaws in conceptualization, design and deployment. We find that AI systems fail on account of omission and commission errors in the design of the AI system. An AI system is quite likely to fail in situations where, in effect, it is called upon to deliver moral judgments.
arXiv Detail & Related papers (2020-07-18T15:31:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.