Deception and Manipulation in Generative AI
- URL: http://arxiv.org/abs/2401.11335v1
- Date: Sat, 20 Jan 2024 21:54:37 GMT
- Title: Deception and Manipulation in Generative AI
- Authors: Christian Tarsney
- Abstract summary: I argue that AI-generated content should be subject to stricter standards against deception and manipulation.
I propose two measures to guard against AI deception and manipulation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models now possess human-level linguistic abilities in many
contexts. This raises the concern that they can be used to deceive and
manipulate on unprecedented scales, for instance spreading political
misinformation on social media. In future, agentic AI systems might also
deceive and manipulate humans for their own ends. In this paper, first, I argue
that AI-generated content should be subject to stricter standards against
deception and manipulation than we ordinarily apply to humans. Second, I offer
new characterizations of AI deception and manipulation meant to support such
standards, according to which a statement is deceptive (manipulative) if it
leads human addressees away from the beliefs (choices) they would endorse under
"semi-ideal" conditions. Third, I propose two measures to guard against AI
deception and manipulation, inspired by this characterization: "extreme
transparency" requirements for AI-generated content and defensive systems that,
among other things, annotate AI-generated statements with contextualizing
information. Finally, I consider to what extent these measures can protect
against deceptive behavior in future, agentic AIs, and argue that non-agentic
defensive systems can provide an important layer of defense even against more
powerful agentic systems.
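The paper's two proposals are policy-level rather than technical, but the second (defensive systems that annotate AI-generated statements with contextualizing information) can be pictured concretely. The sketch below is a minimal, purely illustrative Python mock-up of such an annotation layer, not an implementation from the paper; the class, function names, and example data are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnnotatedStatement:
    """An AI-generated statement wrapped with provenance and contextual notes."""
    text: str                        # the AI-generated statement itself
    source_model: str                # declared origin, per the "extreme transparency" idea
    generated_at: str                # generation timestamp (ISO date string here)
    context_notes: List[str] = field(default_factory=list)  # contextualizing information


def annotate(text: str, source_model: str, generated_at: str,
             context_notes: List[str]) -> AnnotatedStatement:
    """Attach provenance and contextualizing notes to an AI-generated statement."""
    return AnnotatedStatement(text, source_model, generated_at, list(context_notes))


def render_for_reader(stmt: AnnotatedStatement) -> str:
    """Render the statement so its AI origin and context are visible to the reader."""
    lines = [
        f"[AI-generated by {stmt.source_model} on {stmt.generated_at}]",
        stmt.text,
    ]
    lines.extend(f"  context: {note}" for note in stmt.context_notes)
    return "\n".join(lines)


if __name__ == "__main__":
    stmt = annotate(
        text="Candidate X has pledged to abolish the income tax.",
        source_model="example-llm",   # hypothetical model name
        generated_at="2024-01-20",
        context_notes=["No primary source for this pledge was found at annotation time."],
    )
    print(render_for_reader(stmt))
```

In practice, such a layer might sit between a content platform and its readers, pairing the provenance disclosure demanded by "extreme transparency" with reader-facing contextual annotations.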
Related papers
- Human Bias in the Face of AI: The Role of Human Judgement in AI Generated Text Evaluation [48.70176791365903]
This study explores how bias shapes the perception of AI versus human generated content.
We investigated how human raters respond to labeled and unlabeled content.
arXiv Detail & Related papers (2024-09-29T04:31:45Z) - The Manipulation Problem: Conversational AI as a Threat to Epistemic Agency [0.0]
The technology of Conversational AI has made significant advancements over the last eighteen months.
Conversational agents designed to pursue targeted influence objectives are likely to be deployed in the near future.
Sometimes referred to as the "AI Manipulation Problem," the emerging risk is that consumers will unwittingly engage in real-time dialog with predatory AI agents.
arXiv Detail & Related papers (2023-06-19T04:09:16Z) - Characterizing Manipulation from AI Systems [7.344068411174193]
We build upon prior literature on manipulation from other fields and characterize the space of possible notions of manipulation.
We propose a definition of manipulation based on our characterization.
We also discuss the connections between manipulation and related concepts, such as deception and coercion.
arXiv Detail & Related papers (2023-03-16T15:19:21Z) - Artificial Influence: An Analysis Of AI-Driven Persuasion [0.0]
We warn that ubiquitous, highly persuasive AI systems could alter our information environment so significantly as to contribute to a loss of human control over our own future.
We conclude that none of these solutions will be airtight, and that individuals and governments will need to take active steps to guard against the most pernicious effects of persuasive AI.
arXiv Detail & Related papers (2023-03-15T16:05:11Z) - When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z) - Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z) - Truthful AI: Developing and governing AI that does not lie [0.26385121748044166]
Lying -- the use of verbal falsehoods to deceive -- is harmful.
While lying has traditionally been a human affair, AI systems are becoming increasingly prevalent.
This raises the question of how we should limit the harm caused by AI "lies".
arXiv Detail & Related papers (2021-10-13T12:18:09Z) - Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z) - The Threat of Offensive AI to Organizations [52.011307264694665]
This survey explores the threat of offensive AI to organizations.
First, we discuss how AI changes the adversary's methods, strategies, goals, and overall attack model.
Then, through a literature review, we identify 33 offensive AI capabilities which adversaries can use to enhance their attacks.
arXiv Detail & Related papers (2021-06-30T01:03:28Z) - On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems [62.997667081978825]
We present a formal framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems.
The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification.
The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself.
arXiv Detail & Related papers (2020-04-09T10:56:53Z)