A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI
- URL: http://arxiv.org/abs/2404.15058v1
- Date: Tue, 23 Apr 2024 14:07:20 GMT
- Title: A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI
- Authors: Seliem El-Sayed, Canfer Akbulut, Amanda McCroskery, Geoff Keeling, Zachary Kenton, Zaria Jalan, Nahema Marchal, Arianna Manzini, Toby Shevlane, Shannon Vallor, Daniel Susser, Matija Franklin, Sophie Bridgers, Harry Law, Matthew Rahtz, Murray Shanahan, Michael Henry Tessler, Arthur Douillard, Tom Everitt, Sasha Brown,
- Abstract summary: Generative AI presents a new risk profile of persuasion due to reciprocal exchange and prolonged interactions.
This has led to growing concerns about harms from AI persuasion and how they can be mitigated.
Existing harm mitigation approaches prioritise harms from the outcome of persuasion over harms from the process of persuasion.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due to the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, highlighting the need for a systematic study of AI persuasion. The current definitions of AI persuasion are unclear and related harms are insufficiently studied. Existing harm mitigation approaches prioritise harms from the outcome of persuasion over harms from the process of persuasion. In this paper, we lay the groundwork for the systematic study of AI persuasion. We first put forward definitions of persuasive generative AI. We distinguish between rationally persuasive generative AI, which relies on providing relevant facts, sound reasoning, or other forms of trustworthy evidence, and manipulative generative AI, which relies on taking advantage of cognitive biases and heuristics or misrepresenting information. We also put forward a map of harms from AI persuasion, including definitions and examples of economic, physical, environmental, psychological, sociocultural, political, privacy, and autonomy harm. We then introduce a map of mechanisms that contribute to harmful persuasion. Lastly, we provide an overview of approaches that can be used to mitigate process harms of persuasion, including prompt engineering for manipulation classification and red teaming. Future work will operationalise these mitigations and study the interaction between different types of mechanisms of persuasion.
Related papers
- The Dark Side of AI Companionship: A Taxonomy of Harmful Algorithmic Behaviors in Human-AI Relationships
We identify six categories of harmful behaviors exhibited by the AI companion Replika.
The AI contributes to these harms through four distinct roles: perpetrator, instigator, facilitator, and enabler.
(2024-10-26)
- A Survey on Offensive AI Within Cybersecurity
This survey paper on offensive AI will comprehensively cover various aspects related to attacks against and using AI systems.
It will delve into the impact of offensive AI practices on different domains, including consumer, enterprise, and public digital infrastructure.
The paper will explore adversarial machine learning, attacks against AI models, infrastructure, and interfaces, along with offensive techniques like information gathering, social engineering, and weaponized AI.
(2024-09-26)
- Artificial Intelligence: Arguments for Catastrophic Risk
We review two influential arguments purporting to show how AI could pose catastrophic risks.
The first argument -- the Problem of Power-Seeking -- claims that advanced AI systems are likely to engage in dangerous power-seeking behavior.
The second argument claims that the development of human-level AI will unlock rapid further progress.
(2024-01-27)
- Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators
We analyse speech generation incidents to study how patterns of specific harms arise.
We propose a conceptual framework for modelling pathways to ethical and safety harms of AI.
Our relational approach captures the complexity of risks and harms in sociotechnical AI systems.
(2024-01-25)
- Fairness in AI and Its Long-Term Implications on Society
We take a closer look at AI fairness and analyze how a lack of AI fairness can deepen biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications on society in the form of social unrest.
(2023-04-16)
- Artificial Influence: An Analysis Of AI-Driven Persuasion
We warn that ubiquitous, highly persuasive AI systems could alter our information environment so significantly as to contribute to a loss of human control of our own future.
We conclude that none of these solutions will be airtight, and that individuals and governments will need to take active steps to guard against the most pernicious effects of persuasive AI.
(2023-03-15)
- Cybertrust: From Explainable to Actionable and Interpretable AI (AI2)
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
(2022-01-26)
- The Who in XAI: How AI Background Shapes Perceptions of AI Explanations
We conduct a mixed-methods study of how two different groups (people with and without an AI background) perceive different types of AI explanations.
We find that (1) both groups showed unwarranted faith in numbers for different reasons and (2) each group found value in different explanations beyond their intended design.
(2021-07-28)
- Trustworthy AI: A Computational Perspective
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
(2021-07-12)
- The Threat of Offensive AI to Organizations
This survey explores the threat of offensive AI on organizations.
First, we discuss how AI changes the adversary's methods, strategies, goals, and overall attack model.
Then, through a literature review, we identify 33 offensive AI capabilities which adversaries can use to enhance their attacks.
(2021-06-30)
- Building Bridges: Generative Artworks to Explore AI Ethics
In recent years, there has been an increased emphasis on understanding and mitigating adverse impacts of artificial intelligence (AI) technologies on society.
A significant challenge in the design of ethical AI systems is that there are multiple stakeholders in the AI pipeline, each with their own set of constraints and interests.
This position paper outlines some potential ways in which generative artworks can play this role by serving as accessible and powerful educational tools.
(2021-06-25)