Human-AI Safety: A Descendant of Generative AI and Control Systems Safety
- URL: http://arxiv.org/abs/2405.09794v2
- Date: Sat, 22 Jun 2024 20:17:22 GMT
- Title: Human-AI Safety: A Descendant of Generative AI and Control Systems Safety
- Authors: Andrea Bajcsy, Jaime F. Fisac,
- Abstract summary: We argue that meaningful safety assurances for advanced AI technologies require reasoning about how the feedback loop formed by AI outputs and human behavior may drive the interaction towards different outcomes.
We propose a concrete technical roadmap towards next-generation human-centered AI safety.
- Score: 6.100304850888953
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial intelligence (AI) is interacting with people at an unprecedented scale, offering new avenues for immense positive impact, but also raising widespread concerns around the potential for individual and societal harm. Today, the predominant paradigm for human--AI safety focuses on fine-tuning the generative model's outputs to better agree with human-provided examples or feedback. In reality, however, the consequences of an AI model's outputs cannot be determined in isolation: they are tightly entangled with the responses and behavior of human users over time. In this paper, we distill key complementary lessons from AI safety and control systems safety, highlighting open challenges as well as key synergies between both fields. We then argue that meaningful safety assurances for advanced AI technologies require reasoning about how the feedback loop formed by AI outputs and human behavior may drive the interaction towards different outcomes. To this end, we introduce a unifying formalism to capture dynamic, safety-critical human--AI interactions and propose a concrete technical roadmap towards next-generation human-centered AI safety.
Related papers
- Combining Theory of Mind and Kindness for Self-Supervised Human-AI Alignment [0.0]
Current AI models prioritize task optimization over safety, leading to risks of unintended harm.
We propose a novel human-inspired approach which aims to address these various concerns and help align competing objectives.
arXiv Detail & Related papers (2024-10-21T22:04:44Z) - The Rise of the AI Co-Pilot: Lessons for Design from Aviation and Beyond [22.33734581699234]
We advocate for a paradigm where AI is seen as a collaborative co-pilot, working under human guidance rather than as a mere tool.
Our paper proposes a design approach that emphasizes active human engagement, control, and skill enhancement in the AI partnership.
arXiv Detail & Related papers (2023-11-16T13:58:15Z) - Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems.
There is a lack of consensus about how exactly such risks arise, and how to manage them.
Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z) - Exploration with Principles for Diverse AI Supervision [88.61687950039662]
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.
While this generative AI approach has produced impressive results, it heavily leans on human supervision.
This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation.
We propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data.
arXiv Detail & Related papers (2023-10-13T07:03:39Z) - The Promise and Peril of Artificial Intelligence -- Violet Teaming
Offers a Balanced Path Forward [56.16884466478886]
This paper reviews emerging issues with opaque and uncontrollable AI systems.
It proposes an integrative framework called violet teaming to develop reliable and responsible AI.
It emerged from AI safety research to manage risks proactively by design.
arXiv Detail & Related papers (2023-08-28T02:10:38Z) - Intent-aligned AI systems deplete human agency: the need for agency
foundations research in AI safety [2.3572498744567127]
We argue that alignment to human intent is insufficient for safe AI systems.
We argue that preservation of long-term agency of humans may be a more robust standard.
arXiv Detail & Related papers (2023-05-30T17:14:01Z) - Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications on society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z) - Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z) - Adversarial Interaction Attack: Fooling AI to Misinterpret Human
Intentions [46.87576410532481]
We show that, despite their current huge success, deep learning based AI systems can be easily fooled by subtle adversarial noise.
Based on a case study of skeleton-based human interactions, we propose a novel adversarial attack on interactions.
Our study highlights potential risks in the interaction loop with AI and humans, which need to be carefully addressed when deploying AI systems in safety-critical applications.
arXiv Detail & Related papers (2021-01-17T16:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.