What Is AI Safety? What Do We Want It to Be?
- URL: http://arxiv.org/abs/2505.02313v1
- Date: Mon, 05 May 2025 01:55:00 GMT
- Title: What Is AI Safety? What Do We Want It to Be?
- Authors: Jacqueline Harding, Cameron Domenico Kirk-Giannini,
- Abstract summary: A research project falls within the purview of AI safety just in case it aims to prevent or reduce the harms caused by AI systems.<n>Despite its simplicity and appeal, we argue that The Safety Conception is in tension with at least two trends in the ways AI safety researchers and organizations think and talk about AI safety.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The field of AI safety seeks to prevent or reduce the harms caused by AI systems. A simple and appealing account of what is distinctive of AI safety as a field holds that this feature is constitutive: a research project falls within the purview of AI safety just in case it aims to prevent or reduce the harms caused by AI systems. Call this appealingly simple account The Safety Conception of AI safety. Despite its simplicity and appeal, we argue that The Safety Conception is in tension with at least two trends in the ways AI safety researchers and organizations think and talk about AI safety: first, a tendency to characterize the goal of AI safety research in terms of catastrophic risks from future systems; second, the increasingly popular idea that AI safety can be thought of as a branch of safety engineering. Adopting the methodology of conceptual engineering, we argue that these trends are unfortunate: when we consider what concept of AI safety it would be best to have, there are compelling reasons to think that The Safety Conception is the answer. Descriptively, The Safety Conception allows us to see how work on topics that have historically been treated as central to the field of AI safety is continuous with work on topics that have historically been treated as more marginal, like bias, misinformation, and privacy. Normatively, taking The Safety Conception seriously means approaching all efforts to prevent or mitigate harms from AI systems based on their merits rather than drawing arbitrary distinctions between them.
Related papers
- The Singapore Consensus on Global AI Safety Research Priorities [128.58674892183657]
"2025 Singapore Conference on AI (SCAI): International Scientific Exchange on AI Safety" aimed to support research in this space.<n>Report builds on the International AI Safety Report chaired by Yoshua Bengio and backed by 33 governments.<n>Report organises AI safety research domains into three types: challenges with creating trustworthy AI systems (Development), challenges with evaluating their risks (Assessment) and challenges with monitoring and intervening after deployment (Control)
arXiv Detail & Related papers (2025-06-25T17:59:50Z) - AI threats to national security can be countered through an incident regime [55.2480439325792]
We propose a legally mandated post-deployment AI incident regime that aims to counter potential national security threats from AI systems.<n>Our proposed AI incident regime is split into three phases. The first phase revolves around a novel operationalization of what counts as an 'AI incident'<n>The second and third phases spell out that AI providers should notify a government agency about incidents, and that the government agency should be involved in amending AI providers' security and safety procedures.
arXiv Detail & Related papers (2025-03-25T17:51:50Z) - AI Safety for Everyone [3.440579243843689]
Recent discussions and research in AI safety have increasingly emphasized the deep connection between AI safety and existential risk from advanced AI systems.<n>This framing may exclude researchers and practitioners who are committed to AI safety but approach the field from different angles.<n>We find a vast array of concrete safety work that addresses immediate and practical concerns with current AI systems.
arXiv Detail & Related papers (2025-02-13T13:04:59Z) - Why do Experts Disagree on Existential Risk and P(doom)? A Survey of AI Experts [0.0]
Research on catastrophic risks and AI alignment is often met with skepticism by experts.<n>Online debate over the existential risk of AI has begun to turn tribal.<n>I surveyed 111 AI experts on their familiarity with AI safety concepts.
arXiv Detail & Related papers (2025-01-25T01:51:29Z) - Position: Mind the Gap-the Growing Disconnect Between Established Vulnerability Disclosure and AI Security [56.219994752894294]
We argue that adapting existing processes for AI security reporting is doomed to fail due to fundamental shortcomings for the distinctive characteristics of AI systems.<n>Based on our proposal to address these shortcomings, we discuss an approach to AI security reporting and how the new AI paradigm, AI agents, will further reinforce the need for specialized AI security incident reporting advancements.
arXiv Detail & Related papers (2024-12-19T13:50:26Z) - Trustworthy, Responsible, and Safe AI: A Comprehensive Architectural Framework for AI Safety with Challenges and Mitigations [15.946242944119385]
AI Safety is an emerging area of critical importance to the safe adoption and deployment of AI systems.<n>Our goal is to promote advancement in AI safety research, and ultimately enhance people's trust in digital transformation.
arXiv Detail & Related papers (2024-08-23T09:33:48Z) - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? [59.96471873997733]
We propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context.<n>We aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.
arXiv Detail & Related papers (2024-07-31T17:59:24Z) - AI Safety: A Climb To Armageddon? [0.0]
The paper examines three response strategies: Optimism, Mitigation, and Holism.
The surprising robustness of the argument forces a re-examination of core assumptions around AI safety.
arXiv Detail & Related papers (2024-05-30T08:41:54Z) - Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI.
The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees.
We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z) - AI Safety: Necessary, but insufficient and possibly problematic [1.6797508081737678]
This article critically examines the recent hype around AI safety.
We consider what 'AI safety' actually means, and outline the dominant concepts that the digital footprint of AI safety aligns with.
We share our concerns on how AI safety may normalize AI that advances structural harm through providing exploitative and harmful AI with a veneer of safety.
arXiv Detail & Related papers (2024-03-26T06:18:42Z) - Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.