Related papers: Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment

Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment

URL: http://arxiv.org/abs/2406.11039v2
Date: Tue, 18 Jun 2024 12:15:06 GMT
Title: Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment
Authors: Nicholas Kluge Corrêa,
Abstract summary: We find "alignment" a problem related to the challenges of expressing human goals and values in a manner that artificial systems can follow without leading to unwanted adversarial effects. This work addresses alignment as a technical-philosophical problem that requires solid philosophical foundations and practical implementations that bring normative theory to AI system development.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The critical inquiry pervading the realm of Philosophy, and perhaps extending its influence across all Humanities disciplines, revolves around the intricacies of morality and normativity. Surprisingly, in recent years, this thematic thread has woven its way into an unexpected domain, one not conventionally associated with pondering "what ought to be": the field of artificial intelligence (AI) research. Central to morality and AI, we find "alignment", a problem related to the challenges of expressing human goals and values in a manner that artificial systems can follow without leading to unwanted adversarial effects. More explicitly and with our current paradigm of AI development in mind, we can think of alignment as teaching human values to non-anthropomorphic entities trained through opaque, gradient-based learning techniques. This work addresses alignment as a technical-philosophical problem that requires solid philosophical foundations and practical implementations that bring normative theory to AI system development. To accomplish this, we propose two sets of necessary and sufficient conditions that, we argue, should be considered in any alignment process. While necessary conditions serve as metaphysical and metaethical roots that pertain to the permissibility of alignment, sufficient conditions establish a blueprint for aligning AI systems under a learning-based paradigm. After laying such foundations, we present implementations of this approach by using state-of-the-art techniques and methods for aligning general-purpose language systems. We call this framework Dynamic Normativity. Its central thesis is that any alignment process under a learning paradigm that cannot fulfill its necessary and sufficient conditions will fail in producing aligned systems.

Related papers

Technology as uncharted territory: Contextual integrity and the notion of AI as new ethical ground [55.2480439325792]
I argue that efforts to promote responsible and ethical AI can inadvertently contribute to and seemingly legitimize this disregard for established contextual norms. I question the current narrow prioritization in AI ethics of moral innovation over moral preservation.
arXiv Detail & Related papers (2024-12-06T15:36:13Z)
Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act) Uses insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence. As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z)
Beyond Preferences in AI Alignment [15.878773061188516]
We characterize and challenge the preferentist approach to AI alignment. We show how preferences fail to capture the thick semantic content of human values. We argue that AI systems should be aligned with normative standards appropriate to their social roles.
arXiv Detail & Related papers (2024-08-30T03:14:20Z)
The Switch, the Ladder, and the Matrix: Models for Classifying AI Systems [0.0]
There still exists a gap between principles and practices in AI ethics. One major obstacle organisations face when attempting to operationalise AI Ethics is the lack of a well-defined material scope.
arXiv Detail & Related papers (2024-07-07T12:16:01Z)
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. The lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. We introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML)
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
Learning Machine Morality through Experience and Interaction [3.7414804164475983]
Increasing interest in ensuring safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents. We argue that more hybrid solutions are needed to create adaptable and robust, yet more controllable and interpretable agents.
arXiv Detail & Related papers (2023-12-04T11:46:34Z)
Foundational Moral Values for AI Alignment [0.0]
We present five core, foundational values, drawn from moral philosophy and built on the requisites for human existence: survival, sustainable intergenerational existence, society, education, and truth. We show that these values not only provide a clearer direction for technical alignment work, but also serve as a framework to highlight threats and opportunities from AI systems to both obtain and sustain these values.
arXiv Detail & Related papers (2023-11-28T18:11:24Z)
AI Alignment: A Comprehensive Survey [70.35693485015659]
AI alignment aims to make AI systems behave in line with human intentions and values. We identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality. We decompose current alignment research into two key components: forward alignment and backward alignment.
arXiv Detail & Related papers (2023-10-30T15:52:15Z)
Fairness in Agreement With European Values: An Interdisciplinary Perspective on AI Regulation [61.77881142275982]
This interdisciplinary position paper considers various concerns surrounding fairness and discrimination in AI, and discusses how AI regulations address them. We first look at AI and fairness through the lenses of law, (AI) industry, sociotechnology, and (moral) philosophy, and present various perspectives. We identify and propose the roles AI Regulation should take to make the endeavor of the AI Act a success in terms of AI fairness concerns.
arXiv Detail & Related papers (2022-06-08T12:32:08Z)
Metaethical Perspectives on 'Benchmarking' AI Ethics [81.65697003067841]
Benchmarks are seen as the cornerstone for measuring technical progress in Artificial Intelligence (AI) research. An increasingly prominent research area in AI is ethics, which currently has no set of benchmarks nor commonly accepted way for measuring the 'ethicality' of an AI system. We argue that it makes more sense to talk about 'values' rather than 'ethics' when considering the possible actions of present and future AI systems.
arXiv Detail & Related papers (2022-04-11T14:36:39Z)
An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence. The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.