Measuring Value Alignment
- URL: http://arxiv.org/abs/2312.15241v1
- Date: Sat, 23 Dec 2023 12:30:06 GMT
- Title: Measuring Value Alignment
- Authors: Fazl Barez and Philip Torr
- Abstract summary: This paper introduces a novel formalism to quantify the alignment between AI systems and human values.
By utilizing this formalism, AI developers and ethicists can better design and evaluate AI systems to ensure they operate in harmony with human values.
- Score: 12.696227679697493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As artificial intelligence (AI) systems become increasingly integrated into
various domains, ensuring that they align with human values becomes critical.
This paper introduces a novel formalism to quantify the alignment between AI
systems and human values, using Markov Decision Processes (MDPs) as the
foundational model. We delve into the concept of values as desirable goals tied
to actions and norms as behavioral guidelines, aiming to shed light on how they
can be used to guide AI decisions. This framework offers a mechanism to
evaluate the degree of alignment between norms and values by assessing
preference changes across state transitions in a normative world. By utilizing
this formalism, AI developers and ethicists can better design and evaluate AI
systems to ensure they operate in harmony with human values. The proposed
methodology holds potential for a wide range of applications, from
recommendation systems emphasizing well-being to autonomous vehicles
prioritizing safety.
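The abstract does not give the formal definition, so the following is only a minimal sketch of one plausible reading: a value is represented as a preference score over states, a norm restricts which transitions can occur, and alignment is taken as the mean change in preference across the transitions that remain. The names `alignment`, `safety_preference`, and the toy driving states are hypothetical illustrations, not the authors' notation or method.

```python
from typing import Callable, Iterable, Tuple

State = str
# A transition is (state, action, next_state), as in an MDP trajectory.
Transition = Tuple[State, str, State]


def alignment(transitions: Iterable[Transition],
              preference: Callable[[State], float]) -> float:
    """Mean preference change over the transitions (assumed definition)."""
    transitions = list(transitions)
    if not transitions:
        raise ValueError("need at least one transition")
    total = sum(preference(s_next) - preference(s)
                for s, _action, s_next in transitions)
    return total / len(transitions)


# Hypothetical preference scores for the value "safety".
safety_preference = {"speeding": -1.0, "cruising": 0.5, "stopped": 1.0}

# Transitions available with no norm in force.
unconstrained = [
    ("speeding", "brake", "cruising"),
    ("cruising", "brake", "stopped"),
    ("cruising", "accelerate", "speeding"),
]

# A hypothetical norm "do not accelerate" removes the last transition.
normative = [t for t in unconstrained if t[1] != "accelerate"]

score_without_norm = alignment(unconstrained, safety_preference.__getitem__)
score_with_norm = alignment(normative, safety_preference.__getitem__)
```

Under these assumed scores, the norm raises the mean preference change, which is the sense in which a norm would count as aligned with the value in this sketch. A positive score suggests the norm tends to move the world toward preferred states, and a negative score suggests the opposite.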
Related papers
- Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment [0.0]
We find "alignment" a problem related to the challenges of expressing human goals and values in a manner that artificial systems can follow without leading to unwanted adversarial effects.
This work addresses alignment as a technical-philosophical problem that requires solid philosophical foundations and practical implementations that bring normative theory to AI system development.
arXiv Detail & Related papers (2024-06-16T18:37:31Z)
- Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment.
The lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment.
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
- Towards Responsible AI in Banking: Addressing Bias for Fair Decision-Making [69.44075077934914]
"Responsible AI" emphasizes the critical nature of addressing biases within the development of a corporate culture.
This thesis is structured around three fundamental pillars: understanding bias, mitigating bias, and accounting for bias.
In line with open-source principles, we have released Bias On Demand and FairView as accessible Python packages.
arXiv Detail & Related papers (2024-01-13T14:07:09Z)
- Levels of AGI for Operationalizing Progress on the Path to AGI [64.59151650272477]
We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors.
This framework introduces levels of AGI performance, generality, and autonomy, providing a common language to compare models, assess risks, and measure progress along the path to AGI.
arXiv Detail & Related papers (2023-11-04T17:44:58Z)
- Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z)
- Human Values in Multiagent Systems [3.5027291542274357]
This paper presents a formal representation of values, grounded in the social sciences.
We use this formal representation to articulate the key challenges for achieving value-aligned behaviour in multiagent systems.
arXiv Detail & Related papers (2023-05-04T11:23:59Z)
- A Human-Centric Assessment Framework for AI [11.065260433086024]
There is no agreed standard on how explainable AI systems should be assessed.
Inspired by the Turing test, we introduce a human-centric assessment framework.
This setup can serve as a framework for a wide range of human-centric AI system assessments.
arXiv Detail & Related papers (2022-05-25T12:59:13Z)
- Metaethical Perspectives on 'Benchmarking' AI Ethics [81.65697003067841]
Benchmarks are seen as the cornerstone for measuring technical progress in Artificial Intelligence (AI) research.
An increasingly prominent research area in AI is ethics, which currently has neither a set of benchmarks nor a commonly accepted way of measuring the 'ethicality' of an AI system.
We argue that it makes more sense to talk about 'values' rather than 'ethics' when considering the possible actions of present and future AI systems.
arXiv Detail & Related papers (2022-04-11T14:36:39Z)
- Value alignment: a formal approach [2.8348950186890467]
principles that should govern autonomous AI systems.
We first provide a formal model to represent values through preferences and ways to compute value aggregations.
Value alignment is then defined, and computed, for a given norm with respect to a given value through the increase or decrease that the norm produces in the preferences over future states of the world.
arXiv Detail & Related papers (2021-10-18T12:40:04Z)
- An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)
- AAAI FSS-19: Human-Centered AI: Trustworthiness of AI Models and Data Proceedings [8.445274192818825]
It is crucial for predictive models to be uncertainty-aware and yield trustworthy predictions.
The focus of this symposium was on AI systems to improve data quality and technical robustness and safety.
Submissions from broadly defined areas also discussed approaches addressing requirements such as explainable models, human trust and ethical aspects of AI.
arXiv Detail & Related papers (2020-01-15T15:30:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.