Measuring Value Alignment
- URL: http://arxiv.org/abs/2312.15241v1
- Date: Sat, 23 Dec 2023 12:30:06 GMT
- Title: Measuring Value Alignment
- Authors: Fazl Barez and Philip Torr
- Abstract summary: This paper introduces a novel formalism to quantify the alignment between AI systems and human values.
By utilizing this formalism, AI developers and ethicists can better design and evaluate AI systems to ensure they operate in harmony with human values.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As artificial intelligence (AI) systems become increasingly integrated into
various domains, ensuring that they align with human values becomes critical.
This paper introduces a novel formalism to quantify the alignment between AI
systems and human values, using Markov Decision Processes (MDPs) as the
foundational model. We delve into the concept of values as desirable goals tied
to actions and norms as behavioral guidelines, aiming to shed light on how they
can be used to guide AI decisions. This framework offers a mechanism to
evaluate the degree of alignment between norms and values by assessing
preference changes across state transitions in a normative world. By utilizing
this formalism, AI developers and ethicists can better design and evaluate AI
systems to ensure they operate in harmony with human values. The proposed
methodology holds potential for a wide range of applications, from
recommendation systems emphasizing well-being to autonomous vehicles
prioritizing safety.
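The evaluation mechanism described in the abstract can be illustrated with a toy example. This is a minimal sketch under assumptions, not the paper's actual formalism: the states, the `preference` value function, and the `safety_norm` are all hypothetical, and a norm is modeled simply as a predicate that permits or prohibits actions. Alignment is then the mean change in preference across the state transitions the norm allows.

```python
def preference(state):
    """Hypothetical value function: higher means the state is more preferred."""
    return {"unsafe": 0.0, "neutral": 0.5, "safe": 1.0}[state]

# A deterministic toy MDP: transitions[state][action] -> next state.
transitions = {
    "neutral": {"speed": "unsafe", "brake": "safe"},
    "unsafe": {"speed": "unsafe", "brake": "neutral"},
    "safe": {"speed": "neutral", "brake": "safe"},
}

def alignment(norm, transitions):
    """Mean preference change over all transitions the norm permits."""
    deltas = [
        preference(nxt) - preference(state)
        for state, actions in transitions.items()
        for action, nxt in actions.items()
        if norm(state, action)
    ]
    return sum(deltas) / len(deltas)

no_norm = lambda state, action: True          # all actions permitted
safety_norm = lambda state, action: action != "speed"  # prohibit speeding

print(alignment(no_norm, transitions))      # baseline: 0.0
print(alignment(safety_norm, transitions))  # norm-constrained: ~0.333
```

In this sketch the safety norm scores higher than the unconstrained baseline because the transitions it permits tend to move toward more-preferred states, matching the abstract's idea of assessing preference changes across state transitions in a normative world.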
Related papers
- Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization.
We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv Detail & Related papers (2024-10-25T07:53:32Z)
- Beyond Preferences in AI Alignment [15.878773061188516]
We characterize and challenge the preferentist approach to AI alignment.
We show how preferences fail to capture the thick semantic content of human values.
We argue that AI systems should be aligned with normative standards appropriate to their social roles.
arXiv Detail & Related papers (2024-08-30T03:14:20Z)
- Ethical AI Governance: Methods for Evaluating Trustworthy AI [0.552480439325792]
Trustworthy Artificial Intelligence (TAI) integrates ethics that align with human values.
TAI evaluation aims to ensure ethical standards and safety in AI development and usage.
arXiv Detail & Related papers (2024-08-28T09:25:50Z)
- Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment.
The lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment.
We introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), and Machine Learning (ML).
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
- Towards Responsible AI in Banking: Addressing Bias for Fair Decision-Making [69.44075077934914]
"Responsible AI" emphasizes the critical need to address biases within the development of a corporate culture.
This thesis is structured around three fundamental pillars: understanding bias, mitigating bias, and accounting for bias.
In line with open-source principles, we have released Bias On Demand and FairView as accessible Python packages.
arXiv Detail & Related papers (2024-01-13T14:07:09Z)
- Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z)
- Human Values in Multiagent Systems [3.5027291542274357]
This paper presents a formal representation of values, grounded in the social sciences.
We use this formal representation to articulate the key challenges for achieving value-aligned behaviour in multiagent systems.
arXiv Detail & Related papers (2023-05-04T11:23:59Z)
- Metaethical Perspectives on 'Benchmarking' AI Ethics [81.65697003067841]
Benchmarks are seen as the cornerstone for measuring technical progress in Artificial Intelligence (AI) research.
An increasingly prominent research area in AI is ethics, which currently has no set of benchmarks nor commonly accepted way for measuring the 'ethicality' of an AI system.
We argue that it makes more sense to talk about 'values' rather than 'ethics' when considering the possible actions of present and future AI systems.
arXiv Detail & Related papers (2022-04-11T14:36:39Z)
- Value alignment: a formal approach [2.8348950186890467]
The paper considers the principles that should govern autonomous AI systems.
We first provide a formal model to represent values through preferences and ways to compute value aggregations.
Value alignment is then defined, and computed, for a given norm with respect to a given value through the increase or decrease that the norm produces in the preferences over future states of the world.
arXiv Detail & Related papers (2021-10-18T12:40:04Z)
- An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)
- AAAI FSS-19: Human-Centered AI: Trustworthiness of AI Models and Data Proceedings [8.445274192818825]
It is crucial for predictive models to be uncertainty-aware and yield trustworthy predictions.
The focus of this symposium was on AI systems to improve data quality and technical robustness and safety.
Submissions from broadly defined areas also discussed approaches addressing requirements such as explainable models, human trust, and ethical aspects of AI.
arXiv Detail & Related papers (2020-01-15T15:30:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.