Value alignment: a formal approach
- URL: http://arxiv.org/abs/2110.09240v1
- Date: Mon, 18 Oct 2021 12:40:04 GMT
- Title: Value alignment: a formal approach
- Authors: Carles Sierra, Nardine Osman, Pablo Noriega, Jordi Sabater-Mir, and Antoni Perelló
- Abstract summary: Value alignment is one of the principles that should govern autonomous AI systems.
We first provide a formal model to represent values through preferences and ways to compute value aggregations.
Value alignment is then defined, and computed, for a given norm with respect to a given value through the increase or decrease it causes in the preferences over future states of the world.
- Score: 2.8348950186890467
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Value alignment is one of the principles that should govern autonomous AI systems. It essentially
states that a system's goals and behaviour should be aligned with human values. But
how to ensure value alignment? In this paper we first provide a formal model to
represent values through preferences and ways to compute value aggregations;
i.e. preferences with respect to a group of agents and/or preferences with
respect to sets of values. Value alignment is then defined, and computed, for a
given norm with respect to a given value through the increase or decrease it
causes in the preferences over future states of the world. We focus on norms
because it is norms that govern behaviour and, as such, the alignment of a given
system with a given value will be dictated by the norms the system follows.
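Read operationally, the abstract describes three computations: aggregating value-based preferences over a group of agents, aggregating over a set of values, and scoring a norm by the change in those preferences across the state transitions it permits. The Python sketch below is one illustrative reading of that description, not the paper's formalism; the preference table, the choice of the mean as the aggregation operator, and the transition list standing in for a norm are all assumptions made for the example.

```python
from statistics import mean

# Toy value-indexed preferences: prefer[agent][value][state] in [0, 1] says how
# much `agent` prefers world state `state` from the point of view of `value`.
# All names and numbers are illustrative, not taken from the paper.
prefer = {
    "alice": {"fairness": {"s0": 0.2, "s1": 0.7}, "privacy": {"s0": 0.5, "s1": 0.4}},
    "bob":   {"fairness": {"s0": 0.3, "s1": 0.8}, "privacy": {"s0": 0.6, "s1": 0.5}},
}

def group_preference(agents, value, state):
    """Aggregate one value's preference over a group of agents (here: the mean)."""
    return mean(prefer[a][value][state] for a in agents)

def value_set_preference(agents, values, state):
    """Aggregate preferences over a set of values (again: the mean)."""
    return mean(group_preference(agents, v, state) for v in values)

def alignment(norm_transitions, agents, values):
    """Score a norm by the average increase or decrease in aggregated preference
    across the state transitions the norm permits."""
    return mean(
        value_set_preference(agents, values, s_next) - value_set_preference(agents, values, s)
        for (s, s_next) in norm_transitions
    )

# A hypothetical norm whose only permitted transition moves the world from s0 to s1.
norm = [("s0", "s1")]
print(alignment(norm, agents=["alice", "bob"], values=["fairness"]))  # ~ +0.5: aligned with fairness
print(alignment(norm, agents=["alice", "bob"], values=["privacy"]))   # ~ -0.1: misaligned with privacy
```

In the paper, the aggregation operators and the transition semantics come from the formal model itself; the mean and the hand-written numbers above are only there to keep the example short and runnable.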
Related papers
- ValueCompass: A Framework of Fundamental Values for Human-AI Alignment [15.35489011078817]
We introduce ValueCompass, a framework of fundamental values grounded in psychological theory and a systematic review.
We apply ValueCompass to measure the value alignment of humans and language models (LMs) across four real-world vignettes.
Our findings uncover risky misalignment between humans and LMs, such as LMs agreeing with values like "Choose Own Goals", which humans largely disagree with.
arXiv Detail & Related papers (2024-09-15T02:13:03Z)
- Beyond Preferences in AI Alignment [15.878773061188516]
We characterize and challenge the preferentist approach to AI alignment.
We show how preferences fail to capture the thick semantic content of human values.
We argue that AI systems should be aligned with normative standards appropriate to their social roles.
arXiv Detail & Related papers (2024-08-30T03:14:20Z)
- Align on the Fly: Adapting Chatbot Behavior to Established Norms [47.34022081652952]
We propose an On-the-fly Preference Optimization (OPO) method, a real-time alignment approach that works in a streaming fashion.
Experimental results on both human-annotated and auto-generated questions from legal and moral domains indicate the effectiveness of the proposed OPO method.
arXiv Detail & Related papers (2023-12-26T06:51:09Z)
- Measuring Value Alignment [12.696227679697493]
This paper introduces a novel formalism to quantify the alignment between AI systems and human values.
By utilizing this formalism, AI developers and ethicists can better design and evaluate AI systems to ensure they operate in harmony with human values.
arXiv Detail & Related papers (2023-12-23T12:30:06Z)
- Concept Alignment as a Prerequisite for Value Alignment [11.236150405125754]
Value alignment is essential for building AI systems that can safely and reliably interact with people.
We show how concept misalignment can lead to systematic value misalignment.
We describe an approach that helps minimize such failure modes by jointly reasoning about a person's concepts and values.
arXiv Detail & Related papers (2023-10-30T22:23:15Z)
- Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
- Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
The emergence of Large Language Models (LLMs) has made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)
- Value Engineering for Autonomous Agents [3.6130723421895947]
Previous approaches have treated values as labels associated with some actions or states of the world, rather than as integral components of agent reasoning.
We propose a new AMA paradigm grounded in moral and social psychology, where values are instilled into agents as context-dependent goals.
We argue that this type of normative reasoning, where agents are endowed with an understanding of norms' moral implications, leads to value-awareness in autonomous agents.
arXiv Detail & Related papers (2023-02-17T08:52:15Z)
- Towards a multi-stakeholder value-based assessment framework for algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z)
- Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics [64.81682222169113]
System-level correlations quantify how reliably an automatic summarization evaluation metric replicates human judgments of summary quality.
We identify two ways in which the definition of the system-level correlation is inconsistent with how metrics are used to evaluate systems in practice.
arXiv Detail & Related papers (2022-04-21T15:52:14Z)
- Robust and Adaptive Temporal-Difference Learning Using An Ensemble of Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z)
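The last entry concerns policy evaluation in reinforcement learning rather than value alignment in the normative sense: it estimates a value function for a fixed policy from a stream of state-reward pairs. For reference, the sketch below shows the plain tabular TD(0) update that such methods build on; it is a generic textbook baseline, not OS-GPTD itself, which replaces the table with Gaussian-process posteriors and an ensemble of kernels. The toy chain environment, step size, and discount factor are assumptions made for the example.

```python
def td0_value_estimate(episodes, alpha=0.1, gamma=0.9):
    """Tabular TD(0): after each (s, r, s') transition, move V(s) towards r + gamma * V(s')."""
    V = {}
    for episode in episodes:
        for (s, r, s_next) in episode:
            v_s = V.get(s, 0.0)
            v_next = 0.0 if s_next is None else V.get(s_next, 0.0)  # terminal states have value 0
            V[s] = v_s + alpha * (r + gamma * v_next - v_s)
    return V

# A toy two-state chain: from "a" we move to "b" with reward 0, and from "b" we
# terminate with reward 1 (s_next = None marks termination).
episodes = [[("a", 0.0, "b"), ("b", 1.0, None)] for _ in range(500)]
print(td0_value_estimate(episodes))  # V("b") approaches 1.0 and V("a") approaches 0.9
```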
This list is automatically generated from the titles and abstracts of the papers on this site.