Goal Alignment: A Human-Aware Account of Value Alignment Problem
- URL: http://arxiv.org/abs/2302.00813v1
- Date: Thu, 2 Feb 2023 01:18:57 GMT
- Title: Goal Alignment: A Human-Aware Account of Value Alignment Problem
- Authors: Malek Mechergui and Sarath Sreedharan
- Abstract summary: Value alignment problems arise in scenarios where the specified objectives of an AI agent don't match the true underlying objective of its users.
A foundational cause for misalignment is the inherent asymmetry in human expectations about the agent's behavior and the behavior generated by the agent for the specified objective.
We propose a novel formulation for the value alignment problem, named goal alignment, that focuses on a few central challenges related to value alignment.
- Score: 16.660807368368758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Value alignment problems arise in scenarios where the specified objectives of
an AI agent don't match the true underlying objective of its users. The problem
has been widely argued to be one of the central safety problems in AI.
Unfortunately, most existing works in value alignment tend to focus on issues
that are primarily related to the fact that reward functions are an unintuitive
mechanism to specify objectives. However, the complexity of the objective
specification mechanism is just one of many reasons why the user may have
misspecified their objective. A foundational cause for misalignment that is
being overlooked by these works is the inherent asymmetry in human expectations
about the agent's behavior and the behavior generated by the agent for the
specified objective. To address this lacuna, we propose a novel formulation for
the value alignment problem, named goal alignment, that focuses on a few central
challenges related to value alignment. In doing so, we bridge the currently
disparate research areas of value alignment and human-aware planning.
Additionally, we propose a first-of-its-kind interactive algorithm that is
capable of using information generated under incorrect beliefs about the agent
to determine the true underlying goal of the user.
Related papers
- Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment.
The lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment.
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
- Quantifying Misalignment Between Agents [2.619545850602691]
Growing concerns about the AI alignment problem have emerged in recent years.
We show how misalignment can vary depending on the population of agents being observed.
Our model departs from value specification approaches and focuses instead on the morass of complex, interlocking, sometimes contradictory goals that agents may have in practice.
arXiv Detail & Related papers (2024-06-06T16:31:22Z)
- Handling Reward Misspecification in the Presence of Expectation Mismatch [19.03141646688652]
We use the theory of mind, i.e., the human user's beliefs about the AI agent, as a basis to develop a formal explanatory framework.
We propose a new interactive algorithm that uses the specified reward to infer potential user expectations.
arXiv Detail & Related papers (2024-04-12T19:43:37Z)
- Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [107.63756895544842]
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
arXiv Detail & Related papers (2024-02-29T12:12:30Z)
- Concept Alignment as a Prerequisite for Value Alignment [11.236150405125754]
Value alignment is essential for building AI systems that can safely and reliably interact with people.
We show how concept alignment can lead to systematic value mis-alignment.
We describe an approach that helps minimize such failure modes by jointly reasoning about a person's concepts and values.
arXiv Detail & Related papers (2023-10-30T22:23:15Z)
- Formalizing the Problem of Side Effect Regularization [81.97441214404247]
We propose a formal criterion for side effect regularization via the assistance game framework.
In these games, the agent solves a partially observable Markov decision process.
We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks.
arXiv Detail & Related papers (2022-06-23T16:36:13Z)
- Aligned with Whom? Direct and social goals for AI systems [0.0]
This article distinguishes two types of alignment problems depending on whose goals we consider.
The direct alignment problem considers whether an AI system accomplishes the goals of the entity operating it.
The social alignment problem considers the effects of an AI system on larger groups or on society more broadly.
arXiv Detail & Related papers (2022-05-09T13:49:47Z)
- Generative multitask learning mitigates target-causing confounding [61.21582323566118]
We propose a simple and scalable approach to causal representation learning for multitask learning.
The improvement comes from mitigating unobserved confounders that cause the targets, but not the input.
Our results on the Attributes of People and Taskonomy datasets reflect the conceptual improvement in robustness to prior probability shift.
arXiv Detail & Related papers (2022-02-08T20:42:14Z)
- Offline Contextual Bandits with Overparameterized Models [52.788628474552276]
We ask whether the same phenomenon occurs for offline contextual bandits.
We show that this discrepancy is due to the action-stability of their objectives.
In experiments with large neural networks, this gap between action-stable value-based objectives and unstable policy-based objectives leads to significant performance differences.
arXiv Detail & Related papers (2020-06-27T13:52:07Z)
- ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects [119.46959413000594]
This document summarizes the consensus recommendations of a working group on ObjectNav.
We make recommendations on subtle but important details of evaluation criteria.
We provide a detailed description of the instantiation of these recommendations in challenges organized at the Embodied AI workshop at CVPR 2020.
arXiv Detail & Related papers (2020-06-23T17:18:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.