Formalizing the Problem of Side Effect Regularization
- URL: http://arxiv.org/abs/2206.11812v2
- Date: Fri, 24 Jun 2022 16:13:47 GMT
- Title: Formalizing the Problem of Side Effect Regularization
- Authors: Alexander Matt Turner, Aseem Saxena, Prasad Tadepalli
- Abstract summary: We propose a formal criterion for side effect regularization via the assistance game framework.
In these games, the agent solves a partially observable Markov decision process (POMDP).
We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI objectives are often hard to specify properly. Some approaches tackle this problem by regularizing the AI's side effects: agents must trade off "how much of a mess they make" against an imperfectly specified proxy objective. We propose a formal criterion for side effect regularization via the assistance game framework. In these games, the agent solves a partially observable Markov decision process (POMDP) representing its uncertainty about the objective function it should optimize. We consider the setting where the true objective is revealed to the agent at a later time step. We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks. We empirically demonstrate the reasonableness of our problem formalization via ground-truth evaluation in two gridworld environments.
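The result in the abstract has a natural computational reading. Below is a minimal sketch (not the authors' code; the random tabular environment, the horizon, and all names are illustrative assumptions) of the delayed-specification setting: the agent earns an imperfect proxy reward until time T, when the true task is revealed, so backward induction automatically trades off proxy reward against the agent's ability to achieve a range of possible future tasks:

```python
# Sketch of the delayed-specification trade-off on a random tabular MDP.
# Hypothetical throughout; illustrates the idea, not the paper's experiments.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, T = 6, 3, 0.95, 10

# Random dynamics: P[s, a] is a probability distribution over next states.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

def optimal_value(reward, iters=500):
    """Optimal state values for a given reward vector, via value iteration."""
    v = np.zeros(n_states)
    for _ in range(iters):
        v = reward + gamma * (P @ v).max(axis=-1)
    return v

proxy_reward = rng.uniform(size=n_states)                         # misspecified proxy
candidate_tasks = [rng.uniform(size=n_states) for _ in range(8)]  # possible true objectives

# "Ability" term: expected optimal value over the tasks that might be
# revealed at time T (the agent's capacity to do well at whatever it is asked).
ability = np.mean([optimal_value(r) for r in candidate_tasks], axis=0)

# Backward induction from the reveal time T to 0: before the reveal the
# agent earns the proxy reward, but its continuation value at T is the
# ability term, so the optimal pre-reveal policy preserves option value.
v = ability.copy()
for _ in range(T):
    v = proxy_reward + gamma * (P @ v).max(axis=-1)

greedy_actions = (proxy_reward[:, None] + gamma * (P @ v)).argmax(axis=-1)
print("pre-reveal greedy actions:", greedy_actions)
```

The key point of the sketch is that the continuation value at the reveal time bottoms out in an expectation over candidate tasks, so maximizing proxy reward beforehand is implicitly regularized by the need to keep those tasks achievable.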
Related papers
- Criticality and Safety Margins for Reinforcement Learning (arXiv, 2024-09-26)
We seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users.
We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions.
We also introduce proxy criticality, a low-overhead metric with a statistically monotonic relationship to true criticality; a minimal rollout sketch of estimating true criticality appears after this list.
- Online Decision Mediation (arXiv, 2023-10-28)
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
- On Imperfect Recall in Multi-Agent Influence Diagrams (arXiv, 2023-07-11)
Multi-agent influence diagrams (MAIDs) are a popular game-theoretic model based on Bayesian networks.
We show how to solve MAIDs with forgetful and absent-minded agents using mixed policies and two types of correlated equilibrium.
We also describe applications of MAIDs to Markov games and team situations, where imperfect recall is often unavoidable.
- Goal Alignment: A Human-Aware Account of Value Alignment Problem (arXiv, 2023-02-02)
Value alignment problems arise when the objectives specified for an AI agent do not match its users' true underlying objectives.
A foundational cause of misalignment is the inherent asymmetry between human expectations of the agent's behavior and the behavior the agent generates for the specified objective.
We propose a novel formulation of the value alignment problem, goal alignment, that focuses on a few central challenges related to value alignment.
- On Avoiding Power-Seeking by Artificial Intelligence (arXiv, 2022-06-23)
We do not know how to align a very intelligent AI agent's behavior with human interests.
I investigate whether we can build smart AI agents which have limited impact on the world, and which do not autonomously seek power.
- End-to-End Learning and Intervention in Games (arXiv, 2020-10-26)
We provide a unified framework for learning and intervention in games.
We propose two approaches, based respectively on explicit and implicit differentiation; a toy sketch of the implicit-differentiation idea appears after this list.
The analytical results are validated using several real-world problems.
- Tradeoff-Focused Contrastive Explanation for MDP Planning (arXiv, 2020-04-27)
In real-world planning applications, an agent's decisions can involve complex tradeoffs among competing objectives.
It can be difficult for end-users to understand why the agent settled on a particular plan given its objective values.
We propose an approach, based on contrastive explanation, that enables a multi-objective MDP planning agent to explain its decisions in a way that communicates its tradeoff rationale.
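As noted in the criticality entry above, true criticality has a direct Monte Carlo estimator: compare the policy's mean return against the mean return when the first n actions are random. The toy chain environment and every name below are illustrative assumptions, not the cited paper's code:

```python
# Monte Carlo estimate of true criticality: the expected drop in return
# when the agent deviates from its policy for n consecutive random actions.
import numpy as np

class ToyCliffChain:
    """Tiny chain MDP (only to make the sketch runnable): state 0 is a
    cliff (terminal, reward 0); the right end is the goal (terminal, reward 1)."""
    num_actions = 2  # 0 = left, 1 = right

    def __init__(self, length=8):
        self.length = length

    def reset(self, state):
        self.state = state
        return self.state

    def step(self, action):
        self.state += 1 if action == 1 else -1
        if self.state <= 0:
            return self.state, 0.0, True   # fell off the cliff
        if self.state >= self.length - 1:
            return self.state, 1.0, True   # reached the goal
        return self.state, 0.0, False

def rollout_return(env, policy, start, n_random=0, horizon=50, rng=None):
    """Return from `start`, taking n_random random actions before following policy."""
    rng = rng or np.random.default_rng()
    state, total = env.reset(start), 0.0
    for t in range(horizon):
        action = rng.integers(env.num_actions) if t < n_random else policy(state)
        state, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total

def true_criticality(env, policy, state, n, samples=2000):
    """Expected return drop from n consecutive random deviations at `state`."""
    base = np.mean([rollout_return(env, policy, state) for _ in range(samples)])
    dev = np.mean([rollout_return(env, policy, state, n_random=n)
                   for _ in range(samples)])
    return base - dev

env = ToyCliffChain()
always_right = lambda s: 1
print(true_criticality(env, always_right, state=2, n=3))  # approx. 0.25 here
```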
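And for the end-to-end games entry: a toy sketch of the implicit route, differentiating a Nash equilibrium with respect to a game parameter via the implicit function theorem. The quadratic two-player game and all names are hypothetical, far simpler than the cited paper's framework:

```python
# Implicit differentiation through a game equilibrium (toy illustration).
import numpy as np

def equilibrium_and_sensitivity(theta, a=2.0, b=3.0, c=0.5, d=1.0):
    # Quadratic costs give linear first-order (Nash) conditions:
    #   player 1: a*x1 + theta*x2 = 0
    #   player 2: c*x1 + b*x2 + d = 0
    # i.e. M(theta) @ x = -k, so the equilibrium is x*(theta) = -M^{-1} k.
    M = np.array([[a, theta], [c, b]])
    k = np.array([0.0, d])
    x = np.linalg.solve(M, -k)
    # Implicit function theorem: differentiating M(theta) @ x(theta) = -k
    # gives M @ dx + (dM/dtheta) @ x = 0, hence dx = -M^{-1} (dM/dtheta) x.
    dM = np.array([[0.0, 1.0], [0.0, 0.0]])
    dx = np.linalg.solve(M, -(dM @ x))
    return x, dx

x, dx = equilibrium_and_sensitivity(theta=1.0)
print("equilibrium:", x, "d(equilibrium)/d(theta):", dx)
```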