Avoiding Side Effects By Considering Future Tasks
- URL: http://arxiv.org/abs/2010.07877v1
- Date: Thu, 15 Oct 2020 16:55:26 GMT
- Title: Avoiding Side Effects By Considering Future Tasks
- Authors: Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, Shane Legg
- Abstract summary: We propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects.
This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task.
We show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.
- Score: 21.443513600055837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing reward functions is difficult: the designer has to specify what to
do (what it means to complete the task) as well as what not to do (side effects
that should be avoided while completing the task). To alleviate the burden on
the reward designer, we propose an algorithm to automatically generate an
auxiliary reward function that penalizes side effects. This auxiliary objective
rewards the ability to complete possible future tasks, which decreases if the
agent causes side effects during the current task. The future task reward can
also give the agent an incentive to interfere with events in the environment
that make future tasks less achievable, such as irreversible actions by other
agents. To avoid this interference incentive, we introduce a baseline policy
that represents a default course of action (such as doing nothing), and use it
to filter out future tasks that are not achievable by default. We formally
define interference incentives and show that the future task approach with a
baseline policy avoids these incentives in the deterministic case. Using
gridworld environments that test for side effects and interference, we show
that our method avoids interference and is more effective for avoiding side
effects than the common approach of penalizing irreversible actions.
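To make the method described in the abstract concrete, here is a minimal sketch of how a future-task auxiliary reward with a baseline policy could be assembled. It is not the authors' implementation: the `attainability` estimator, the candidate goal set, the baseline-state counterfactual, and the fixed weighting below are illustrative assumptions.

```python
# Minimal sketch of a future-task auxiliary reward with baseline filtering.
# Assumptions (not from the paper): `attainability` returns an estimate in [0, 1]
# of how well a goal can still be reached, and `baseline_state` is the state that
# would have resulted from a default policy such as doing nothing.

from typing import Callable, Hashable, Iterable

State = Hashable
Goal = Hashable


def future_task_reward(
    state: State,
    baseline_state: State,
    future_goals: Iterable[Goal],
    attainability: Callable[[State, Goal], float],
) -> float:
    """Reward the ability to complete possible future tasks.

    Goals that are unattainable from the baseline state are filtered out,
    so the agent is not rewarded for interfering with events it did not cause
    (e.g. preventing other agents' irreversible actions).
    """
    achievable_by_default = [
        g for g in future_goals if attainability(baseline_state, g) > 0.0
    ]
    if not achievable_by_default:
        return 0.0
    return sum(attainability(state, g) for g in achievable_by_default) / len(
        achievable_by_default
    )


def shaped_reward(task_reward: float, aux_reward: float, scale: float = 0.1) -> float:
    """Combine the current-task reward with the auxiliary future-task term."""
    return task_reward + scale * aux_reward
```

The filtering step is what removes the interference incentive in this sketch: future tasks that would not be achievable under the default course of action contribute nothing, so the agent gains no reward by intervening to keep them achievable.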
Related papers
- Behavior Alignment via Reward Function Optimization [23.92721220310242]
We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
arXiv Detail & Related papers (2023-10-29T13:45:07Z)
- Formalizing the Problem of Side Effect Regularization [81.97441214404247]
We propose a formal criterion for side effect regularization via the assistance game framework.
In these games, the agent solves a partially observable Markov decision process.
We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks.
arXiv Detail & Related papers (2022-06-23T16:36:13Z)
- The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models [85.68751244243823]
Reward hacking -- where RL agents exploit gaps in misspecified reward functions -- has been widely observed, but not yet systematically studied.
We investigate reward hacking as a function of agent capabilities: model capacity, action space resolution, observation space noise, and training time.
We find instances of phase transitions: capability thresholds at which the agent's behavior qualitatively shifts, leading to a sharp decrease in the true reward.
arXiv Detail & Related papers (2022-01-10T18:58:52Z)
- Admissible Policy Teaching through Reward Design [32.39785256112934]
We study reward design strategies for incentivizing a reinforcement learning agent to adopt a policy from a set of admissible policies.
The goal of the reward designer is to modify the underlying reward function cost-efficiently while ensuring that any approximately optimal deterministic policy under the new reward function is admissible.
arXiv Detail & Related papers (2022-01-06T18:49:57Z)
- Learning to Be Cautious [71.9871661858886]
A key challenge in the field of reinforcement learning is to develop agents that behave cautiously in novel situations.
We present a sequence of tasks where cautious behavior becomes increasingly non-obvious, as well as an algorithm to demonstrate that it is possible for a system to learn to be cautious.
arXiv Detail & Related papers (2021-10-29T16:52:45Z)
- Outcome-Driven Reinforcement Learning via Variational Inference [95.82770132618862]
We discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards.
To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function.
We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
arXiv Detail & Related papers (2021-04-20T18:16:21Z)
- Challenges for Using Impact Regularizers to Avoid Negative Side Effects [74.67972013102462]
We discuss the main current challenges of impact regularizers and relate them to fundamental design decisions.
We explore promising directions to overcome the unsolved challenges in preventing negative side effects with impact regularizers.
arXiv Detail & Related papers (2021-01-29T10:32:51Z)
- Counterfactual Credit Assignment in Model-Free Reinforcement Learning [47.79277857377155]
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards.
We adapt the notion of counterfactuals from causality theory to a model-free RL setup.
We formulate a family of policy algorithms that use future-conditional value functions as baselines or critics, and show that they are provably low variance.
arXiv Detail & Related papers (2020-11-18T18:41:44Z)
- Avoiding Side Effects in Complex Environments [87.25064477073205]
In toy environments, Attainable Utility Preservation avoided side effects by penalizing shifts in the ability to achieve randomly generated goals.
We scale this approach to large, randomly generated environments based on Conway's Game of Life.
arXiv Detail & Related papers (2020-06-11T16:02:30Z)
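The final entry above summarizes Attainable Utility Preservation (AUP), which penalizes shifts in the ability to achieve randomly generated goals. A minimal sketch of that style of penalty is given below for comparison with the future-task approach; the per-goal Q-value estimators, the no-op comparison, and the scaling constant are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of an AUP-style penalty: the average absolute shift in
# attainable utility across auxiliary goals, relative to taking a no-op action.
# Assumes `aux_q_fns` is non-empty and each entry estimates Q for one
# randomly generated auxiliary goal (illustrative names, not the paper's API).

from typing import Callable, Hashable, Sequence

State = Hashable
Action = Hashable


def aup_penalty(
    state: State,
    action: Action,
    noop: Action,
    aux_q_fns: Sequence[Callable[[State, Action], float]],
) -> float:
    """Average absolute change in auxiliary attainable utility vs. doing nothing."""
    return sum(abs(q(state, action) - q(state, noop)) for q in aux_q_fns) / len(aux_q_fns)


def aup_reward(task_reward: float, penalty: float, lam: float = 0.01) -> float:
    """Task reward minus the scaled shift penalty."""
    return task_reward - lam * penalty
```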
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.