Avoiding Side Effects By Considering Future Tasks
- URL: http://arxiv.org/abs/2010.07877v1
- Date: Thu, 15 Oct 2020 16:55:26 GMT
- Title: Avoiding Side Effects By Considering Future Tasks
- Authors: Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, Shane Legg
- Abstract summary: We propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects.
This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task.
We show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.
- Score: 21.443513600055837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing reward functions is difficult: the designer has to specify what to
do (what it means to complete the task) as well as what not to do (side effects
that should be avoided while completing the task). To alleviate the burden on
the reward designer, we propose an algorithm to automatically generate an
auxiliary reward function that penalizes side effects. This auxiliary objective
rewards the ability to complete possible future tasks, which decreases if the
agent causes side effects during the current task. The future task reward can
also give the agent an incentive to interfere with events in the environment
that make future tasks less achievable, such as irreversible actions by other
agents. To avoid this interference incentive, we introduce a baseline policy
that represents a default course of action (such as doing nothing), and use it
to filter out future tasks that are not achievable by default. We formally
define interference incentives and show that the future task approach with a
baseline policy avoids these incentives in the deterministic case. Using
gridworld environments that test for side effects and interference, we show
that our method avoids interference and is more effective for avoiding side
effects than the common approach of penalizing irreversible actions.
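To make the method described in the abstract concrete, here is a minimal sketch of how a future-task auxiliary reward with a baseline policy could be assembled. It is not the authors' implementation: the `attainability` estimator, the candidate goal set, the baseline-state counterfactual, and the fixed weighting below are illustrative assumptions.

```python
# Minimal sketch of a future-task auxiliary reward with baseline filtering.
# Assumptions (not from the paper): `attainability` returns an estimate in [0, 1]
# of how well a goal can still be reached, and `baseline_state` is the state that
# would have resulted from a default policy such as doing nothing.

from typing import Callable, Hashable, Iterable

State = Hashable
Goal = Hashable


def future_task_reward(
    state: State,
    baseline_state: State,
    future_goals: Iterable[Goal],
    attainability: Callable[[State, Goal], float],
) -> float:
    """Reward the ability to complete possible future tasks.

    Goals that are unattainable from the baseline state are filtered out,
    so the agent is not rewarded for interfering with events it did not cause
    (e.g. preventing other agents' irreversible actions).
    """
    achievable_by_default = [
        g for g in future_goals if attainability(baseline_state, g) > 0.0
    ]
    if not achievable_by_default:
        return 0.0
    return sum(attainability(state, g) for g in achievable_by_default) / len(
        achievable_by_default
    )


def shaped_reward(task_reward: float, aux_reward: float, scale: float = 0.1) -> float:
    """Combine the current-task reward with the auxiliary future-task term."""
    return task_reward + scale * aux_reward
```

The filtering step is what removes the interference incentive in this sketch: future tasks that would not be achievable under the default course of action contribute nothing, so the agent gains no reward by intervening to keep them achievable.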
Related papers
- Behavior Alignment via Reward Function Optimization [23.92721220310242]
We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
arXiv Detail & Related papers (2023-10-29T13:45:07Z)
- Formalizing the Problem of Side Effect Regularization [81.97441214404247]
We propose a formal criterion for side effect regularization via the assistance game framework.
In these games, the agent solves a partially observable Markov decision process.
We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks.
arXiv Detail & Related papers (2022-06-23T16:36:13Z)
- The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models [85.68751244243823]
Reward hacking -- where RL agents exploit gaps in misspecified reward functions -- has been widely observed, but not yet systematically studied.
We investigate reward hacking as a function of agent capabilities: model capacity, action space resolution, observation space noise, and training time.
We find instances of phase transitions: capability thresholds at which the agent's behavior qualitatively shifts, leading to a sharp decrease in the true reward.
arXiv Detail & Related papers (2022-01-10T18:58:52Z)
- Admissible Policy Teaching through Reward Design [32.39785256112934]
We study reward design strategies for incentivizing a reinforcement learning agent to adopt a policy from a set of admissible policies.
The goal of the reward designer is to modify the underlying reward function cost-efficiently while ensuring that any approximately optimal deterministic policy under the new reward function is admissible.
arXiv Detail & Related papers (2022-01-06T18:49:57Z)
- Learning to Be Cautious [71.9871661858886]
A key challenge in the field of reinforcement learning is to develop agents that behave cautiously in novel situations.
We present a sequence of tasks where cautious behavior becomes increasingly non-obvious, as well as an algorithm to demonstrate that it is possible for a system to learn to be cautious.
arXiv Detail & Related papers (2021-10-29T16:52:45Z)
- Outcome-Driven Reinforcement Learning via Variational Inference [95.82770132618862]
We discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards.
To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function.
We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
arXiv Detail & Related papers (2021-04-20T18:16:21Z)
- Challenges for Using Impact Regularizers to Avoid Negative Side Effects [74.67972013102462]
We discuss the main current challenges of impact regularizers and relate them to fundamental design decisions.
We explore promising directions to overcome the unsolved challenges in preventing negative side effects with impact regularizers.
arXiv Detail & Related papers (2021-01-29T10:32:51Z)
- Counterfactual Credit Assignment in Model-Free Reinforcement Learning [47.79277857377155]
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards.
We adapt the notion of counterfactuals from causality theory to a model-free RL setup.
We formulate a family of policy algorithms that use future-conditional value functions as baselines or critics, and show that they are provably low variance.
arXiv Detail & Related papers (2020-11-18T18:41:44Z)
- Avoiding Side Effects in Complex Environments [87.25064477073205]
In toy environments, Attainable Utility Preservation avoided side effects by penalizing shifts in the ability to achieve randomly generated goals.
We scale this approach to large, randomly generated environments based on Conway's Game of Life.
arXiv Detail & Related papers (2020-06-11T16:02:30Z)
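The final entry above summarizes Attainable Utility Preservation (AUP), which penalizes shifts in the ability to achieve randomly generated goals. A minimal sketch of that style of penalty is given below for comparison with the future-task approach; the per-goal Q-value estimators, the no-op comparison, and the scaling constant are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of an AUP-style penalty: the average absolute shift in
# attainable utility across auxiliary goals, relative to taking a no-op action.
# Assumes `aux_q_fns` is non-empty and each entry estimates Q for one
# randomly generated auxiliary goal (illustrative names, not the paper's API).

from typing import Callable, Hashable, Sequence

State = Hashable
Action = Hashable


def aup_penalty(
    state: State,
    action: Action,
    noop: Action,
    aux_q_fns: Sequence[Callable[[State, Action], float]],
) -> float:
    """Average absolute change in auxiliary attainable utility vs. doing nothing."""
    return sum(abs(q(state, action) - q(state, noop)) for q in aux_q_fns) / len(aux_q_fns)


def aup_reward(task_reward: float, penalty: float, lam: float = 0.01) -> float:
    """Task reward minus the scaled shift penalty."""
    return task_reward - lam * penalty
```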
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.