Avoiding Side Effects in Complex Environments
- URL: http://arxiv.org/abs/2006.06547v2
- Date: Thu, 22 Oct 2020 15:15:46 GMT
- Title: Avoiding Side Effects in Complex Environments
- Authors: Alexander Matt Turner, Neale Ratzlaff, Prasad Tadepalli
- Abstract summary: In toy environments, Attainable Utility Preservation avoided side effects by penalizing shifts in the ability to achieve randomly generated goals.
We scale this approach to large, randomly generated environments based on Conway's Game of Life.
- Score: 87.25064477073205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward function specification can be difficult. Rewarding the agent for
making a widget may be easy, but penalizing the multitude of possible negative
side effects is hard. In toy environments, Attainable Utility Preservation
(AUP) avoided side effects by penalizing shifts in the ability to achieve
randomly generated goals. We scale this approach to large, randomly generated
environments based on Conway's Game of Life. By preserving optimal value for a
single randomly generated reward function, AUP incurs modest overhead while
leading the agent to complete the specified task and avoid many side effects.
Videos and code are available at https://avoiding-side-effects.github.io/.
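A minimal sketch of the attainable-utility penalty the abstract describes, assuming a pre-trained auxiliary Q-function for one randomly generated reward function and a designated no-op action; the names and the normalization below are illustrative, not the authors' implementation (see the repository linked above for the real code).
```python
# Illustrative sketch of AUP reward shaping (not the paper's implementation).
# Assumes: q_aux(state, action) estimates attainable value for a single
# randomly generated auxiliary reward function; `noop` is a no-op action.

def aup_reward(task_reward, q_aux, state, action, noop, lam=0.01):
    """Penalize shifts in the ability to achieve the auxiliary goal."""
    q_act = q_aux(state, action)    # attainable auxiliary value after acting
    q_idle = q_aux(state, noop)     # attainable auxiliary value after doing nothing
    # Any change relative to inaction (gain or loss) is penalized, which
    # discourages side effects that disturb the rest of the environment.
    penalty = abs(q_act - q_idle)
    scale = max(abs(q_idle), 1e-8)  # keep the penalty comparable across states
    return task_reward - lam * penalty / scale
```
Scaling by the no-op value is one simple way to keep the penalty magnitude stable across states; the paper's exact normalization and training setup may differ.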
Related papers
- Risk-averse Batch Active Inverse Reward Design [0.0]
Active Inverse Reward Design (AIRD) proposed the use of a series of queries, comparing possible reward functions in a single training environment.
It ignores the possibility of unknown features appearing in real-world environments, as well as the safety measures needed until the agent has fully learned the reward function.
I improved this method and created Risk-averse Batch Active Inverse Reward Design (RBAIRD), which constructs batches (sets of environments the agent encounters when deployed in the real world), processes them sequentially, and, for a predetermined number of iterations, asks queries that a human must answer for each environment in the batch.
arXiv Detail & Related papers (2023-11-20T18:36:10Z) - Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
arXiv Detail & Related papers (2023-05-31T17:40:43Z) - Learning with Noisy Labels via Sparse Regularization [76.31104997491695]
Learning with noisy labels is an important task for training accurate deep neural networks.
Some commonly-used loss functions, such as Cross Entropy (CE), suffer from severe overfitting to noisy labels.
We introduce the sparse regularization strategy to approximate the one-hot constraint.
arXiv Detail & Related papers (2021-07-31T09:40:23Z) - Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative-based adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z) - Transferable Sparse Adversarial Attack [62.134905824604104]
We introduce a generator architecture to alleviate the overfitting issue and thus efficiently craft transferable sparse adversarial examples.
Our method achieves superior inference speed, 700$\times$ faster than other optimization-based methods.
arXiv Detail & Related papers (2021-05-31T06:44:58Z) - Patch-wise++ Perturbation for Adversarial Targeted Attacks [132.58673733817838]
We propose a patch-wise iterative method (PIM) aimed at crafting adversarial examples with high transferability.
Specifically, we introduce an amplification factor to the step size in each iteration, and one pixel's overall gradient overflowing the $\epsilon$-constraint is properly assigned to its surrounding regions.
Compared with the current state-of-the-art attack methods, we significantly improve the success rate by 35.9% for defense models and 32.7% for normally trained models.
arXiv Detail & Related papers (2020-12-31T08:40:42Z) - Avoiding Side Effects By Considering Future Tasks [21.443513600055837]
We propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects.
This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task.
We show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.
arXiv Detail & Related papers (2020-10-15T16:55:26Z) - Addressing reward bias in Adversarial Imitation Learning with neutral reward functions [1.7188280334580197]
Imitation Learning suffers from the fundamental problem of reward bias stemming from the choice of reward functions used in the algorithm.
We provide a theoretical sketch of why existing reward functions would fail in imitation learning scenarios in task based environments with multiple terminal states.
We propose a new reward function for GAIL which outperforms existing GAIL methods on task based environments with single and multiple terminal states.
arXiv Detail & Related papers (2020-09-20T16:24:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.