Shielding Atari Games with Bounded Prescience
- URL: http://arxiv.org/abs/2101.08153v2
- Date: Fri, 22 Jan 2021 14:08:01 GMT
- Title: Shielding Atari Games with Bounded Prescience
- Authors: Mirco Giacobbe, Mohammadhosein Hasanbeig, Daniel Kroening, Hjalmar Wijk
- Abstract summary: We present the first exact method for analysing and ensuring the safety of DRL agents for Atari games.
First, we give a set of 43 properties that characterise "safe behaviour" for 30 games.
Second, we develop a method for exploring all traces induced by an agent and a game.
Third, we propose a countermeasure that combines bounded explicit-state exploration with shielding.
- Score: 8.874011540975715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (DRL) is applied in safety-critical domains such
as robotics and autonomous driving. It achieves superhuman abilities in many
tasks; however, whether DRL agents can be shown to act safely is an open
problem. Atari games are a simple yet challenging exemplar for evaluating the
safety of DRL agents and feature a diverse portfolio of game mechanics. The
safety of neural agents has been studied before using methods that either
require a model of the system dynamics or an abstraction; unfortunately, these
are unsuitable for Atari games because their low-level dynamics are complex and
hidden inside their emulator. We present the first exact method for analysing
and ensuring the safety of DRL agents for Atari games. Our method only requires
access to the emulator. First, we give a set of 43 properties that characterise
"safe behaviour" for 30 games. Second, we develop a method for exploring all
traces induced by an agent and a game and consider a variety of sources of game
non-determinism. We observe that the best available DRL agents reliably satisfy
only very few properties; several critical properties are violated by all
agents. Finally, we propose a countermeasure that combines bounded
explicit-state exploration with shielding. We demonstrate that our method
improves the safety of all agents over multiple properties.
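To make the countermeasure concrete, here is a minimal sketch of a bounded-lookahead shield. It is an illustration under stated assumptions, not the authors' implementation: it uses the ale-py emulator interface (cloneState/restoreState/act), a hypothetical agent.act method, and a single placeholder safety property (life loss), and it treats the emulator as deterministic, whereas the paper checks 43 properties and accounts for several sources of game non-determinism.

```python
# Bounded-lookahead ("bounded prescience") shield: a minimal sketch.
# Assumes ale-py's ALEInterface and a hypothetical `agent.act(screen)`.
from ale_py import ALEInterface

HORIZON = 8  # lookahead bound H, in emulator steps

def is_unsafe(ale, start_lives):
    """Hypothetical stand-in for a safety property: losing a life
    (or the game) within the lookahead counts as a violation."""
    return ale.game_over() or ale.lives() < start_lives

def violates_within(ale, agent, depth, start_lives):
    """Follow the agent's own choices for up to `depth` steps and report
    whether the trace reaches a violation (deterministic emulator)."""
    if is_unsafe(ale, start_lives):
        return True
    if depth == 0:
        return False
    state = ale.cloneState()
    ale.act(agent.act(ale.getScreenRGB()))
    bad = violates_within(ale, agent, depth - 1, start_lives)
    ale.restoreState(state)
    return bad

def shielded_action(ale, agent):
    """Return the agent's action unless it provably leads to a violation
    within HORIZON steps; otherwise try the remaining legal actions."""
    start_lives = ale.lives()
    state = ale.cloneState()
    preferred = agent.act(ale.getScreenRGB())
    others = [a for a in ale.getLegalActionSet() if a != preferred]
    for action in [preferred] + others:
        ale.act(action)
        safe = not violates_within(ale, agent, HORIZON - 1, start_lives)
        ale.restoreState(state)
        if safe:
            return action
    return preferred  # every bounded future violates; no safe override exists
```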
Related papers
- Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action-matching principle is more an explanation of deep neural networks (DNNs) than an interpretation of RL agents.
We propose to take rewards, the essential objective of RL agents, as the essential objective of interpreting them as well.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
- Scaling Laws for Imitation Learning in Single-Agent Games [29.941613597833133]
We investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games.
We first demonstrate our findings on a variety of Atari games, and thereafter focus on the extremely challenging game of NetHack.
We find that IL loss and mean return scale smoothly with the compute budget and are strongly correlated, resulting in power laws for training compute-optimal IL agents (a power-law fit is sketched below).
arXiv Detail & Related papers (2023-07-18T16:43:03Z)
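Scaling laws of this kind are usually power laws, loss ≈ a·C^(−b) for compute C. The snippet below is a generic illustration of fitting such a law on log-log axes with NumPy; the data points are made-up placeholders, not the paper's measurements.

```python
# Fitting a power law loss ~ a * C**(-b) on log-log axes (illustrative data).
import numpy as np

compute = np.array([1e14, 1e15, 1e16, 1e17])  # hypothetical FLOPs budgets
il_loss = np.array([2.00, 1.50, 1.13, 0.85])  # hypothetical IL losses

# log(loss) = log(a) - b * log(C) is linear, so an ordinary least-squares
# line fit in log space recovers the power-law exponent.
slope, intercept = np.polyfit(np.log(compute), np.log(il_loss), 1)
a, b = np.exp(intercept), -slope
print(f"loss ~= {a:.3g} * C^(-{b:.3g})")
```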
- Explaining Deep Reinforcement Learning Agents In The Atari Domain through a Surrogate Model [78.69367679848632]
We describe a lightweight and effective method to derive explanations for deep RL agents.
Our method relies on a transformation of the pixel-based input of the RL agent to an interpretable, percept-like input representation.
We then train a surrogate model, which is itself interpretable, to replicate the behavior of the target deep RL agent (a sketch of this distillation step follows).
arXiv Detail & Related papers (2021-10-07T05:01:44Z)
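As a rough illustration of the surrogate idea (assumptions throughout, not the paper's pipeline): log percept-like features alongside the target agent's actions, then fit an interpretable model such as a shallow decision tree. Here extract_percepts is a hypothetical stand-in for the pixel-to-percept transformation, and a Gymnasium-style env and agent.act interface are assumed.

```python
# Surrogate-model sketch: distill a deep RL agent into a decision tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def fit_surrogate(env, agent, extract_percepts, episodes=20):
    X, y = [], []
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = agent.act(obs)          # the target agent's decision
            X.append(extract_percepts(obs))  # interpretable percept features
            y.append(action)
            obs, _, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
    tree = DecisionTreeClassifier(max_depth=5).fit(np.array(X), np.array(y))
    print(export_text(tree))                 # human-readable decision rules
    return tree
```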
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions (the adversarial loop is sketched below).
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
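A heavily simplified reading of such an adversarial game (every component below is a hypothetical placeholder, not the paper's algorithm): score surprise as the negative log-likelihood of an observation under a learned density model, then reward one policy for maximizing that quantity and the other for minimizing it.

```python
# Two policies competing over surprise (illustrative sketch only).
def play_phase(env, policy, density_model, steps, sign):
    """Run one policy; its reward is `sign` times the surprise it causes."""
    obs, _ = env.reset()
    for _ in range(steps):
        action = policy.act(obs)
        obs, _, terminated, truncated, _ = env.step(action)
        surprise = -density_model.log_prob(obs)       # high for novel states
        policy.observe(obs, action, sign * surprise)  # policy update signal
        density_model.update(obs)
        if terminated or truncated:
            obs, _ = env.reset()

def adversarial_round(env, explorer, controller, density_model):
    play_phase(env, explorer, density_model, steps=128, sign=+1.0)    # seeks surprise
    play_phase(env, controller, density_model, steps=128, sign=-1.0)  # suppresses it
```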
- BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning [80.99426477001619]
We migrate backdoor attacks to more complex RL systems involving multiple agents.
As a proof of concept, we demonstrate that an adversary agent can trigger the backdoor of the victim agent with its own action.
The results show that when the backdoor is activated, the victim's win rate drops by 17% to 37% compared to when it is not activated.
arXiv Detail & Related papers (2021-05-02T23:47:55Z)
- DeepCrawl: Deep Reinforcement Learning for Turn-based Strategy Games [137.86426963572214]
We introduce DeepCrawl, a fully-playable Roguelike prototype for iOS and Android in which all agents are controlled by policy networks trained using Deep Reinforcement Learning (DRL).
Our aim is to understand whether recent advances in DRL can be used to develop convincing behavioral models for non-player characters in videogames.
arXiv Detail & Related papers (2020-12-03T13:53:29Z)
- Agent57: Outperforming the Atari Human Benchmark [15.75730239983062]
Atari games have been a long-standing benchmark in reinforcement learning.
We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games.
arXiv Detail & Related papers (2020-03-30T11:33:16Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings used by the nearest-neighbor lookup, biasing the novelty signal towards what the agent can control (the episodic bonus is sketched below).
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
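The episodic bonus in Never Give Up is, roughly, inversely related to how similar the current embedding is to its k nearest neighbors in an episode-local memory. Below is a simplified sketch of that idea; the kernel, constants, and shapes are assumptions rather than the paper's exact formulation, and the embeddings are presumed to come from the inverse-dynamics encoder.

```python
# Simplified episodic k-NN intrinsic bonus (sketch, not NGU's exact kernel).
import numpy as np

def episodic_bonus(embedding, memory, k=10, eps=1e-3):
    """High when `embedding` is far from its k nearest neighbors among
    the embeddings already visited in the current episode."""
    if not memory:
        return 1.0
    dists = np.sum((np.stack(memory) - embedding) ** 2, axis=1)
    nearest = np.sort(dists)[:k]                # k smallest squared distances
    nearest = nearest / (nearest.mean() + eps)  # normalize the scale
    similarity = np.sum(eps / (nearest + eps))  # kernel similarity to neighbors
    return 1.0 / np.sqrt(similarity + 1e-8)

# Usage: the memory is cleared at every episode boundary.
# memory.append(embedding); r_episodic = episodic_bonus(embedding, memory)
```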