Explore and Control with Adversarial Surprise
- URL: http://arxiv.org/abs/2107.07394v1
- Date: Mon, 12 Jul 2021 17:58:40 GMT
- Title: Explore and Control with Adversarial Surprise
- Authors: Arnaud Fickinger, Natasha Jaques, Samyak Parajuli, Michael Chang,
Nicholas Rhinehart, Glen Berseth, Stuart Russell, Sergey Levine
- Abstract summary: Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
- Score: 78.41972292110967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) provides a framework for learning goal-directed
policies given user-specified rewards. However, since designing rewards often
requires substantial engineering effort, we are interested in the problem of
learning without rewards, where agents must discover useful behaviors in the
absence of task-specific incentives. Intrinsic motivation is a family of
unsupervised RL techniques which develop general objectives for an RL agent to
optimize that lead to better exploration or the discovery of skills. In this
paper, we propose a new unsupervised RL technique based on an adversarial game
which pits two policies against each other to compete over the amount of
surprise an RL agent experiences. The policies each take turns controlling the
agent. The Explore policy maximizes entropy, putting the agent into surprising
or unfamiliar situations. Then, the Control policy takes over and seeks to
recover from those situations by minimizing entropy. The game harnesses the
power of multi-agent competition to drive the agent to seek out increasingly
surprising parts of the environment while learning to gain mastery over them.
We show empirically that our method leads to the emergence of complex skills by
exhibiting clear phase transitions. Furthermore, we show both theoretically
(via a latent state space coverage argument) and empirically that our method
has the potential to be applied to the exploration of stochastic,
partially-observed environments. We show that Adversarial Surprise learns more
complex behaviors, and explores more effectively than competitive baselines,
outperforming intrinsic motivation methods based on active inference,
novelty-seeking (Random Network Distillation (RND)), and multi-agent
unsupervised RL (Asymmetric Self-Play (ASP)) in MiniGrid, Atari and VizDoom
environments.
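To make the alternating game concrete, the following is a minimal sketch of one Adversarial Surprise episode, assuming a classic Gym-style interface with flat vector observations. Surprise is measured here as negative log-likelihood under a simple running diagonal-Gaussian density model; the model choice, phase lengths, buffer handling, and function names are illustrative stand-ins rather than the authors' exact implementation, and where the density model is fit and reset is simplified.

```python
import numpy as np

class GaussianDensityModel:
    """Running diagonal-Gaussian estimate of observation likelihood.

    Surprise is the negative log-likelihood of an observation under this
    model; it stands in for the density model used to score surprise.
    """
    def __init__(self, obs_dim):
        self.mean = np.zeros(obs_dim)
        self.var = np.ones(obs_dim)
        self.count = 1.0  # prior pseudo-count keeps the variance well-behaved early on

    def update(self, obs):
        self.count += 1.0
        delta = obs - self.mean
        self.mean += delta / self.count
        self.var += (delta * (obs - self.mean) - self.var) / self.count

    def surprise(self, obs):
        var = self.var + 1e-8
        # Negative log-likelihood under the diagonal Gaussian (up to a constant).
        return 0.5 * float(np.sum(np.log(var) + (obs - self.mean) ** 2 / var))


def adversarial_surprise_episode(env, explore_policy, control_policy,
                                 phase_len=32, n_phases=4):
    """Run one episode of the alternating Explore/Control game.

    The Explore policy is rewarded with the model's surprise (entropy-seeking);
    the Control policy receives its negative (entropy-minimizing). Transitions
    are returned per policy so each can be trained with any standard RL method.
    """
    obs = env.reset()
    density = GaussianDensityModel(obs_dim=obs.shape[0])
    explore_buffer, control_buffer = [], []

    for phase in range(n_phases):
        exploring = (phase % 2 == 0)              # the two policies take turns
        policy = explore_policy if exploring else control_policy
        for _ in range(phase_len):
            action = policy(obs)
            next_obs, _, done, _ = env.step(action)   # the task reward is ignored
            s = density.surprise(next_obs)
            reward = s if exploring else -s            # adversarial objectives
            (explore_buffer if exploring else control_buffer).append(
                (obs, action, reward, next_obs))
            density.update(next_obs)
            obs = next_obs
            if done:
                return explore_buffer, control_buffer
    return explore_buffer, control_buffer
```

Each policy would then be updated from its own buffer with any standard RL algorithm; the alternation between a surprise-maximizing and a surprise-minimizing phase is what drives the agent toward increasingly surprising regions that it then learns to master.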
Related papers
- Leveraging Reward Consistency for Interpretable Feature Discovery in
Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action-matching principle explains the underlying deep neural networks (DNNs) rather than interpreting the RL agents themselves.
We propose instead to ground the interpretation of RL agents in rewards, their essential objective.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
- Flexible Attention-Based Multi-Policy Fusion for Efficient Deep Reinforcement Learning [78.31888150539258]
Reinforcement learning (RL) agents have long sought to approach the efficiency of human learning.
Prior studies in RL have incorporated external knowledge policies to help agents improve sample efficiency.
We present Knowledge-Grounded RL (KGRL), an RL paradigm fusing multiple knowledge policies and aiming for human-like efficiency and flexibility.
arXiv Detail & Related papers (2022-10-07T17:56:57Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed, but they require large amounts of interaction between the agent and the environment.
We propose a new method for this benchmark that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- SEREN: Knowing When to Explore and When to Exploit [14.188362393915432]
We introduce the Selective Reinforcement Exploration Network (SEREN), which poses the exploration-exploitation trade-off as a game.
Using a form of policies known as impulse control, the switcher determines the best set of states at which to switch to the exploration policy.
We prove that SEREN converges quickly and induces a natural schedule towards pure exploitation.
arXiv Detail & Related papers (2022-05-30T12:44:56Z)
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings for the nearest-neighbor lookup, biasing the novelty signal towards what the agent can control (a simplified sketch of this bonus follows the list).
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
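The episodic bonus in the last entry is concrete enough to sketch. Below is a simplified, hedged version of a k-nearest-neighbor intrinsic reward over embeddings: the kernel, constants, and distance normalization are stand-ins for the paper's exact formulation, and the embedding is assumed to come from an inverse-dynamics encoder trained separately.

```python
import numpy as np

def episodic_novelty_reward(embedding, memory, k=10, eps=1e-3, c=1e-3):
    """Simplified episodic novelty bonus in the spirit of Never Give Up.

    `embedding` is assumed to come from a separately trained inverse-dynamics
    encoder; `memory` is the list of embeddings visited so far in the episode.
    The bonus is large when the embedding is far from its k nearest neighbours.
    """
    if not memory:
        return 1.0
    sq_dists = np.array([np.sum((embedding - m) ** 2) for m in memory])
    knn = np.sort(sq_dists)[:k]
    knn = knn / (knn.mean() + 1e-8)        # normalize by the mean squared distance
    kernel = eps / (knn + eps)             # inverse-distance similarity kernel
    return 1.0 / np.sqrt(kernel.sum() + c)
```

In use, the embedding of each new observation would be scored against the memory and then appended to it, with the memory cleared at the start of every episode; the paper additionally modulates this episodic term with a lifelong novelty signal, which is omitted here.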