State Entropy Maximization with Random Encoders for Efficient
Exploration
- URL: http://arxiv.org/abs/2102.09430v1
- Date: Thu, 18 Feb 2021 15:45:17 GMT
- Title: State Entropy Maximization with Random Encoders for Efficient
Exploration
- Authors: Younggyo Seo, Lili Chen, Jinwoo Shin, Honglak Lee, Pieter Abbeel,
Kimin Lee
- Abstract summary: Recent exploration methods have proven to be a recipe for improving sample-efficiency in deep reinforcement learning (RL).
This paper presents Random Encoders for Efficient Exploration (RE3), an exploration method that utilizes state entropy as an intrinsic reward.
In particular, we find that the state entropy can be estimated in a stable and compute-efficient manner by utilizing a randomly initialized encoder.
- Score: 162.39202927681484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent exploration methods have proven to be a recipe for improving
sample-efficiency in deep reinforcement learning (RL). However, efficient
exploration in high-dimensional observation spaces still remains a challenge.
This paper presents Random Encoders for Efficient Exploration (RE3), an
exploration method that utilizes state entropy as an intrinsic reward. In order
to estimate state entropy in environments with high-dimensional observations,
we utilize a k-nearest neighbor entropy estimator in the low-dimensional
representation space of a convolutional encoder. In particular, we find that
the state entropy can be estimated in a stable and compute-efficient manner by
utilizing a randomly initialized encoder, which is fixed throughout training.
Our experiments show that RE3 significantly improves the sample-efficiency of
both model-free and model-based RL methods on locomotion and navigation tasks
from DeepMind Control Suite and MiniGrid benchmarks. We also show that RE3
allows learning diverse behaviors without extrinsic rewards, effectively
improving sample-efficiency in downstream tasks. Source code and videos are
available at https://sites.google.com/view/re3-rl.
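As a concrete illustration of the method described in the abstract, here is a minimal PyTorch sketch of RE3's intrinsic reward: observations are embedded by a randomly initialized convolutional encoder that stays frozen throughout training, and each state is rewarded by the log distance to its k-th nearest neighbor in that representation space. The layer sizes, feature dimension, and value of k below are placeholder choices, not the paper's exact settings.

```python
import torch
import torch.nn as nn


class RandomEncoder(nn.Module):
    """Randomly initialized conv encoder whose weights stay frozen for all of
    training (hypothetical architecture; the paper's exact layers may differ)."""

    def __init__(self, obs_channels=3, feature_dim=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(obs_channels, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((5, 5)), nn.Flatten(),
            nn.Linear(32 * 5 * 5, feature_dim),
        )
        for p in self.parameters():
            p.requires_grad = False  # fixed throughout training, never updated

    @torch.no_grad()
    def forward(self, obs):
        return self.net(obs)


@torch.no_grad()
def re3_intrinsic_reward(embeddings: torch.Tensor, k: int = 3) -> torch.Tensor:
    """k-NN state-entropy proxy: the reward grows with the distance from each
    embedding to its k-th nearest neighbor in the batch."""
    dists = torch.cdist(embeddings, embeddings)                  # pairwise L2 distances
    knn_dist = dists.topk(k + 1, largest=False).values[:, -1]    # k-th neighbor, self excluded
    return torch.log(knn_dist + 1.0)


# Usage: encode a batch of image observations, then add the intrinsic reward
# (scaled by a coefficient) to the extrinsic reward before the RL update.
encoder = RandomEncoder()
obs = torch.rand(256, 3, 84, 84)  # dummy batch of observations
r_int = re3_intrinsic_reward(encoder(obs), k=3)
```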
Related papers
- Rewarding Episodic Visitation Discrepancy for Exploration in
Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
- Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z)
- k-Means Maximum Entropy Exploration [55.81894038654918]
Exploration in continuous spaces with sparse rewards is an open problem in reinforcement learning.
We introduce an artificial curiosity algorithm based on lower bounding an approximation to the entropy of the state visitation distribution.
We show that our approach is both computationally efficient and competitive on benchmarks for exploration in high-dimensional, continuous spaces.
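The summary above does not spell out the algorithm; as one assumed reading (not taken from the paper), a k-means surrogate for a k-NN entropy estimate could maintain an online set of centroids over visited states and reward the distance to the nearest centroid. The following Python sketch illustrates only that idea; the class name, learning rate, and initialization are hypothetical.

```python
import numpy as np


class KMeansCuriosity:
    """Assumed reading of the abstract: online k-means over visited states, with
    the distance to the nearest centroid used as an intrinsic reward (a cheap
    surrogate for a k-NN entropy estimate)."""

    def __init__(self, num_centroids: int, state_dim: int, lr: float = 0.05, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.centroids = rng.normal(size=(num_centroids, state_dim))
        self.lr = lr

    def reward_and_update(self, state: np.ndarray) -> float:
        # Distance from the visited state to every centroid.
        dists = np.linalg.norm(self.centroids - state, axis=1)
        nearest = int(np.argmin(dists))
        # Intrinsic reward: distance to the nearest centroid (large in rarely visited regions).
        reward = float(dists[nearest])
        # Online k-means step: move the nearest centroid toward the visited state.
        self.centroids[nearest] += self.lr * (state - self.centroids[nearest])
        return reward
```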
arXiv Detail & Related papers (2022-05-31T09:05:58Z)
- On Reward-Free RL with Kernel and Neural Function Approximations:
Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
- Fixed $\beta$-VAE Encoding for Curious Exploration in Complex 3D
Environments [1.0152838128195467]
We show how a fixed $\beta$-VAE encoding can be used effectively with curiosity.
We combine this with curriculum learning to solve the previously unsolved exploration-intensive detour tasks.
We also corroborate the results on Atari Breakout, with our custom encoding outperforming random features and inverse-dynamics features.
arXiv Detail & Related papers (2021-05-18T14:52:36Z)
- Efficient Exploration of Reward Functions in Inverse Reinforcement
Learning via Bayesian Optimization [43.51553742077343]
Inverse reinforcement learning (IRL) is relevant to a variety of tasks including value alignment and robot learning from demonstration.
This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions consistent with the expert demonstrations.
arXiv Detail & Related papers (2020-11-17T10:17:45Z)
- Langevin Dynamics for Adaptive Inverse Reinforcement Learning of
Stochastic Gradient Algorithms [21.796874356469644]
Inverse reinforcement learning (IRL) aims to estimate the reward function of optimizing agents by observing their response.
We present a generalized Langevin dynamics to estimate the reward function $R(\theta)$.
The proposed IRL algorithms use kernel-based passive learning schemes and generate samples from the distribution proportional to $\exp(R(\theta))$.
arXiv Detail & Related papers (2020-06-20T23:12:11Z)
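For context on the last entry: drawing samples from a density proportional to $\exp(R(\theta))$ is commonly done with a Langevin iteration. The sketch below shows only that standard update with a toy gradient in place of the paper's kernel-based passive estimate of $\nabla R$; the function name and parameters are illustrative.

```python
import numpy as np


def langevin_samples(grad_R, theta0, step=1e-3, n_steps=10_000, seed=0):
    """Unadjusted Langevin iteration targeting a density proportional to exp(R(theta)).
    grad_R is a (possibly noisy) estimate of the gradient of R at theta; the paper's
    kernel-based passive estimator is not reproduced here."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    samples = []
    for _ in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        # theta <- theta + (step/2) * grad R(theta) + sqrt(step) * N(0, I)
        theta = theta + 0.5 * step * grad_R(theta) + np.sqrt(step) * noise
        samples.append(theta.copy())
    return np.array(samples)


# Toy example: R(theta) = -||theta||^2 / 2, so exp(R) is a standard Gaussian
# and grad_R(theta) = -theta.
draws = langevin_samples(lambda th: -th, theta0=np.zeros(2))
```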
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.