State Entropy Maximization with Random Encoders for Efficient
Exploration
- URL: http://arxiv.org/abs/2102.09430v1
- Date: Thu, 18 Feb 2021 15:45:17 GMT
- Title: State Entropy Maximization with Random Encoders for Efficient
Exploration
- Authors: Younggyo Seo, Lili Chen, Jinwoo Shin, Honglak Lee, Pieter Abbeel,
Kimin Lee
- Abstract summary: Recent exploration methods have proven to be a recipe for improving sample-efficiency in deep reinforcement learning (RL).
This paper presents Random Encoders for Efficient Exploration (RE3), an exploration method that utilizes state entropy as an intrinsic reward.
In particular, we find that the state entropy can be estimated in a stable and compute-efficient manner by utilizing a randomly initialized encoder.
- Score: 162.39202927681484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent exploration methods have proven to be a recipe for improving
sample-efficiency in deep reinforcement learning (RL). However, efficient
exploration in high-dimensional observation spaces still remains a challenge.
This paper presents Random Encoders for Efficient Exploration (RE3), an
exploration method that utilizes state entropy as an intrinsic reward. In order
to estimate state entropy in environments with high-dimensional observations,
we utilize a k-nearest neighbor entropy estimator in the low-dimensional
representation space of a convolutional encoder. In particular, we find that
the state entropy can be estimated in a stable and compute-efficient manner by
utilizing a randomly initialized encoder, which is fixed throughout training.
Our experiments show that RE3 significantly improves the sample-efficiency of
both model-free and model-based RL methods on locomotion and navigation tasks
from DeepMind Control Suite and MiniGrid benchmarks. We also show that RE3
allows learning diverse behaviors without extrinsic rewards, effectively
improving sample-efficiency in downstream tasks. Source code and videos are
available at https://sites.google.com/view/re3-rl.
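As a concrete illustration of the method described in the abstract, here is a minimal PyTorch sketch of RE3's intrinsic reward: observations are embedded by a randomly initialized convolutional encoder that stays frozen throughout training, and each state is rewarded by the log distance to its k-th nearest neighbor in that representation space. The layer sizes, feature dimension, and value of k below are placeholder choices, not the paper's exact settings.

```python
import torch
import torch.nn as nn


class RandomEncoder(nn.Module):
    """Randomly initialized conv encoder whose weights stay frozen for all of
    training (hypothetical architecture; the paper's exact layers may differ)."""

    def __init__(self, obs_channels=3, feature_dim=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(obs_channels, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((5, 5)), nn.Flatten(),
            nn.Linear(32 * 5 * 5, feature_dim),
        )
        for p in self.parameters():
            p.requires_grad = False  # fixed throughout training, never updated

    @torch.no_grad()
    def forward(self, obs):
        return self.net(obs)


@torch.no_grad()
def re3_intrinsic_reward(embeddings: torch.Tensor, k: int = 3) -> torch.Tensor:
    """k-NN state-entropy proxy: the reward grows with the distance from each
    embedding to its k-th nearest neighbor in the batch."""
    dists = torch.cdist(embeddings, embeddings)                  # pairwise L2 distances
    knn_dist = dists.topk(k + 1, largest=False).values[:, -1]    # k-th neighbor, self excluded
    return torch.log(knn_dist + 1.0)


# Usage: encode a batch of image observations, then add the intrinsic reward
# (scaled by a coefficient) to the extrinsic reward before the RL update.
encoder = RandomEncoder()
obs = torch.rand(256, 3, 84, 84)  # dummy batch of observations
r_int = re3_intrinsic_reward(encoder(obs), k=3)
```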
Related papers
- Rewarding Episodic Visitation Discrepancy for Exploration in
Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
- Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z)
- k-Means Maximum Entropy Exploration [55.81894038654918]
Exploration in continuous spaces with sparse rewards is an open problem in reinforcement learning.
We introduce an artificial curiosity algorithm based on lower bounding an approximation to the entropy of the state visitation distribution.
We show that our approach is both computationally efficient and competitive on benchmarks for exploration in high-dimensional, continuous spaces.
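The summary above does not spell out the algorithm; as one assumed reading (not taken from the paper), a k-means surrogate for a k-NN entropy estimate could maintain an online set of centroids over visited states and reward the distance to the nearest centroid. The following Python sketch illustrates only that idea; the class name, learning rate, and initialization are hypothetical.

```python
import numpy as np


class KMeansCuriosity:
    """Assumed reading of the abstract: online k-means over visited states, with
    the distance to the nearest centroid used as an intrinsic reward (a cheap
    surrogate for a k-NN entropy estimate)."""

    def __init__(self, num_centroids: int, state_dim: int, lr: float = 0.05, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.centroids = rng.normal(size=(num_centroids, state_dim))
        self.lr = lr

    def reward_and_update(self, state: np.ndarray) -> float:
        # Distance from the visited state to every centroid.
        dists = np.linalg.norm(self.centroids - state, axis=1)
        nearest = int(np.argmin(dists))
        # Intrinsic reward: distance to the nearest centroid (large in rarely visited regions).
        reward = float(dists[nearest])
        # Online k-means step: move the nearest centroid toward the visited state.
        self.centroids[nearest] += self.lr * (state - self.centroids[nearest])
        return reward
```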
arXiv Detail & Related papers (2022-05-31T09:05:58Z)
- On Reward-Free RL with Kernel and Neural Function Approximations:
Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
- Fixed $\beta$-VAE Encoding for Curious Exploration in Complex 3D
Environments [1.0152838128195467]
We show how a fixed $\beta$-VAE encoding can be used effectively with curiosity.
We combine this with curriculum learning to solve the previously unsolved exploration-intensive detour tasks.
We also corroborate the results on Atari Breakout, with our custom encoding outperforming random features and inverse-dynamics features.
arXiv Detail & Related papers (2021-05-18T14:52:36Z)
- Efficient Exploration of Reward Functions in Inverse Reinforcement
Learning via Bayesian Optimization [43.51553742077343]
Inverse reinforcement learning (IRL) is relevant to a variety of tasks including value alignment and robot learning from demonstration.
This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions consistent with the expert demonstrations.
arXiv Detail & Related papers (2020-11-17T10:17:45Z)
- Langevin Dynamics for Adaptive Inverse Reinforcement Learning of
Stochastic Gradient Algorithms [21.796874356469644]
Inverse reinforcement learning (IRL) aims to estimate the reward function of optimizing agents by observing their response.
We present a generalized Langevin dynamics to estimate the reward function $R(\theta)$.
The proposed IRL algorithms use kernel-based passive learning schemes and generate samples from the distribution proportional to $\exp(R(\theta))$.
arXiv Detail & Related papers (2020-06-20T23:12:11Z)
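For context on the last entry: drawing samples from a density proportional to $\exp(R(\theta))$ is commonly done with a Langevin iteration. The sketch below shows only that standard update with a toy gradient in place of the paper's kernel-based passive estimate of $\nabla R$; the function name and parameters are illustrative.

```python
import numpy as np


def langevin_samples(grad_R, theta0, step=1e-3, n_steps=10_000, seed=0):
    """Unadjusted Langevin iteration targeting a density proportional to exp(R(theta)).
    grad_R is a (possibly noisy) estimate of the gradient of R at theta; the paper's
    kernel-based passive estimator is not reproduced here."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    samples = []
    for _ in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        # theta <- theta + (step/2) * grad R(theta) + sqrt(step) * N(0, I)
        theta = theta + 0.5 * step * grad_R(theta) + np.sqrt(step) * noise
        samples.append(theta.copy())
    return np.array(samples)


# Toy example: R(theta) = -||theta||^2 / 2, so exp(R) is a standard Gaussian
# and grad_R(theta) = -theta.
draws = langevin_samples(lambda th: -th, theta0=np.zeros(2))
```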
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.