Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement
Learning
- URL: http://arxiv.org/abs/2306.03186v1
- Date: Mon, 5 Jun 2023 18:56:48 GMT
- Title: Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement
Learning
- Authors: Sam Lobel and Akhil Bagaria and George Konidaris
- Abstract summary: We show that counts can be derived by averaging samples from the Rademacher distribution.
We show that our method is significantly more effective at deducing ground-truth visitation counts than previous work.
- Score: 20.0888026410406
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new method for count-based exploration in high-dimensional state
spaces. Unlike previous work which relies on density models, we show that
counts can be derived by averaging samples from the Rademacher distribution (or
coin flips). This insight is used to set up a simple supervised learning
objective which, when optimized, yields a state's visitation count. We show
that our method is significantly more effective at deducing ground-truth
visitation counts than previous work; when used as an exploration bonus for a
model-free reinforcement learning algorithm, it outperforms existing approaches
on most of 9 challenging exploration tasks, including the Atari game
Montezuma's Revenge.
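A minimal sketch of the identity behind the count estimator (an illustration, not the authors' released implementation): each visit to a state is paired with fresh Rademacher (coin-flip) samples, and the expected square of their running average is 1/n for n visits, so its square root behaves like the familiar 1/sqrt(n) exploration bonus. The sketch below replaces the paper's learned regressor with direct averaging; the function name, dimensions, and trial counts are assumptions chosen for illustration.

```python
# Sketch of the coin-flip counting identity (illustrative only): if
# c_1, ..., c_n are i.i.d. Rademacher (+/-1) samples, then
# E[(mean of c_i)^2] = Var(mean) = 1/n. A regressor trained to predict fresh
# coin flips per state visit converges to that running average, whose squared
# magnitude therefore estimates 1/count. Here we average directly instead of
# training a network.
import numpy as np

rng = np.random.default_rng(0)


def estimate_inverse_count(n_visits: int, dim: int = 32, trials: int = 2000) -> float:
    """Average (over trials and coin dimensions) of the squared mean of
    `n_visits` Rademacher samples; concentrates around 1/n_visits."""
    # Draw coin flips in {-1, +1} with shape (trials, n_visits, dim).
    flips = rng.choice([-1.0, 1.0], size=(trials, n_visits, dim))
    mean_flip = flips.mean(axis=1)          # running average of coins per "state"
    return float((mean_flip ** 2).mean())   # E[(avg coin)^2] ~= 1 / n_visits


for n in [1, 4, 16, 64]:
    est = estimate_inverse_count(n)
    # An exploration bonus would then be proportional to sqrt(est) ~= 1/sqrt(n).
    print(f"visits={n:3d}  estimated 1/n={est:.4f}  true 1/n={1.0 / n:.4f}")
```

Running the loop shows the estimate tightening around the true 1/n as the number of visits grows, which is the property the supervised objective exploits.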
Related papers
- Maximum State Entropy Exploration using Predecessor and Successor Representations [17.732962106114478]
Animals have a developed ability to explore that aids them in important tasks such as locating food.
We propose $\eta\psi$-Learning, a method to learn efficient exploratory policies by conditioning on past episodic experience.
arXiv Detail & Related papers (2023-06-26T16:08:26Z)
- Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control? [75.14973944905216]
We study the task of learning state representations from potentially high-dimensional observations.
We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning.
arXiv Detail & Related papers (2022-12-30T01:42:04Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward by measuring novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Residual Overfit Method of Exploration [78.07532520582313]
We propose an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit.
The approach drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
We compare ROME against a set of established contextual bandit methods on three datasets and find it to be among the best-performing.
arXiv Detail & Related papers (2021-10-06T17:05:33Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
- Self-Supervised Exploration via Latent Bayesian Surprise [4.088019409160893]
In this work, we propose a curiosity-based bonus as intrinsic reward for Reinforcement Learning.
We extensively evaluate our model by measuring the agent's performance in terms of environment exploration.
Our model is cheap and empirically shows state-of-the-art performance on several problems.
arXiv Detail & Related papers (2021-04-15T14:40:16Z)
- Latent World Models For Intrinsically Motivated Exploration [140.21871701134626]
We present a self-supervised representation learning method for image-based observations.
We consider episodic and life-long uncertainties to guide the exploration of partially observable environments.
arXiv Detail & Related papers (2020-10-05T19:47:04Z)
- Novelty Search in Representational Space for Sample Efficient Exploration [38.2027946450689]
We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives.
Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty.
We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards.
arXiv Detail & Related papers (2020-09-28T18:51:52Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest-neighbor lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
- Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
In contrast to existing methods, which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.