Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement
Learning
- URL: http://arxiv.org/abs/2306.03186v1
- Date: Mon, 5 Jun 2023 18:56:48 GMT
- Title: Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement
Learning
- Authors: Sam Lobel and Akhil Bagaria and George Konidaris
- Abstract summary: We show that counts can be derived by averaging samples from the Rademacher distribution.
We show that our method is significantly more effective at deducing ground-truth visitation counts than previous work.
- Score: 20.0888026410406
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new method for count-based exploration in high-dimensional state
spaces. Unlike previous work which relies on density models, we show that
counts can be derived by averaging samples from the Rademacher distribution (or
coin flips). This insight is used to set up a simple supervised learning
objective which, when optimized, yields a state's visitation count. We show
that our method is significantly more effective at deducing ground-truth
visitation counts than previous work; when used as an exploration bonus for a
model-free reinforcement learning algorithm, it outperforms existing approaches
on most of 9 challenging exploration tasks, including the Atari game
Montezuma's Revenge.
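A minimal sketch of the identity behind the count estimator (an illustration, not the authors' released implementation): each visit to a state is paired with fresh Rademacher (coin-flip) samples, and the expected square of their running average is 1/n for n visits, so its square root behaves like the familiar 1/sqrt(n) exploration bonus. The sketch below replaces the paper's learned regressor with direct averaging; the function name, dimensions, and trial counts are assumptions chosen for illustration.

```python
# Sketch of the coin-flip counting identity (illustrative only): if
# c_1, ..., c_n are i.i.d. Rademacher (+/-1) samples, then
# E[(mean of c_i)^2] = Var(mean) = 1/n. A regressor trained to predict fresh
# coin flips per state visit converges to that running average, whose squared
# magnitude therefore estimates 1/count. Here we average directly instead of
# training a network.
import numpy as np

rng = np.random.default_rng(0)


def estimate_inverse_count(n_visits: int, dim: int = 32, trials: int = 2000) -> float:
    """Average (over trials and coin dimensions) of the squared mean of
    `n_visits` Rademacher samples; concentrates around 1/n_visits."""
    # Draw coin flips in {-1, +1} with shape (trials, n_visits, dim).
    flips = rng.choice([-1.0, 1.0], size=(trials, n_visits, dim))
    mean_flip = flips.mean(axis=1)          # running average of coins per "state"
    return float((mean_flip ** 2).mean())   # E[(avg coin)^2] ~= 1 / n_visits


for n in [1, 4, 16, 64]:
    est = estimate_inverse_count(n)
    # An exploration bonus would then be proportional to sqrt(est) ~= 1/sqrt(n).
    print(f"visits={n:3d}  estimated 1/n={est:.4f}  true 1/n={1.0 / n:.4f}")
```

Running the loop shows the estimate tightening around the true 1/n as the number of visits grows, which is the property the supervised objective exploits.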
Related papers
- Maximum State Entropy Exploration using Predecessor and Successor Representations [17.732962106114478]
Animals have a developed ability to explore that aids them in important tasks such as locating food.
We propose $\eta\psi$-Learning, a method to learn efficient exploratory policies by conditioning on past episodic experience.
arXiv Detail & Related papers (2023-06-26T16:08:26Z)
- Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control? [75.14973944905216]
We study the task of learning state representations from potentially high-dimensional observations.
We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning.
arXiv Detail & Related papers (2022-12-30T01:42:04Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward by measuring novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Residual Overfit Method of Exploration [78.07532520582313]
We propose an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit.
The approach drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
We compare ROME against a set of established contextual bandit methods on three datasets and find it to be among the best-performing.
arXiv Detail & Related papers (2021-10-06T17:05:33Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
- Self-Supervised Exploration via Latent Bayesian Surprise [4.088019409160893]
In this work, we propose a curiosity-based bonus as intrinsic reward for Reinforcement Learning.
We extensively evaluate our model by measuring the agent's performance in terms of environment exploration.
Our model is cheap and empirically shows state-of-the-art performance on several problems.
arXiv Detail & Related papers (2021-04-15T14:40:16Z)
- Latent World Models For Intrinsically Motivated Exploration [140.21871701134626]
We present a self-supervised representation learning method for image-based observations.
We consider episodic and life-long uncertainties to guide the exploration of partially observable environments.
arXiv Detail & Related papers (2020-10-05T19:47:04Z)
- Novelty Search in Representational Space for Sample Efficient Exploration [38.2027946450689]
We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives.
Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty.
We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards.
arXiv Detail & Related papers (2020-09-28T18:51:52Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest-neighbor lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
- Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
In contrast to existing methods, which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.