Maximum State Entropy Exploration using Predecessor and Successor Representations
- URL: http://arxiv.org/abs/2306.14808v1
- Date: Mon, 26 Jun 2023 16:08:26 GMT
- Title: Maximum State Entropy Exploration using Predecessor and Successor Representations
- Authors: Arnav Kumar Jain, Lucas Lehnert, Irina Rish, Glen Berseth
- Abstract summary: Animals have a developed ability to explore that aids them in important tasks such as locating food.
We propose $\eta\psi$-Learning, a method to learn efficient exploratory policies by conditioning on past episodic experience.
- Score: 17.732962106114478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Animals have a developed ability to explore that aids them in important tasks
such as locating food, exploring for shelter, and finding misplaced items.
These exploration skills necessarily track where they have been so that they
can plan for finding items with relative efficiency. Contemporary exploration
algorithms often learn a less efficient exploration strategy because they
either condition only on the current state or simply rely on making random
open-loop exploratory moves. In this work, we propose $\eta\psi$-Learning, a
method to learn efficient exploratory policies by conditioning on past episodic
experience to make the next exploratory move. Specifically, $\eta\psi$-Learning
learns an exploration policy that maximizes the entropy of the state visitation
distribution of a single trajectory. Furthermore, we demonstrate how variants
of the predecessor representation and successor representations can be combined
to predict the state visitation entropy. Our experiments demonstrate the
efficacy of $\eta\psi$-Learning to strategically explore the environment and
maximize the state coverage with limited samples.
Related papers
- Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning [20.0888026410406]
We show that counts can be derived by averaging samples from the Rademacher distribution (a minimal sketch of this identity appears after this list).
We show that our method is significantly more effective at deducing ground-truth visitation counts than previous work.
arXiv Detail & Related papers (2023-06-05T18:56:48Z)
- Neighboring state-based RL Exploration [1.5935205681539144]
We study neighboring state-based, model-free exploration, guided by the intuition that, for an early-stage agent, considering actions derived from a bounded region of nearby states may lead to better exploratory actions.
We propose two algorithms that choose exploratory actions based on a survey of nearby states, and find that one of our methods, $\rho$-explore, consistently outperforms the Double DQN baseline in a discrete environment by 49% in terms of Eval Reward Return.
arXiv Detail & Related papers (2022-12-21T01:23:53Z)
- Active Exploration for Inverse Reinforcement Learning [58.295273181096036]
We propose a novel IRL algorithm: Active exploration for Inverse Reinforcement Learning (AceIRL).
AceIRL actively explores an unknown environment and expert policy to quickly learn the expert's reward function and identify a good policy.
We empirically evaluate AceIRL in simulations and find that it significantly outperforms more naive exploration strategies.
arXiv Detail & Related papers (2022-07-18T14:45:55Z)
- Active Exploration via Experiment Design in Markov Chains [86.41407938210193]
A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest.
We propose an algorithm that efficiently selects policies whose measurement allocation converges to the optimal one.
In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.
arXiv Detail & Related papers (2022-06-29T00:04:40Z)
- Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation [69.1524391595912]
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks.
This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration.
arXiv Detail & Related papers (2022-06-19T14:44:40Z)
- Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning [5.40729975786985]
This paper explores the idea of combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy.
We provide a simple way to learn options (sequences of actions) instead of having to handcraft them, and demonstrate the performance advantage in three navigation tasks.
arXiv Detail & Related papers (2022-03-02T05:14:11Z)
- Discovering and Exploiting Sparse Rewards in a Learned Behavior Space [0.46736439782713946]
Learning optimal policies in sparse rewards settings is difficult as the learning agent has little to no feedback on the quality of its actions.
We introduce STAX, an algorithm designed to learn a behavior space on-the-fly and to explore it while efficiently optimizing any reward discovered.
arXiv Detail & Related papers (2021-11-02T22:21:11Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Curious Explorer: a provable exploration strategy in Policy Learning [0.0]
We develop Curious Explorer, a novel and simple iterative state space exploration strategy.
Curious Explorer starts from $\rho$ and then uses intrinsic rewards assigned to the set of poorly visited states to produce a sequence of policies.
We show that Curious Explorer can improve performance in MDPs with challenging exploration.
arXiv Detail & Related papers (2021-06-29T15:31:51Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims at encouraging exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.