Reannealing of Decaying Exploration Based On Heuristic Measure in Deep
Q-Network
- URL: http://arxiv.org/abs/2009.14297v1
- Date: Tue, 29 Sep 2020 20:40:00 GMT
- Title: Reannealing of Decaying Exploration Based On Heuristic Measure in Deep
Q-Network
- Authors: Xing Wang, Alexander Vinel
- Abstract summary: We propose an algorithm based on the idea of reannealing that aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has the potential to both accelerate training and obtain a better policy.
- Score: 82.20059754270302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing exploration strategies in reinforcement learning (RL) often either
ignore the history or feedback of search, or are complicated to implement.
There is also very limited literature showing their effectiveness over
diverse domains. We propose an algorithm based on the idea of reannealing that
aims to encourage exploration only when it is needed, for example, when the
algorithm detects that the agent is stuck in a local optimum. The approach is
simple to implement. We perform an illustrative case study showing that it has
the potential to both accelerate training and obtain a better policy.
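The paper itself does not include code, but the core idea can be illustrated. Below is a minimal Python sketch in which a return-plateau test (an assumption here, standing in for the paper's heuristic measure) triggers re-annealing of a decaying epsilon-greedy schedule:

```python
import numpy as np
from collections import deque

class ReannealedEpsilon:
    """Epsilon-greedy schedule that decays as usual but is re-annealed
    (raised back up) when a simple stagnation heuristic fires.
    Illustrative sketch only; the paper's heuristic measure differs."""

    def __init__(self, eps_start=1.0, eps_min=0.05, decay=0.995,
                 window=50, patience=200, reanneal_to=0.5):
        self.eps = eps_start
        self.eps_min = eps_min
        self.decay = decay
        self.returns = deque(maxlen=window)   # recent episode returns
        self.best_avg = -np.inf               # best moving average so far
        self.stale_episodes = 0               # episodes without improvement
        self.patience = patience
        self.reanneal_to = reanneal_to

    def on_episode_end(self, episode_return):
        self.returns.append(episode_return)
        avg = float(np.mean(self.returns))
        if avg > self.best_avg + 1e-6:
            self.best_avg = avg
            self.stale_episodes = 0
        else:
            self.stale_episodes += 1
        # Heuristic trigger (an assumption here): returns have plateaued,
        # so re-anneal exploration instead of letting epsilon keep decaying.
        if self.stale_episodes >= self.patience:
            self.eps = max(self.eps, self.reanneal_to)
            self.stale_episodes = 0
        self.eps = max(self.eps_min, self.eps * self.decay)

    def select_action(self, q_values):
        if np.random.rand() < self.eps:
            return np.random.randint(len(q_values))   # explore
        return int(np.argmax(q_values))               # exploit
```

In a standard DQN training loop, on_episode_end would be called once per episode and select_action would replace a fixed epsilon-greedy rule; the trigger condition above is only a stand-in for the heuristic measure studied in the paper.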
Related papers
- On the Importance of Exploration for Generalization in Reinforcement
Learning [89.63074327328765]
We propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high uncertainty.
Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter.
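A minimal sketch of the general ensemble-disagreement idea behind uncertainty-driven exploration (not the EDE implementation; the Q-ensemble interface and the bonus weight beta are assumptions):

```python
import numpy as np

def ensemble_uncertainty_action(q_ensemble, state_features, beta=1.0):
    """q_ensemble: list of callables, each mapping state features to a
    vector of Q-values (one per action). Picks the action maximizing the
    mean Q-value plus an uncertainty bonus (ensemble std). Illustrative only."""
    qs = np.stack([q(state_features) for q in q_ensemble])  # (K, n_actions)
    mean_q = qs.mean(axis=0)
    std_q = qs.std(axis=0)      # disagreement ~ epistemic uncertainty
    return int(np.argmax(mean_q + beta * std_q))
```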
arXiv Detail & Related papers (2023-06-08T18:07:02Z)
- Inapplicable Actions Learning for Knowledge Transfer in Reinforcement
Learning [3.194414753332705]
We show that learning inapplicable actions greatly improves the sample efficiency of RL algorithms.
Thanks to the transferability of the knowledge acquired, it can be reused in other tasks and domains to make the learning process more efficient.
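A generic way to exploit knowledge of inapplicable actions is to mask them out at action-selection time; the sketch below illustrates that idea and is not this paper's learning procedure:

```python
import numpy as np

def masked_greedy_action(q_values, applicable_mask):
    """q_values: array of shape (n_actions,); applicable_mask: boolean array
    where True marks actions applicable in the current state. Inapplicable
    actions are excluded from the argmax. Illustrative sketch."""
    masked = np.where(applicable_mask, q_values, -np.inf)
    return int(np.argmax(masked))

# Example: 4 actions, action 2 known to be inapplicable in this state.
print(masked_greedy_action(np.array([0.1, 0.5, 0.9, 0.3]),
                           np.array([True, True, False, True])))  # -> 1
```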
arXiv Detail & Related papers (2022-11-28T17:45:39Z)
- Boosting Exploration in Actor-Critic Algorithms by Incentivizing
Plausible Novel States [9.210923191081864]
Actor-critic (AC) algorithms are a class of model-free deep reinforcement learning algorithms.
We propose a new method to boost exploration through an intrinsic reward, based on measurement of a state's novelty.
With incentivized exploration of plausible novel states, an AC algorithm is able to improve its sample efficiency and hence training performance.
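As a hedged stand-in for a novelty-based intrinsic reward (a simple count-based bonus with an assumed discretization, not the paper's plausible-novel-state measure):

```python
from collections import defaultdict
import numpy as np

class NoveltyBonus:
    """Adds an intrinsic reward that decays with how often a (discretized)
    state has been visited. Generic count-based sketch only."""

    def __init__(self, scale=0.1, bins=10):
        self.counts = defaultdict(int)
        self.scale = scale
        self.bins = bins

    def _key(self, state):
        # Discretize continuous observations into a hashable key.
        return tuple(np.floor(np.asarray(state) * self.bins).astype(int))

    def __call__(self, state, extrinsic_reward):
        k = self._key(state)
        self.counts[k] += 1
        bonus = self.scale / np.sqrt(self.counts[k])
        return extrinsic_reward + bonus
```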
arXiv Detail & Related papers (2022-10-01T07:07:11Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
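A minimal sketch of the two-policy rollout idea, assuming a classic Gym-style environment interface and a fixed switch point (JSRL itself adapts the switch point via a curriculum):

```python
def jump_start_episode(env, guide_policy, learner_policy, switch_step,
                       max_steps=1000):
    """Roll out the guide policy for the first `switch_step` steps, then hand
    control to the learning policy. Returns the collected transitions.
    Illustrative sketch of the general idea only."""
    transitions = []
    state = env.reset()                            # classic Gym reset
    for t in range(max_steps):
        policy = guide_policy if t < switch_step else learner_policy
        action = policy(state)
        next_state, reward, done, _ = env.step(action)   # classic Gym step
        transitions.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    return transitions
```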
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Exploring More When It Needs in Deep Reinforcement Learning [3.442899929543427]
We propose an exploration mechanism for Deep Reinforcement Learning, called Add Noise to Noise (AN2N), which explores more when the agent needs it.
We use cumulative rewards to identify past states in which the agent has not performed well, and cosine distance to measure whether the current state needs to be explored more.
We apply it to continuous control tasks such as HalfCheetah, Hopper, and Swimmer, achieving considerable improvements in performance and convergence speed.
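A hedged reading of the described trigger: remember states where cumulative reward was poor and, when the current state is within a small cosine distance of one of them, enlarge the exploration noise. The thresholds and Gaussian noise below are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def cosine_distance(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def needs_more_exploration(state, poorly_performing_states, threshold=0.1):
    """True if the current state is close (small cosine distance) to a state
    remembered as one where the agent performed poorly. Sketch of the trigger
    only; the memory update from cumulative rewards is omitted."""
    return any(cosine_distance(state, s) < threshold
               for s in poorly_performing_states)

def an2n_action(base_action, explore_more, noise_scale=0.3):
    # "Add noise to noise": enlarge the exploration noise when triggered.
    noise = np.random.normal(0.0, noise_scale if explore_more else 0.1,
                             size=np.shape(base_action))
    return np.asarray(base_action) + noise
```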
arXiv Detail & Related papers (2021-09-28T04:29:38Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Exploration and Incentives in Reinforcement Learning [107.42240386544633]
We consider complex exploration problems, where each agent faces the same (but unknown) MDP.
Agents control the choice of policies, whereas an algorithm can only issue recommendations.
We design an algorithm which explores all reachable states in the MDP.
arXiv Detail & Related papers (2021-02-28T00:15:53Z)
- Policy Augmentation: An Exploration Strategy for Faster Convergence of
Deep Reinforcement Learning Algorithms [0.0]
In this paper, a revolutionary algorithm, called Policy Augmentation, is introduced.
Policy Augmentation is based on a newly developed inductive matrix completion method.
The proposed algorithm augments the values of unexplored state-action pairs, helping the agent take actions that will result in high-value returns while the agent is in the early episodes.
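A rough illustration of filling in unexplored state-action values by low-rank matrix completion (plain gradient steps on a Q-table here; the paper uses an inductive matrix completion method with side information):

```python
import numpy as np

def complete_q_matrix(q_observed, mask, rank=4, lr=0.01, steps=2000, seed=0):
    """q_observed: (n_states, n_actions) with arbitrary values where mask is
    False; mask: True where a Q-estimate was actually observed. Fits a
    low-rank factorization U @ V.T to the observed entries and returns the
    completed matrix. Generic sketch, not the paper's inductive method."""
    rng = np.random.default_rng(seed)
    n_s, n_a = q_observed.shape
    U = 0.1 * rng.standard_normal((n_s, rank))
    V = 0.1 * rng.standard_normal((n_a, rank))
    for _ in range(steps):
        pred = U @ V.T
        err = np.where(mask, pred - q_observed, 0.0)  # error on observed cells
        U -= lr * (err @ V)
        V -= lr * (err.T @ U)
    return U @ V.T  # unexplored (state, action) values are now filled in
```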
arXiv Detail & Related papers (2021-02-10T03:51:45Z)
- Provably Efficient Exploration for Reinforcement Learning Using
Unsupervised Learning [96.78504087416654]
Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems, we investigate when this paradigm is provably efficient.
We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm.
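A toy instance of such a two-component pipeline, with k-means (scikit-learn) as the unsupervised component and plain Q-learning standing in for a no-regret tabular algorithm; the interface and hyperparameters are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def latent_tabular_q(observations, transitions, n_latent=20, n_actions=4,
                     alpha=0.1, gamma=0.99):
    """observations: (N, d) array used to fit the state abstraction;
    transitions: iterable of (obs, action, reward, next_obs) tuples.
    Toy sketch: unsupervised clustering maps rich observations to a small
    latent state space, and a tabular update runs on top of it."""
    encoder = KMeans(n_clusters=n_latent, n_init=10).fit(observations)
    Q = np.zeros((n_latent, n_actions))
    for obs, a, r, next_obs in transitions:
        s = encoder.predict(np.asarray(obs).reshape(1, -1))[0]
        s2 = encoder.predict(np.asarray(next_obs).reshape(1, -1))[0]
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    return encoder, Q
```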
arXiv Detail & Related papers (2020-03-15T19:23:59Z)
- Reinforcement Learning with Probabilistically Complete Exploration [27.785017885906313]
We propose Rapidly Randomly-exploring Reinforcement Learning (R3L).
We formulate exploration as a search problem and leverage widely-used planning algorithms to find initial solutions.
We experimentally demonstrate that the method requires only a fraction of the exploration samples and achieves better performance.
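A hedged sketch of exploration-as-search: grow a tree of reachable simulator states with random actions (RRT-flavoured) and return the first trajectory that passes a goal test, which can then seed policy learning. The ability to save and restore simulator state (get_state/set_state below) is an assumption, as is the Gym-style interface:

```python
import random

def rrt_style_explore(env, goal_test, n_iters=10000, seed=0):
    """Grow a search tree over visited simulator states by repeatedly picking
    a stored node and applying a random action; return the first action
    sequence that satisfies goal_test. Illustrative sketch only; R3L's actual
    planner and sampling strategy differ."""
    random.seed(seed)
    env.reset()
    # env.get_state / env.set_state are hypothetical save/restore helpers.
    nodes = [(env.get_state(), [])]
    for _ in range(n_iters):
        sim_state, actions = random.choice(nodes)
        env.set_state(sim_state)
        action = env.action_space.sample()        # Gym-style random action
        obs, reward, done, _ = env.step(action)   # classic Gym step signature
        if goal_test(obs, reward):
            return actions + [action]             # trajectory to seed learning
        if not done:
            nodes.append((env.get_state(), actions + [action]))
    return None
```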
arXiv Detail & Related papers (2020-01-20T02:11:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.