Boosting Exploration in Actor-Critic Algorithms by Incentivizing
Plausible Novel States
- URL: http://arxiv.org/abs/2210.00211v1
- Date: Sat, 1 Oct 2022 07:07:11 GMT
- Title: Boosting Exploration in Actor-Critic Algorithms by Incentivizing
Plausible Novel States
- Authors: Chayan Banerjee, Zhiyong Chen, Nasimul Noman
- Abstract summary: Actor-critic (AC) algorithms are a class of model-free deep reinforcement learning algorithms.
We propose a new method to boost exploration through an intrinsic reward, based on measurement of a state's novelty.
With incentivized exploration of plausible novel states, an AC algorithm is able to improve its sample efficiency and hence training performance.
- Score: 9.210923191081864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Actor-critic (AC) algorithms are a class of model-free deep reinforcement
learning algorithms, which have proven their efficacy in diverse domains,
especially in solving continuous control problems. Improving exploration
(action entropy) and exploitation (expected return) with more efficient
samples is a critical issue in AC algorithms. A basic strategy of a learning
algorithm is to facilitate indiscriminate exploration of the entire
environment state space, while also encouraging exploration of rarely visited
states over frequently visited ones. Under this strategy, we propose a new
method to boost exploration through an intrinsic reward based on a
measurement of a state's novelty and the associated benefit of exploring that
state (with regard to policy optimization), together called plausible
novelty. With incentivized exploration of plausible novel states, an AC
algorithm can improve its sample efficiency and hence its training
performance. The new method is verified by extensive simulations of
continuous control tasks in MuJoCo environments on a variety of prominent
off-policy AC algorithms.
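
The abstract does not spell out implementation details, but the core mechanism it describes, adding a novelty-based intrinsic reward on top of the environment reward in an off-policy AC algorithm, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the count-based novelty estimate, the discretization bin size, and the weight `beta` are stand-ins for the paper's actual plausible-novelty measure.

```python
import numpy as np
from collections import defaultdict


class NoveltyBonus:
    """Toy count-based novelty estimate over a discretized state space.

    Stand-in for the paper's 'plausible novelty' measure, which additionally
    weighs how beneficial exploring the state is for policy optimization;
    that part is omitted here.
    """

    def __init__(self, bin_size=0.5, beta=0.1):
        self.bin_size = bin_size        # width of each discretization bin (assumed)
        self.beta = beta                # intrinsic-reward weight (assumed)
        self.counts = defaultdict(int)  # visit counts per discretized state

    def _key(self, state):
        return tuple(np.floor(np.asarray(state) / self.bin_size).astype(int))

    def __call__(self, next_state):
        key = self._key(next_state)
        self.counts[key] += 1
        # Rarely visited states receive a larger bonus than frequently visited ones.
        return self.beta / np.sqrt(self.counts[key])


def shaped_reward(extrinsic_reward, next_state, bonus):
    """Total reward = environment reward + intrinsic novelty bonus."""
    return extrinsic_reward + bonus(next_state)
```

In an off-policy AC method such as SAC or TD3, a shaped reward of this kind would simply replace the environment reward in the transitions pushed to the replay buffer, leaving the actor and critic updates themselves unchanged.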
Related papers
- Inapplicable Actions Learning for Knowledge Transfer in Reinforcement
Learning [3.194414753332705]
We show that learning inapplicable actions greatly improves the sample efficiency of RL algorithms.
Thanks to the transferability of the knowledge acquired, it can be reused in other tasks and domains to make the learning process more efficient.
arXiv Detail & Related papers (2022-11-28T17:45:39Z) - Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization [1.9580473532948401]
We propose a novel guided exploration method that uses an ensemble of Monte Carlo Critics for calculating an exploratory action correction (a minimal sketch of such a correction appears after this list).
We present a novel algorithm that leverages the proposed exploratory module for both policy and critic modification.
The presented algorithm demonstrates superior performance compared to modern reinforcement learning algorithms across a variety of problems in the DMControl suite.
arXiv Detail & Related papers (2022-06-25T15:39:52Z) - Sample-Efficient, Exploration-Based Policy Optimisation for Routing
Problems [2.6782615615913348]
This paper presents a new reinforcement learning approach based on entropy.
In addition, we design an off-policy-based reinforcement learning technique that maximises the expected return.
We show that our model can generalise to various routing problems.
arXiv Detail & Related papers (2022-05-31T09:51:48Z) - Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z) - Identifying Co-Adaptation of Algorithmic and Implementational
Innovations in Deep Reinforcement Learning: A Taxonomy and Case Study of
Inference-based Algorithms [15.338931971492288]
We focus on a series of inference-based actor-critic algorithms to decouple their algorithmic innovations and implementation decisions.
We identify substantial performance drops whenever implementation details are mismatched for algorithmic choices.
Results show which implementation details are co-adapted and co-evolved with algorithms.
arXiv Detail & Related papers (2021-03-31T17:55:20Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z) - Reannealing of Decaying Exploration Based On Heuristic Measure in Deep
Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z) - Reinforcement Learning with Fast Stabilization in Linear Dynamical
Systems [91.43582419264763]
We study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems.
We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment.
We show that the proposed algorithm attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time steps of agent-environment interaction.
arXiv Detail & Related papers (2020-07-23T23:06:40Z) - Provably Efficient Exploration for Reinforcement Learning Using
Unsupervised Learning [96.78504087416654]
Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems, we investigate when this paradigm is provably efficient.
We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm.
arXiv Detail & Related papers (2020-03-15T19:23:59Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
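
As referenced above, the guided-exploration entry (Monte Carlo Critic Optimization) mentions computing an exploratory action correction from an ensemble of critics. Below is a hedged sketch of one way such a correction could be computed; the network architecture, ensemble size, step size `eta`, and action clipping range are assumptions, not details taken from that paper.

```python
import torch
import torch.nn as nn


class CriticEnsemble(nn.Module):
    """Small ensemble of Q-networks; architecture and size are assumptions."""

    def __init__(self, state_dim, action_dim, n_critics=5, hidden=64):
        super().__init__()
        self.critics = nn.ModuleList([
            nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_critics)
        ])

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        # Average the value estimates of all ensemble members.
        return torch.stack([q(x) for q in self.critics]).mean(dim=0)


def corrected_action(ensemble, state, action, eta=0.05):
    """Nudge the policy's proposed action along the gradient of the ensemble value.

    Only an illustration of the idea of an 'exploratory action correction';
    the cited paper's actual update rule may differ.
    """
    action = action.clone().detach().requires_grad_(True)
    value = ensemble(state, action).sum()
    grad = torch.autograd.grad(value, action)[0]
    return (action + eta * grad).clamp(-1.0, 1.0).detach()
```

In this sketch the corrected action would be executed in the environment in place of the raw policy output; how the cited work actually combines the correction with its policy and critic updates is not described in the summary above.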
This list is automatically generated from the titles and abstracts of the papers in this site.