Policy Augmentation: An Exploration Strategy for Faster Convergence of
Deep Reinforcement Learning Algorithms
- URL: http://arxiv.org/abs/2102.05249v1
- Date: Wed, 10 Feb 2021 03:51:45 GMT
- Title: Policy Augmentation: An Exploration Strategy for Faster Convergence of
Deep Reinforcement Learning Algorithms
- Authors: Arash Mahyari
- Abstract summary: In this paper, a revolutionary algorithm, called Policy Augmentation, is introduced.
Policy Augmentation is based on a newly developed inductive matrix completion method.
The proposed algorithm augments the values of unexplored state-action pairs, helping the agent take actions that will result in high-value returns while the agent is in the early episodes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite advancements in deep reinforcement learning algorithms, developing an
effective exploration strategy is still an open problem. Most existing
exploration strategies either are based on simple heuristics, or require the
model of the environment, or train additional deep neural networks to generate
imagination-augmented paths. In this paper, a revolutionary algorithm, called
Policy Augmentation, is introduced. Policy Augmentation is based on a newly
developed inductive matrix completion method. The proposed algorithm augments
the values of unexplored state-action pairs, helping the agent take actions
that will result in high-value returns while the agent is in the early
episodes. Training deep reinforcement learning algorithms with high-value
rollouts leads to the faster convergence of deep reinforcement learning
algorithms. Our experiments show the superior performance of Policy
Augmentation. The code can be found at:
https://github.com/arashmahyari/PolicyAugmentation.
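The abstract describes the mechanism only at a high level, so the following is a minimal sketch of the stated idea: fit a low-rank inductive matrix completion model to the value estimates of explored state-action pairs and use it to fill in values for unexplored pairs, which early-episode action selection can then exploit. All names (complete_q_table, X, Y, rank, learning rate) are illustrative assumptions, not the authors' implementation; the linked repository contains the actual code.

import numpy as np

def complete_q_table(Q_obs, mask, X, Y, rank=5, lr=1e-3, reg=1e-3, iters=2000, seed=0):
    """Inductive matrix completion sketch: fit Q[s, a] ~ x_s^T U V^T y_a on the
    observed (visited) state-action pairs, then fill in the unexplored ones.

    Q_obs : (S, A) array of value estimates (0 where unvisited)
    mask  : (S, A) boolean array, True where the pair has been explored
    X     : (S, d_s) state feature matrix
    Y     : (A, d_a) action feature matrix
    """
    rng = np.random.default_rng(seed)
    U = 0.01 * rng.standard_normal((X.shape[1], rank))
    V = 0.01 * rng.standard_normal((Y.shape[1], rank))
    for _ in range(iters):
        pred = X @ U @ V.T @ Y.T                    # (S, A) predicted values
        err = np.where(mask, pred - Q_obs, 0.0)     # error only on observed entries
        grad_U = X.T @ err @ Y @ V + reg * U
        grad_V = Y.T @ err.T @ X @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    filled = X @ U @ V.T @ Y.T
    # keep the measured values where available, use completed values elsewhere
    return np.where(mask, Q_obs, filled)

During early episodes the agent could then pick actions greedily from the returned table (np.argmax over the row of the current state) rather than relying purely on random exploration.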
Related papers
- Solving Deep Reinforcement Learning Tasks with Evolution Strategies and Linear Policy Networks [0.017476232824732776]
This study investigates how Evolution Strategies perform compared to gradient-based deep reinforcement learning methods.
We benchmark both deep policy networks and networks consisting of a single linear layer from observations to actions for three gradient-based methods.
Our results reveal that Evolution Strategies can find effective linear policies for many reinforcement learning benchmark tasks.
arXiv Detail & Related papers (2024-02-10T09:15:21Z)
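As a rough illustration of the two ingredients named above, Evolution Strategies and a single linear layer from observations to actions, here is a hedged sketch of an OpenAI-ES style update applied to a linear policy on a Gymnasium-style environment; the population size, noise scale, and environment interface are assumptions, not the paper's benchmark setup.

import numpy as np

def rollout(env, W, max_steps=500):
    """Return the episode return of a linear policy: action = argmax(W @ obs)."""
    obs, _ = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = int(np.argmax(W @ obs))            # discrete actions, one row per action
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
        if terminated or truncated:
            break
    return total

def evolution_strategy(env, n_actions, obs_dim, iters=100, pop=50, sigma=0.1, lr=0.02):
    """OpenAI-ES style update: estimate the gradient of expected return from
    Gaussian perturbations of the flat parameter matrix."""
    W = np.zeros((n_actions, obs_dim))
    for _ in range(iters):
        eps = np.random.randn(pop, *W.shape)
        returns = np.array([rollout(env, W + sigma * e) for e in eps])
        # normalise returns so the update is invariant to reward scale
        adv = (returns - returns.mean()) / (returns.std() + 1e-8)
        W += lr / (pop * sigma) * np.tensordot(adv, eps, axes=1)
    return W

# Example usage (assuming the Gymnasium package and CartPole-v1 are available):
#   import gymnasium as gym
#   env = gym.make("CartPole-v1")
#   W = evolution_strategy(env, n_actions=2, obs_dim=4)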
- Boosted Off-Policy Learning [21.042970740577648]
We propose the first boosting algorithm for off-policy learning from logged bandit feedback.
Unlike existing boosting methods for supervised learning, our algorithm directly optimizes an estimate of the policy's expected reward.
We show how to reduce the base learner to supervised learning, which opens up a broad range of readily available base learners.
arXiv Detail & Related papers (2022-08-01T21:43:02Z)
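The summary gives only the recipe (boosting, a direct estimate of the policy's expected reward, base learners reduced to supervised regression), so the sketch below is one plausible instantiation rather than the paper's algorithm: functional-gradient boosting of a softmax policy on the inverse-propensity-score (IPS) estimate of expected reward, with scikit-learn regression trees as the assumed base learners.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_policy(contexts, actions, rewards, propensities, n_actions,
                 rounds=50, lr=0.1, max_depth=3):
    """Boost a softmax policy pi(a | x) = softmax_a F(x, a) on the IPS estimate
        V(pi) = mean_i  r_i * pi(a_i | x_i) / p_i
    of expected reward. Each round fits one regression tree per action to the
    gradient of V with respect to the scores F, i.e. a plain supervised problem.

    contexts : (n, d) array, actions : (n,) int array,
    rewards : (n,) array, propensities : (n,) logging probabilities p_i.
    """
    n = len(contexts)
    F = np.zeros((n, n_actions))                    # ensemble scores on the log
    ensemble = []                                   # list of per-round tree lists
    for _ in range(rounds):
        pi = np.exp(F - F.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
        # dV/dF[i, a] = (r_i / p_i) * pi(a_i|x_i) * (1[a == a_i] - pi(a|x_i))
        w = rewards / propensities * pi[np.arange(n), actions]
        grad = -w[:, None] * pi
        grad[np.arange(n), actions] += w
        trees = []
        for a in range(n_actions):                  # one supervised fit per action
            t = DecisionTreeRegressor(max_depth=max_depth).fit(contexts, grad[:, a])
            F[:, a] += lr * t.predict(contexts)
            trees.append(t)
        ensemble.append(trees)
    return ensemble

Acting with the learned policy then amounts to summing the trees' predictions per action for a new context and taking a softmax (or argmax) over the resulting scores.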
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
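One way to read the two-policy description above is a guide policy that controls the start of each episode and a learning policy that takes over afterwards, with the guide's share of the episode shrinking as the learner improves. The sketch below follows that reading; the function names, curriculum rule, and Gymnasium-style environment interface are assumptions rather than the paper's exact procedure.

def jump_start_rollout(env, guide_policy, explore_policy, guide_steps, max_steps=1000):
    """Run one episode in which the guide policy controls the first `guide_steps`
    steps and the learning (exploration) policy controls the remainder.
    Returns the collected transitions and the episode return."""
    obs, _ = env.reset()
    transitions, episode_return = [], 0.0
    for t in range(max_steps):
        policy = guide_policy if t < guide_steps else explore_policy
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs, terminated))
        episode_return += reward
        obs = next_obs
        if terminated or truncated:
            break
    return transitions, episode_return

def jsrl_train(env, guide_policy, explore_policy, update_fn,
               horizon=100, episodes=500, threshold=0.0, shrink=10):
    """Curriculum sketch: start with a long guide roll-in and shorten it whenever
    the combined policy's return clears a task-specific threshold (assumed)."""
    guide_steps = horizon
    for _ in range(episodes):
        batch, ret = jump_start_rollout(env, guide_policy, explore_policy, guide_steps)
        update_fn(batch)                  # any off-policy RL update on the batch
        if ret >= threshold and guide_steps > 0:
            guide_steps = max(0, guide_steps - shrink)
    return explore_policy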
- Phase Retrieval using Expectation Consistent Signal Recovery Algorithm based on Hypernetwork [73.94896986868146]
Phase retrieval (PR) is an important component in modern computational imaging systems.
Recent advances in deep learning have opened up new possibilities for robust and fast PR.
We develop a novel framework for deep unfolding to overcome the existing limitations.
arXiv Detail & Related papers (2021-01-12T08:36:23Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
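The summary does not define the heuristic measure, so the sketch below only illustrates the reannealing idea: an exploration rate that decays as usual but is reset to a higher value when a stand-in signal (stagnating average returns) suggests more exploration is needed. The trigger, thresholds, and class name are assumptions.

import numpy as np

class ReannealedEpsilon:
    """Decaying epsilon with reannealing: epsilon shrinks every episode, and is
    reset upward whenever a heuristic signal says exploration should resume."""

    def __init__(self, eps_start=1.0, eps_min=0.05, decay=0.995,
                 reanneal_to=0.5, window=20, patience=50):
        self.eps = eps_start
        self.eps_min, self.decay, self.reanneal_to = eps_min, decay, reanneal_to
        self.returns, self.window, self.patience = [], window, patience
        self.best, self.stale = -np.inf, 0

    def step(self, episode_return):
        """Call once per episode; returns the epsilon to use next episode."""
        self.returns.append(episode_return)
        avg = np.mean(self.returns[-self.window:])
        if avg > self.best + 1e-6:
            self.best, self.stale = avg, 0
        else:
            self.stale += 1
        self.eps = max(self.eps_min, self.eps * self.decay)   # usual decay
        if self.stale >= self.patience:                        # heuristic trigger
            self.eps = max(self.eps, self.reanneal_to)         # reanneal
            self.stale = 0
        return self.eps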
- Meta-Gradient Reinforcement Learning with an Objective Discovered Online [54.15180335046361]
We propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network.
Because the objective is discovered online, it can adapt to changes over time.
On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency.
arXiv Detail & Related papers (2020-07-16T16:17:09Z)
- Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO [90.90009491366273]
We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms.
Specifically, we investigate the consequences of "code-level optimizations": algorithm augmentations found only in implementations or described as auxiliary details to the core algorithm.
Our results show that they (a) are responsible for most of PPO's gain in cumulative reward over TRPO, and (b) fundamentally change how RL methods function.
arXiv Detail & Related papers (2020-05-25T16:24:59Z)
- Population-Guided Parallel Policy Search for Reinforcement Learning [17.360163137926]
A new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL).
In the proposed scheme, multiple identical learners with their own value functions and policies share a common experience replay buffer and collaboratively search for a good policy under the guidance of the best policy in the population.
arXiv Detail & Related papers (2020-01-09T10:13:57Z)
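As a structural sketch of the scheme described above (several identical learners, one shared experience replay buffer, periodic selection of the best learner whose policy guides the others), the loop below shows how the pieces could fit together. The learner interface (act, update, evaluate, policy), the guide_weight penalty, and the evaluation schedule are assumptions, not the paper's implementation.

def population_guided_search(learners, make_env, shared_buffer,
                             episodes=1000, eval_every=20, guide_weight=0.1):
    """Several identical learners gather experience into one shared replay buffer;
    every few episodes the best-performing learner is re-identified, and each
    learner's update is nudged toward that best policy."""
    best = learners[0]
    for ep in range(episodes):
        for learner in learners:
            env = make_env()
            obs, _ = env.reset()
            done = False
            while not done:
                action = learner.act(obs)
                next_obs, reward, terminated, truncated, _ = env.step(action)
                shared_buffer.add(obs, action, reward, next_obs, terminated)
                obs, done = next_obs, terminated or truncated
            batch = shared_buffer.sample()
            # usual off-policy update plus a penalty (weight guide_weight) pulling
            # this learner's policy toward the current best learner's policy
            learner.update(batch, guide=best.policy, guide_weight=guide_weight)
        if ep % eval_every == 0:
            best = max(learners, key=lambda l: l.evaluate(make_env()))
    return best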