Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization
- URL: http://arxiv.org/abs/2206.12674v2
- Date: Mon, 6 May 2024 09:09:58 GMT
- Title: Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization
- Authors: Igor Kuznetsov
- Abstract summary: We propose a novel guided exploration method that uses an ensemble of Monte Carlo Critics for calculating exploratory action correction.
We present a novel algorithm that leverages the proposed exploratory module for both policy and critic modification.
The presented algorithm demonstrates superior performance compared to modern reinforcement learning algorithms across a variety of problems in the DMControl suite.
- Score: 1.9580473532948401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The class of deep deterministic off-policy algorithms is effectively applied to solve challenging continuous control problems. Current approaches commonly utilize random noise as an exploration method, which has several drawbacks, including the need for manual adjustment for a given task and the absence of exploratory calibration during the training process. We address these challenges by proposing a novel guided exploration method that uses an ensemble of Monte Carlo Critics for calculating exploratory action correction. The proposed method enhances the traditional exploration scheme by dynamically adjusting exploration. Subsequently, we present a novel algorithm that leverages the proposed exploratory module for both policy and critic modification. The presented algorithm demonstrates superior performance compared to modern reinforcement learning algorithms across a variety of problems in the DMControl suite.
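The abstract does not spell out how the exploratory action correction is computed, so the following is only a minimal sketch of the general idea of steering exploration with a critic ensemble rather than with random noise. Everything in it is an assumption made for illustration and is not taken from the paper: the toy linear critics, the use of ensemble variance as the exploration signal, the finite-difference gradient, and the names `ensemble_disagreement`, `exploratory_correction`, and `make_toy_critic`.

```python
import numpy as np


def ensemble_disagreement(critics, state, action):
    """Variance of the critics' value estimates for one (state, action) pair."""
    values = np.array([q(state, action) for q in critics])
    return values.var()


def exploratory_correction(critics, state, action, scale=0.1, eps=1e-4):
    """Direction in action space that increases ensemble disagreement.

    Assumption (not from the paper): the correction is the central
    finite-difference gradient of the ensemble variance with respect to
    the action, rescaled to have norm `scale`.
    """
    grad = np.zeros_like(action)
    for i in range(action.size):
        step = np.zeros_like(action)
        step[i] = eps
        upper = ensemble_disagreement(critics, state, action + step)
        lower = ensemble_disagreement(critics, state, action - step)
        grad[i] = (upper - lower) / (2 * eps)
    norm = np.linalg.norm(grad)
    if norm < 1e-12:
        # No informative direction: leave the policy action unchanged.
        return np.zeros_like(action)
    return scale * grad / norm


def make_toy_critic(rng, state_dim, action_dim):
    """Hypothetical stand-in for a trained critic: a random linear function."""
    w_s = rng.normal(size=state_dim)
    w_a = rng.normal(size=action_dim)
    return lambda s, a: float(w_s @ s + w_a @ a)


rng = np.random.default_rng(0)
critics = [make_toy_critic(rng, state_dim=3, action_dim=2) for _ in range(5)]
state = rng.normal(size=3)
policy_action = np.zeros(2)  # stands in for the deterministic policy output

correction = exploratory_correction(critics, state, policy_action)
explore_action = np.clip(policy_action + correction, -1.0, 1.0)
```

In this sketch the correction replaces additive random noise: its magnitude is fixed by `scale`, while its direction changes as the critics change, which is one way exploration could be adjusted dynamically during training.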
Related papers
- Proximal Policy Optimization with Adaptive Exploration [0.0]
This paper investigates the exploration-exploitation tradeoff within the context of reinforcement learning.
The proposed adaptive exploration framework dynamically adjusts the exploration magnitude during training based on the recent performance of the agent.
arXiv Detail & Related papers (2024-05-07T20:51:49Z) - Model-Based Reinforcement Learning Control of Reaction-Diffusion
Problems [0.0]
reinforcement learning has been applied to decision-making in several applications, most notably in games.
We introduce two novel reward functions to drive the flow of the transported field.
Results show that certain controls can be implemented successfully in these applications.
arXiv Detail & Related papers (2024-02-22T11:06:07Z) - Boosting Exploration in Actor-Critic Algorithms by Incentivizing
Plausible Novel States [9.210923191081864]
Actor-critic (AC) algorithms are a class of model-free deep reinforcement learning algorithms.
We propose a new method to boost exploration through an intrinsic reward, based on measurement of a state's novelty.
With incentivized exploration of plausible novel states, an AC algorithm is able to improve its sample efficiency and hence training performance.
arXiv Detail & Related papers (2022-10-01T07:07:11Z) - Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based ZO algorithm (ZO-RL) that learns a sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient by learning a sampling policy, and that it converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z) - Reinforcement Learning for Low-Thrust Trajectory Design of Interplanetary Missions [77.34726150561087]
This paper investigates the use of reinforcement learning for the robust design of interplanetary trajectories in the presence of severe disturbances.
An open-source implementation of the state-of-the-art algorithm Proximal Policy Optimization is adopted.
The resulting Guidance and Control Network provides both a robust nominal trajectory and the associated closed-loop guidance law.
arXiv Detail & Related papers (2020-08-19T15:22:15Z) - Reinforcement Learning with Fast Stabilization in Linear Dynamical Systems [91.43582419264763]
We study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems.
We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment.
We show that the proposed algorithm attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time steps of agent-environment interaction.
arXiv Detail & Related papers (2020-07-23T23:06:40Z) - Responsive Safety in Reinforcement Learning by PID Lagrangian Methods [74.49173841304474]
Lagrangian methods exhibit oscillations and overshoot which, when applied to safe reinforcement learning, lead to constraint-violating behavior.
We propose a novel Lagrange multiplier update method that utilizes derivatives of the constraint function.
We apply our PID Lagrangian methods in deep RL, setting a new state of the art in Safety Gym, a safe RL benchmark.
arXiv Detail & Related papers (2020-07-08T08:43:14Z) - Active Learning for Gaussian Process Considering Uncertainties with Application to Shape Control of Composite Fuselage [7.358477502214471]
We propose two new active learning algorithms for the Gaussian process with uncertainties.
We show that the proposed approach can incorporate the impact of uncertainties and achieve better prediction performance.
This approach has been applied to improving the predictive modeling for automatic shape control of composite fuselage.
arXiv Detail & Related papers (2020-04-23T02:04:53Z) - Average Reward Adjusted Discounted Reinforcement Learning: Near-Blackwell-Optimal Policies for Real-World Applications [0.0]
Reinforcement learning aims at finding the best stationary policy for a given Markov Decision Process.
This paper provides deep theoretical insights into the widely applied standard discounted reinforcement learning framework.
We establish a novel near-Blackwell-optimal reinforcement learning algorithm.
arXiv Detail & Related papers (2020-04-02T08:05:18Z) - Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.