ADER: Adapting between Exploration and Robustness for Actor-Critic Methods
- URL: http://arxiv.org/abs/2109.03443v1
- Date: Wed, 8 Sep 2021 05:48:39 GMT
- Title: ADER: Adapting between Exploration and Robustness for Actor-Critic Methods
- Authors: Bo Zhou, Kejiao Li, Hongsheng Zeng, Fan Wang, Hao Tian
- Abstract summary: We show that TD3's performance lags behind vanilla actor-critic methods in some primitive environments.
We propose ADER, a novel algorithm for this problem that ADapts between Exploration and Robustness.
Experiments in several challenging environments demonstrate the superiority of the proposed method in continuous control tasks.
- Score: 8.750251598581102
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Combining off-policy reinforcement learning methods with function
approximators such as neural networks has been found to lead to overestimation
of the value function and sub-optimal solutions. Improvements such as TD3 have
been proposed to address this issue. However, we surprisingly find that TD3's
performance lags behind vanilla actor-critic methods (such as DDPG) in some
primitive environments. In this paper, we show that these failure cases can be
attributed to insufficient exploration. We identify the culprit behind the
insufficient exploration in TD3, and propose ADER, a novel algorithm that
ADapts between Exploration and Robustness. To enhance exploration while
eliminating the overestimation bias, we introduce a dynamic penalty term in
value estimation, calculated from the estimated uncertainty, which accounts
for the different compositions of the uncertainty at different learning
stages. Experiments in several challenging environments demonstrate the
superiority of the proposed method in continuous control tasks.
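The mechanism described in the abstract, an uncertainty-derived penalty on the value target whose strength adapts over training, can be pictured with a short sketch. This is a minimal illustration under our own assumptions (an ensemble standard deviation as the uncertainty estimate and a linearly annealed penalty weight), not the authors' implementation:

```python
import numpy as np

def ader_style_target(rewards, next_q_ensemble, step, total_steps,
                      gamma=0.99, beta_start=0.0, beta_end=1.0):
    """Uncertainty-penalized TD target (illustrative sketch only).

    next_q_ensemble has shape (n_critics, batch): each critic's estimate of
    Q(s', pi(s')).  The ensemble mean serves as the value estimate and the
    ensemble std stands in for the estimated uncertainty; the penalty weight
    beta is annealed (our assumption) so that early training is optimistic,
    favoring exploration, and late training is pessimistic, favoring
    robustness against overestimation.
    """
    q_mean = next_q_ensemble.mean(axis=0)
    q_std = next_q_ensemble.std(axis=0)   # uncertainty proxy
    beta = beta_start + (beta_end - beta_start) * (step / total_steps)
    return rewards + gamma * (q_mean - beta * q_std)

# toy usage
rng = np.random.default_rng(0)
targets = ader_style_target(rewards=rng.normal(size=32),
                            next_q_ensemble=rng.normal(size=(4, 32)),
                            step=5_000, total_steps=1_000_000)
print(targets.shape)  # (32,)
```

With beta near zero the target reduces to a vanilla (DDPG-like) bootstrap; with beta large it resembles a pessimistic, TD3-like lower bound, which is the exploration/robustness trade-off the penalty is meant to navigate.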
Related papers
- Adaptive trajectory-constrained exploration strategy for deep reinforcement learning [6.589742080994319]
Deep reinforcement learning (DRL) faces significant challenges on hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces.
We propose an efficient adaptive trajectory-constrained exploration strategy for DRL.
We conduct experiments on two large 2D grid world mazes and several MuJoCo tasks.
arXiv Detail & Related papers (2023-12-27T07:57:15Z)
- Never Explore Repeatedly in Multi-Agent Reinforcement Learning [40.35950679063337]
We propose a dynamic reward scaling approach to combat "revisitation".
We show enhanced performance in demanding environments like Google Research Football and StarCraft II micromanagement tasks.
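As a rough illustration of scaling rewards to discourage revisitation (a count-based sketch of the general idea under our own assumptions, not the authors' method; the discretized state key is hypothetical):

```python
from collections import Counter

class RevisitationScaledReward:
    """Hypothetical count-based shaping: the intrinsic bonus decays with
    how often a discretized state has been visited, so regions the agent
    keeps revisiting stop paying out."""

    def __init__(self, bonus=1.0):
        self.visits = Counter()
        self.bonus = bonus

    def __call__(self, extrinsic_reward, state_key):
        self.visits[state_key] += 1
        intrinsic = self.bonus / self.visits[state_key] ** 0.5
        return extrinsic_reward + intrinsic

shaper = RevisitationScaledReward()
print(shaper(0.0, (3, 4)))  # first visit: full bonus (1.0)
print(shaper(0.0, (3, 4)))  # revisit: scaled-down bonus (~0.707)
```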
arXiv Detail & Related papers (2023-08-19T05:27:48Z)
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
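To give a flavor of turning episodic visitation discrepancy into an intrinsic reward, the sketch below uses k-nearest-neighbor distances between the current and previous episode's states. Distances of this form underlie nonparametric Rényi-divergence estimators, but this simplification, and the assumed state embedding, are ours rather than REVD itself:

```python
import numpy as np

def knn_visitation_bonus(current_states, previous_states, k=3):
    """Bonus per current-episode state: distance to its k-th nearest
    neighbor among the previous episode's states, so states far from
    anything visited last episode earn more (illustrative only)."""
    d = np.linalg.norm(current_states[:, None, :] - previous_states[None, :, :],
                       axis=-1)
    kth = np.sort(d, axis=1)[:, k - 1]   # distance to k-th neighbor
    return np.log1p(kth)                 # compress large distances

prev = np.random.default_rng(1).normal(size=(128, 8))
cur = np.random.default_rng(2).normal(size=(64, 8))
print(knn_visitation_bonus(cur, prev).shape)  # (64,)
```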
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
- On the Minimal Adversarial Perturbation for Deep Neural Networks with Provable Estimation Error [65.51757376525798]
The existence of adversarial perturbations has opened an interesting research line on provable robustness.
However, no provable results have been presented to estimate and bound the committed error.
This paper proposes two lightweight strategies to find the minimal adversarial perturbation.
The obtained results show that the proposed strategies approximate the theoretical distance and robustness for samples close to the classification boundary, leading to provable guarantees against any adversarial attack.
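The estimation problem can be pictured with a simple bisection along a fixed attack direction. This only upper-bounds the minimal perturbation along that one direction and is not the paper's provable strategy; the toy classifier and direction are placeholders:

```python
import numpy as np

def flip_radius(classify, x, direction, hi=1.0, tol=1e-4):
    """Bisect for the smallest step along `direction` that changes the
    predicted label; returns None if no flip occurs within `hi`."""
    label = classify(x)
    if classify(x + hi * direction) == label:
        return None
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if classify(x + mid * direction) == label:
            lo = mid   # still the original label: move outward
        else:
            hi = mid   # already flipped: move inward
    return hi

clf = lambda v: int(v[0] > 0.0)          # toy linear classifier
x = np.array([0.3, 1.0])
print(flip_radius(clf, x, np.array([-1.0, 0.0])))  # approx 0.3
```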
arXiv Detail & Related papers (2022-01-04T16:40:03Z)
- Surveillance Evasion Through Bayesian Reinforcement Learning [78.79938727251594]
We consider a 2D continuous path planning problem with a completely unknown intensity of random termination.
The observers' surveillance intensity is a priori unknown and has to be learned through repeated path planning.
arXiv Detail & Related papers (2021-09-30T02:29:21Z)
- Geometry Uncertainty Projection Network for Monocular 3D Object Detection [138.24798140338095]
We propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.
Specifically, a GUP module is proposed to obtain the geometry-guided uncertainty of the inferred depth.
At the training stage, we propose a Hierarchical Task Learning strategy to reduce the instability caused by error amplification.
arXiv Detail & Related papers (2021-07-29T06:59:07Z)
- A Vision Based Deep Reinforcement Learning Algorithm for UAV Obstacle Avoidance [1.2693545159861856]
We present two techniques for improving exploration for UAV obstacle avoidance.
The first is a convergence-based approach that uses convergence error to iterate through unexplored actions and a temporal threshold to balance exploration and exploitation.
The second is a guidance-based approach which uses a Gaussian mixture distribution to compare previously seen states to a predicted next state in order to select the next action.
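A minimal sketch of the guidance-based idea, assuming a transition model that proposes one predicted next state per candidate action; the mixture size and the least-likely ("most novel") selection rule are our own reading, not the paper's implementation:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def pick_novel_action(seen_states, predicted_next_states):
    """Fit a Gaussian mixture to previously seen states and choose the
    action whose predicted next state is least likely under it."""
    gmm = GaussianMixture(n_components=3, random_state=0).fit(seen_states)
    log_lik = gmm.score_samples(predicted_next_states)  # one score per action
    return int(np.argmin(log_lik))

rng = np.random.default_rng(0)
seen = rng.normal(size=(200, 4))
candidates = np.vstack([rng.normal(size=4),          # action 0: familiar region
                        rng.normal(size=4) + 5.0])   # action 1: novel region
print(pick_novel_action(seen, candidates))  # expected: 1
```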
arXiv Detail & Related papers (2021-03-11T01:15:26Z)
- Attribute-Guided Adversarial Training for Robustness to Natural Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize the classifier's exposure to the attribute space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z)
- Temporal Difference Uncertainties as a Signal for Exploration [76.6341354269013]
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy.
In this paper, we highlight that value estimates are easily biased and temporally inconsistent.
We propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors.
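A toy, tabular illustration of inducing a distribution over TD errors from an ensemble of value estimates and reading off its spread as an uncertainty signal; the ensemble construction is assumed, and this is not the paper's estimator:

```python
import numpy as np

def td_error_uncertainty(v_ensemble, s, r, s_next, gamma=0.99):
    """v_ensemble: (n_members, n_states) table of value estimates.
    Each member yields one TD error for the transition (s, r, s_next);
    the std of those errors serves as the exploration signal."""
    td_errors = r + gamma * v_ensemble[:, s_next] - v_ensemble[:, s]
    return td_errors.mean(), td_errors.std()

v = np.random.default_rng(0).normal(size=(10, 5))  # 10 members, 5 states
mean_td, uncertainty = td_error_uncertainty(v, s=0, r=1.0, s_next=3)
print(round(uncertainty, 3))
```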
arXiv Detail & Related papers (2020-10-05T18:11:22Z)