Mitigating Estimation Errors by Twin TD-Regularized Actor and Critic for
Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2311.03711v1
- Date: Tue, 7 Nov 2023 04:30:51 GMT
- Title: Mitigating Estimation Errors by Twin TD-Regularized Actor and Critic for
Deep Reinforcement Learning
- Authors: Junmin Zhong, Ruofan Wu, and Jennie Si
- Abstract summary: We introduce a new, twin TD-regularized actor-critic (TDR) method to address the issue of estimation bias in deep reinforcement learning (DRL).
We show that our new actor-critic learning has enabled DRL methods to outperform their respective baselines in challenging environments in DeepMind Control Suite.
- Score: 10.577516871906816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the issue of estimation bias in deep reinforcement learning (DRL)
by introducing solution mechanisms that include a new, twin TD-regularized
actor-critic (TDR) method. It aims at reducing both over- and under-estimation
errors. With TDR combined with established DRL improvements, such as distributional
learning and the long N-step surrogate stage reward (LNSS) method, we show that our
new TDR-based actor-critic learning has enabled DRL methods to outperform their
respective baselines in challenging environments in DeepMind Control Suite.
Furthermore, they elevate TD3 and SAC, respectively, to a level of performance
comparable to that of D4PG (the current SOTA), and they also improve the
performance of D4PG to a new SOTA level measured by mean reward, convergence
speed, learning success rate, and learning variance.
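The abstract does not spell out the TDR update, so the sketch below is not the authors' method. As background on the estimation bias it refers to, it shows the clipped double-Q target used by TD3 (one of the baselines named above), which bootstraps from the smaller of two critic estimates to curb overestimation and can therefore lean toward underestimation. The function names and the numbers are illustrative assumptions.

```python
def clipped_double_q_target(reward, gamma, q1_next, q2_next, done):
    """TD3-style target: bootstrap from the smaller of two target-critic values.

    Background illustration only; this is NOT the TDR update from the paper.
    """
    # No bootstrapping past a terminal transition.
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap


def td_errors(q1, q2, target):
    """Temporal-difference error of each critic against the shared target."""
    return target - q1, target - q2


# Hypothetical values chosen so the arithmetic is exact in floating point.
target = clipped_double_q_target(
    reward=1.0, gamma=0.5, q1_next=4.0, q2_next=2.0, done=False
)
delta1, delta2 = td_errors(q1=2.5, q2=1.5, target=target)
print(target, delta1, delta2)
```

Here the first critic sits above the shared target and the second below it, so the two TD errors have opposite signs: one critic is overestimating relative to the target while the other is underestimating.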
Related papers
- Ratio Divergence Learning Using Target Energy in Restricted Boltzmann Machines: Beyond Kullback-Leibler Divergence Learning [0.0]
We propose ratio divergence (RD) learning for discrete energy-based models.
RD learning combines the strength of both forward and reverse Kullback-Leibler divergence (KLD) learning.
Numerical experiments demonstrate that RD learning significantly outperforms other learning methods.
arXiv Detail & Related papers (2024-09-12T01:01:55Z)
- Relative Difficulty Distillation for Semantic Segmentation [54.76143187709987]
We propose a pixel-level KD paradigm for semantic segmentation named Relative Difficulty Distillation (RDD).
RDD allows the teacher network to provide effective guidance on learning focus without additional optimization goals.
Our research showcases that RDD can integrate with existing KD methods to improve their upper performance bound.
arXiv Detail & Related papers (2024-07-04T08:08:25Z)
- Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales [13.818149654692863]
Reinforcement learning (RL) training is inherently unstable due to factors such as moving targets and high gradient variance.
In this work, we improve the stability of RL training by adapting the reverse cross entropy (RCE) from supervised learning for noisy data to define a symmetric RL loss.
arXiv Detail & Related papers (2024-05-27T19:28:33Z)
- Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
arXiv Detail & Related papers (2023-04-20T17:11:05Z)
- Deep Metric Learning for Unsupervised Remote Sensing Change Detection [60.89777029184023]
Remote Sensing Change Detection (RS-CD) aims to detect relevant changes from Multi-Temporal Remote Sensing Images (MT-RSIs).
The performance of existing RS-CD methods is attributed to training on large annotated datasets.
This paper proposes an unsupervised CD method based on deep metric learning that can deal with both of these issues.
arXiv Detail & Related papers (2023-03-16T17:52:45Z)
- How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval [80.54532535622988]
We show that a generalizable dense retriever can be trained to achieve high accuracy in both supervised and zero-shot retrieval.
DRAGON, our dense retriever trained with diverse augmentation, is the first BERT-base-sized DR to achieve state-of-the-art effectiveness in both supervised and zero-shot evaluations.
arXiv Detail & Related papers (2023-02-15T03:53:26Z)
- Multi-level Distance Regularization for Deep Metric Learning [20.178765779788492]
We propose a novel distance-based regularization method for deep metric learning called Multi-level Distance Regularization (MDR).
MDR explicitly disturbs a learning procedure by regularizing pairwise distances between embedding vectors into multiple levels.
Previous approaches can easily adopt MDR to improve their performance and generalization ability.
arXiv Detail & Related papers (2021-02-08T14:16:07Z)
- SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks [81.64530401885476]
We propose a self-supervised LiDAR odometry method, dubbed SelfVoxeLO, to tackle these two difficulties.
Specifically, we propose a 3D convolution network to process the raw LiDAR data directly, which extracts features that better encode the 3D geometric patterns.
We evaluate our method's performance on two large-scale datasets, i.e., KITTI and Apollo-SouthBay.
arXiv Detail & Related papers (2020-10-19T09:23:39Z)
- The Effect of Multi-step Methods on Overestimation in Deep Reinforcement Learning [6.181642248900806]
Multi-step (also called n-step) methods in reinforcement learning have been shown to be more efficient than the 1-step method.
We show that both MDDPG and MMDDPG are significantly less affected by the overestimation problem than DDPG with 1-step backup.
We also discuss the advantages and disadvantages of different ways to do multi-step expansion in order to reduce approximation error.
arXiv Detail & Related papers (2020-06-23T01:35:54Z)
- Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution [58.626803922196146]
We argue that DSR models trained on synthetic datasets are restrictive and not effective in dealing with real-world DSR tasks.
We make two contributions in tackling real-world degradation of different depth sensors.
We propose a new framework for real-world DSR, which consists of four modules.
arXiv Detail & Related papers (2020-06-02T09:12:23Z)