Value Activation for Bias Alleviation: Generalized-activated Deep Double
Deterministic Policy Gradients
- URL: http://arxiv.org/abs/2112.11216v1
- Date: Tue, 21 Dec 2021 13:45:40 GMT
- Title: Value Activation for Bias Alleviation: Generalized-activated Deep Double
Deterministic Policy Gradients
- Authors: Jiafei Lyu and Yu Yang and Jiangpeng Yan and Xiu Li
- Abstract summary: It is vital to accurately estimate the value function in Deep Reinforcement Learning (DRL).
Existing actor-critic methods suffer to varying degrees from underestimation or overestimation bias.
We propose a generalized-activated weighting operator that uses any non-decreasing function, termed the activation function, as weights for better value estimation.
- Score: 11.545991873249564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is vital to accurately estimate the value function in Deep Reinforcement
Learning (DRL) so that the agent executes proper actions instead of
suboptimal ones. However, existing actor-critic methods suffer to varying degrees
from underestimation or overestimation bias, which negatively affects their
performance. In this paper, we reveal a simple but effective principle: proper
value correction benefits bias alleviation. We propose the
generalized-activated weighting operator, which uses any non-decreasing function,
termed the activation function, as weights for better value estimation.
In particular, we integrate the generalized-activated weighting operator into
value estimation and introduce a novel algorithm, Generalized-activated Deep
Double Deterministic Policy Gradients (GD3). We theoretically show that GD3 is
capable of alleviating the potential estimation bias. Interestingly, we find
that simple activation functions lead to satisfactory performance with no
additional tricks and can contribute to faster convergence. Experimental
results on numerous challenging continuous control tasks show that GD3 with
task-specific activation outperforms the common baseline methods. We also
find that fine-tuning the polynomial activation function achieves
superior results on most tasks.
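The abstract does not spell out the operator's exact form, but the core idea of weighting Q-values by a non-decreasing activation function can be illustrated concretely. The following is a minimal Python sketch, assuming the operator is a weighted average of Q-values for actions sampled around the policy action; the function names, the specific activations, and the sampling scheme are illustrative assumptions rather than the paper's implementation.

import numpy as np

def generalized_activated_value(q_values, activation):
    # Aggregate sampled Q-values using a non-decreasing function g as weights:
    #   sum_i g(Q_i) * Q_i / sum_i g(Q_i).
    # Larger Q-values receive larger weight, so the estimate lies between the
    # mean (g constant) and the max (g very sharp) of the sampled values.
    q = np.asarray(q_values, dtype=np.float64)
    w = activation(q)
    return float(np.sum(w * q) / np.sum(w))

# Example activations (names and constants are illustrative, not from the paper):
exponential = lambda q: np.exp(5.0 * (q - q.max()))          # exponential weights
polynomial  = lambda q: np.power(q - q.min() + 1e-3, 2.0)    # polynomial weights

qs = [1.2, 0.8, 1.5, 0.9]   # Q(s, a_i) for actions sampled near the policy action
print(generalized_activated_value(qs, exponential))
print(generalized_activated_value(qs, polynomial))

In this view, the choice of activation trades off between the averaging behavior that underestimates and the max behavior that overestimates, which is how a task-specific activation can correct the bias in either direction.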
Related papers
- ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
arXiv Detail & Related papers (2024-02-22T13:22:06Z) - DrM: Mastering Visual Reinforcement Learning through Dormant Ratio
Minimization [43.60484692738197]
Visual reinforcement learning has shown promise in continuous control tasks.
However, current algorithms remain unsatisfactory in virtually every aspect of performance.
DrM is the first model-free algorithm that consistently solves tasks in both the Dog and Manipulator domains.
arXiv Detail & Related papers (2023-10-30T15:50:56Z) - Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task.
LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced.
Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on the stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z) - ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages [37.12048108122337]
This paper proposes a step toward approximate Bayesian inference in on-policy actor-critic deep reinforcement learning.
It is implemented through three changes to the Asynchronous Advantage Actor-Critic (A3C) algorithm.
arXiv Detail & Related papers (2023-06-02T11:37:22Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Data-aware customization of activation functions reduces neural network
error [0.35172332086962865]
We show that data-aware customization of activation functions can result in striking reductions in neural network error.
A simple substitution with the "seagull" activation function in an already-refined neural network can lead to an order-of-magnitude reduction in error.
arXiv Detail & Related papers (2023-01-16T23:38:37Z) - Efficient Neural Network Analysis with Sum-of-Infeasibilities [64.31536828511021]
Inspired by sum-of-infeasibilities methods in convex optimization, we propose a novel procedure for analyzing verification queries on networks with extensive branching functions.
An extension to a canonical case-analysis-based complete search procedure can be achieved by replacing the convex procedure executed at each search state with DeepSoI.
arXiv Detail & Related papers (2022-03-19T15:05:09Z) - Provable Benefits of Actor-Critic Methods for Offline Reinforcement
Learning [85.50033812217254]
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically.
We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle.
arXiv Detail & Related papers (2021-08-19T17:27:29Z) - Softmax Deep Double Deterministic Policy Gradients [37.23518654230526]
We propose to use the Boltzmann softmax operator for value function estimation in continuous control.
We also design two new algorithms, Softmax Deep Deterministic Policy Gradients (SD2) and Softmax Deep Double Deterministic Policy Gradients (SD3), by building the softmax operator upon single and double estimators; a short sketch of this operator appears after this list.
arXiv Detail & Related papers (2020-10-19T02:52:00Z) - WD3: Taming the Estimation Bias in Deep Reinforcement Learning [7.29018671106362]
We show that the TD3 algorithm introduces underestimation bias under mild assumptions.
We propose a novel algorithm, Weighted Delayed Deep Deterministic Policy Gradient (WD3), which can eliminate the estimation bias.
arXiv Detail & Related papers (2020-06-18T01:28:07Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
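The Boltzmann softmax operator referenced in the SD3 entry above replaces the hard max over actions with an exponentially weighted average of Q-values. Below is a minimal Python sketch over sampled Q-values; the inverse-temperature parameter name and the sampled-action approximation are illustrative assumptions, not the paper's exact implementation.

import numpy as np

def boltzmann_softmax_value(q_values, beta=1.0):
    # Boltzmann softmax over sampled Q-values:
    #   sum_a exp(beta * Q(s, a)) * Q(s, a) / sum_a exp(beta * Q(s, a)).
    # beta -> 0 recovers the mean of the samples; beta -> inf recovers the max.
    q = np.asarray(q_values, dtype=np.float64)
    w = np.exp(beta * (q - q.max()))  # shift by the max for numerical stability
    return float(np.sum(w * q) / np.sum(w))

# Hypothetical usage on Q-values of a few sampled actions
print(boltzmann_softmax_value([1.2, 0.8, 1.5, 0.9], beta=5.0))

Note that this operator is the special case of the generalized-activated weighting sketched earlier with an exponential activation, which is why GD3 can be seen as generalizing the softmax-based approach.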