REValueD: Regularised Ensemble Value-Decomposition for Factorisable
Markov Decision Processes
- URL: http://arxiv.org/abs/2401.08850v2
- Date: Fri, 8 Mar 2024 10:06:53 GMT
- Title: REValueD: Regularised Ensemble Value-Decomposition for Factorisable
Markov Decision Processes
- Authors: David Ireland and Giovanni Montana
- Abstract summary: Discrete-action reinforcement learning algorithms often falter in tasks with high-dimensional discrete action spaces.
This study delves deep into the effects of value-decomposition, revealing that it amplifies target variance.
We introduce a regularisation loss that helps to mitigate the effects that exploratory actions in one dimension can have on the value of optimal actions in other dimensions.
Our novel algorithm, REValueD, tested on discretised versions of the DeepMind Control Suite tasks, showcases superior performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discrete-action reinforcement learning algorithms often falter in tasks with
high-dimensional discrete action spaces due to the vast number of possible
actions. A recent advancement leverages value-decomposition, a concept from
multi-agent reinforcement learning, to tackle this challenge. This study delves
deep into the effects of this value-decomposition, revealing that whilst it
curtails the over-estimation bias inherent to Q-learning algorithms, it
amplifies target variance. To counteract this, we present an ensemble of
critics to mitigate target variance. Moreover, we introduce a regularisation
loss that helps to mitigate the effects that exploratory actions in one
dimension can have on the value of optimal actions in other dimensions. Our
novel algorithm, REValueD, tested on discretised versions of the DeepMind
Control Suite tasks, showcases superior performance, especially in the
challenging humanoid and dog tasks. We further dissect the factors influencing
REValueD's performance, evaluating the significance of the regularisation loss
and the scalability of REValueD with increasing sub-actions per dimension.
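To make the decomposition concrete, here is a minimal, illustrative PyTorch sketch of an ensemble of value-decomposed critics with a per-dimension regularisation weight, in the spirit of the abstract. The network sizes, the mean decomposition, the per-dimension ensemble target, and the exact form of the regularisation weight are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

N_DIMS, N_SUB, STATE_DIM, N_CRITICS = 4, 5, 8, 5  # illustrative sizes


class DecomposedCritic(nn.Module):
    """Utility network: outputs a (N_DIMS, N_SUB) table per state; the global
    Q-value of a factorised action is the mean of the chosen utilities."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_DIMS * N_SUB),
        )

    def forward(self, state):                      # state: (B, STATE_DIM)
        return self.net(state).view(-1, N_DIMS, N_SUB)


critics = [DecomposedCritic() for _ in range(N_CRITICS)]
targets = [DecomposedCritic() for _ in range(N_CRITICS)]
for c, t in zip(critics, targets):
    t.load_state_dict(c.state_dict())              # sync target networks
opt = torch.optim.Adam([p for c in critics for p in c.parameters()], lr=1e-3)


def td_loss(batch, gamma=0.99, reg_coef=0.1):
    s, a, r, s2, done = batch                      # a: (B, N_DIMS) sub-action ids
    with torch.no_grad():
        # Ensemble target: average per-dimension greedy utilities over the
        # ensemble to reduce the variance of the bootstrapped target.
        next_u = torch.stack([t(s2) for t in targets])       # (E, B, N, K)
        next_max = next_u.max(dim=-1).values.mean(dim=0)     # (B, N)
        y = r.unsqueeze(-1) + gamma * (1 - done).unsqueeze(-1) * next_max

    loss = 0.0
    for critic in critics:
        u = critic(s)                                        # (B, N, K)
        q_taken = u.gather(-1, a.unsqueeze(-1)).squeeze(-1)  # (B, N)
        td = y - q_taken
        # Assumed regularisation: down-weight a dimension's TD error when the
        # *other* dimensions acted non-greedily, so exploratory sub-actions
        # elsewhere perturb this dimension's optimal values less.
        greedy = (a == u.argmax(-1)).float()                 # (B, N)
        others_greedy = (greedy.sum(-1, keepdim=True) - greedy) / (N_DIMS - 1)
        weight = reg_coef + (1 - reg_coef) * others_greedy
        loss = loss + (weight * td.pow(2)).mean()
    return loss / N_CRITICS


# Hypothetical usage with random data, just to show the shapes involved.
B = 32
batch = (torch.randn(B, STATE_DIM),
         torch.randint(0, N_SUB, (B, N_DIMS)),
         torch.randn(B),
         torch.randn(B, STATE_DIM),
         torch.zeros(B))
opt.zero_grad()
td_loss(batch).backward()
opt.step()
```

In this sketch, averaging the greedy utilities over the ensemble damps target variance, while the weight term shrinks updates to a dimension's utilities when the other dimensions explored, mirroring the role of the regularisation loss described in the abstract.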
Related papers
- Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection [1.0358639819750703]
In unsupervised anomaly detection (UAD) research, it is necessary to develop a computationally efficient and scalable solution.
We revisit the reconstruction-by-inpainting approach and rethink how to improve it by analyzing its strengths and weaknesses.
We propose Feature Attenuation of Defective Representation (FADeR), which employs only two layers to attenuate the feature information of anomaly reconstruction.
arXiv Detail & Related papers (2024-07-05T15:44:53Z)
- Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and energy consumption.
In this work, we aim to bridge the performance gap between coarse discrete and continuous control by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution, combined with value decomposition, yields simple critic-only algorithms that achieve surprisingly strong performance on continuous control tasks (a toy sketch of the coarse-to-fine schedule follows this entry).
arXiv Detail & Related papers (2024-04-05T17:58:37Z)
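As a toy illustration of the entry above, the snippet below grows a discretised action dimension from coarse to fine resolution. The doubling schedule, and refining after each training stage, are assumptions for illustration, not the paper's actual growth criterion.

```python
import numpy as np


def action_bins(low: float, high: float, resolution: int) -> np.ndarray:
    """Evenly spaced discrete sub-actions covering [low, high]."""
    return np.linspace(low, high, resolution)


bins = action_bins(-1.0, 1.0, 3)  # coarse start: {-1.0, 0.0, 1.0}
for stage in range(3):
    # ... train a discrete (e.g. value-decomposed) critic at this resolution ...
    # Growing to 2n - 1 bins keeps every existing bin and inserts midpoints,
    # so values learned for the coarse sub-actions stay meaningful.
    bins = action_bins(-1.0, 1.0, 2 * len(bins) - 1)
    print(f"stage {stage}: grew to {len(bins)} sub-actions per dimension")
```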
- ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impact for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk [30.229387511344456]
We propose CVaR-Proximal-Policy-Optimization (CPPO), a novel reinforcement learning algorithm that formalizes the risk-sensitive constrained optimization problem by keeping the policy's CVaR under a given threshold (a standard form of this constraint is sketched after this entry).
Experimental results show that CPPO achieves a higher cumulative reward and is more robust against both observation and transition disturbances.
arXiv Detail & Related papers (2022-06-09T11:57:54Z)
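For reference, the constraint in the CPPO summary can be written in a standard form. The notation is assumed here, not taken from the paper: $\pi$ is the policy, $C_\pi$ the cumulative cost, $\alpha$ the confidence level, and $d$ the threshold.

```latex
\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\Big[\sum\nolimits_{t} \gamma^{t} r_{t}\Big]
\quad \text{s.t.} \quad \mathrm{CVaR}_{\alpha}(C_{\pi}) \le d,
\qquad \text{where} \quad
\mathrm{CVaR}_{\alpha}(C) = \mathbb{E}\big[\, C \mid C \ge \mathrm{VaR}_{\alpha}(C) \,\big].
```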
- Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge limiting the widespread adoption of deep learning is the fragility of deep neural networks to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results on image classification demonstrate the effectiveness of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z)
- Regressive Domain Adaptation for Unsupervised Keypoint Detection [67.2950306888855]
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain.
We present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection.
Our method brings large improvements of 8% to 11% in terms of PCK on different datasets.
arXiv Detail & Related papers (2021-03-10T16:45:22Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant imitation-learning-from-observation (ILO) algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
- Hierarchical and Efficient Learning for Person Re-Identification [19.172946887940874]
We propose a novel Hierarchical and Efficient Network (HENet) that learns an ensemble of hierarchical global, partial, and recovery features under the supervision of multiple loss combinations.
We also propose a new dataset augmentation approach, dubbed Random Polygon Erasing (RPE), which randomly erases an irregular area of the input image to imitate missing body parts (a minimal sketch follows this entry).
arXiv Detail & Related papers (2020-05-18T15:45:25Z)
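Based only on the description above, here is a minimal sketch of what a Random Polygon Erasing augmentation could look like. The vertex count, polygon size, and fill colour are assumptions, and random_polygon_erase is a hypothetical helper name.

```python
import numpy as np
from PIL import Image, ImageDraw


def random_polygon_erase(img: Image.Image, n_vertices: int = 6,
                         fill=(0, 0, 0)) -> Image.Image:
    """Erase a randomly placed irregular polygon to imitate a missing body part."""
    w, h = img.size
    # Sample a polygon centre away from the image border, then irregular
    # radii/angles for its vertices (sorted angles keep the polygon simple).
    cx = np.random.uniform(0.2 * w, 0.8 * w)
    cy = np.random.uniform(0.2 * h, 0.8 * h)
    radii = np.random.uniform(0.05, 0.25, n_vertices) * min(w, h)
    angles = np.sort(np.random.uniform(0.0, 2.0 * np.pi, n_vertices))
    points = [(cx + r * np.cos(a), cy + r * np.sin(a))
              for r, a in zip(radii, angles)]
    out = img.copy()
    ImageDraw.Draw(out).polygon(points, fill=fill)
    return out


augmented = random_polygon_erase(Image.new("RGB", (128, 256), "gray"))
```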
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action spaces is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of the gradient estimate.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)