Learning Uncertainty-Aware Temporally-Extended Actions
- URL: http://arxiv.org/abs/2402.05439v1
- Date: Thu, 8 Feb 2024 06:32:06 GMT
- Title: Learning Uncertainty-Aware Temporally-Extended Actions
- Authors: Joongkyu Lee, Seung Joon Park, Yunhao Tang, Min-hwan Oh
- Abstract summary: We propose a novel algorithm named Uncertainty-aware Temporal Extension (UTE).
UTE employs ensemble methods to accurately measure uncertainty during action extension.
We demonstrate the effectiveness of UTE through experiments in Gridworld and Atari 2600 environments.
- Score: 22.901453123868674
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In reinforcement learning, temporal abstraction in the action space,
exemplified by action repetition, is a technique to facilitate policy learning
through extended actions. However, a primary limitation in previous studies of
action repetition is its potential to degrade performance, particularly when
sub-optimal actions are repeated. This issue often negates the advantages of
action repetition. To address this, we propose a novel algorithm named
Uncertainty-aware Temporal Extension (UTE). UTE employs ensemble methods to
accurately measure uncertainty during action extension. This feature allows
policies to strategically choose between emphasizing exploration and adopting an
uncertainty-averse approach, tailored to their specific needs. We demonstrate
the effectiveness of UTE through experiments in Gridworld and Atari 2600
environments. Our findings show that UTE outperforms existing action repetition
algorithms, effectively mitigating their inherent limitations and significantly
enhancing policy learning efficiency.
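
To make the abstract's description more concrete, below is a minimal Python sketch of the general idea: an ensemble of value estimates scores each candidate action-extension length, and a single coefficient trades off exploration-favoring optimism against uncertainty aversion. This is not the authors' implementation; the `q_ensemble` interface, the `q(state, action, j)` call signature, and the coefficient `lam` are assumptions introduced purely for illustration.

```python
import numpy as np

def choose_extension_length(q_ensemble, state, action, max_repeat, lam):
    """Score candidate repetition lengths for `action` with an ensemble.

    q_ensemble: list of callables; each is assumed (hypothetically) to
        return an estimated return for repeating `action` for `j` steps.
    lam: lam > 0 favors uncertain extensions (exploration);
         lam < 0 penalizes them (uncertainty-averse behavior).
    """
    scores = []
    for j in range(1, max_repeat + 1):
        # One return estimate per ensemble member for this extension length.
        estimates = np.array([q(state, action, j) for q in q_ensemble])
        # Ensemble mean is the value estimate; the spread is the uncertainty.
        scores.append(estimates.mean() + lam * estimates.std())
    # Repeat the action for the highest-scoring number of steps.
    return int(np.argmax(scores)) + 1
```

In this sketch, the sign of `lam` is what lets a policy lean toward exploration or toward uncertainty aversion, mirroring the choice the abstract attributes to UTE.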
Related papers
- Active Fine-Tuning of Generalist Policies [54.65568433408307]
We propose AMF (Active Multi-task Fine-tuning) to maximize multi-task policy performance under a limited demonstration budget.
We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness in complex and high-dimensional environments.
arXiv Detail & Related papers (2024-10-07T13:26:36Z)
- State-Novelty Guided Action Persistence in Deep Reinforcement Learning [7.05832012052375]
We propose a novel method to dynamically adjust the action persistence based on the current exploration status of the state space.
Our method can be seamlessly integrated into various basic exploration strategies to incorporate temporal persistence.
arXiv Detail & Related papers (2024-09-09T08:34:22Z)
- Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking [7.590209768166108]
We introduce three continuous action masking methods for mapping the action space to the state-dependent set of relevant actions.
Our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications.
arXiv Detail & Related papers (2024-06-06T02:55:16Z)
- ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- Soft Action Priors: Towards Robust Policy Transfer [9.860944032009847]
We use the action prior from the Reinforcement Learning as Inference framework to recover state-of-the-art policy distillation techniques.
Then, we propose a class of adaptive methods that can robustly exploit action priors by combining reward shaping and auxiliary regularization losses.
We show that the proposed methods match state-of-the-art performance and surpass it when learning from suboptimal priors.
arXiv Detail & Related papers (2022-09-20T17:36:28Z)
- Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms [0.0]
We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains.
We augment our algorithm with a novel off-policy correction technique that requires no action probability estimates.
We test the effectiveness of our method on challenging OpenAI Gym continuous control tasks and conclude that it achieves safe experience sharing across multiple agents.
arXiv Detail & Related papers (2022-07-27T11:10:50Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency [61.03922379081648]
We propose an off-policy sample efficient approach that requires no adversarial training or min-max optimization.
Our empirical results show that D2-Imitation is effective in achieving good sample efficiency, outperforming several off-policy extensions of adversarial imitation.
arXiv Detail & Related papers (2021-12-11T19:36:19Z)
- Reinforcement Learning With Sparse-Executing Actions via Sparsity Regularization [15.945378631406024]
Reinforcement learning (RL) has demonstrated impressive performance in decision-making tasks like embodied control, autonomous driving and financial trading.
In many decision-making tasks, the agents often encounter the problem of executing actions under limited budgets.
This paper formalizes the problem as a Sparse Action Markov Decision Process (SA-MDP), in which specific actions in the action space can only be executed for a limited time.
We propose a policy optimization algorithm, Action Sparsity REgularization (ASRE), which adaptively handles each action with a distinct preference.
arXiv Detail & Related papers (2021-05-18T16:50:42Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has the potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.