State-Novelty Guided Action Persistence in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2409.05433v1
- Date: Mon, 9 Sep 2024 08:34:22 GMT
- Title: State-Novelty Guided Action Persistence in Deep Reinforcement Learning
- Authors: Jianshu Hu, Paul Weng, Yutong Ban
- Abstract summary: We propose a novel method to dynamically adjust the action persistence based on the current exploration status of the state space.
Our method can be seamlessly integrated into various basic exploration strategies to incorporate temporal persistence.
- Score: 7.05832012052375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While a powerful and promising approach, deep reinforcement learning (DRL) still suffers from sample inefficiency, which can be notably improved by resorting to more sophisticated techniques for addressing the exploration-exploitation dilemma. One such technique relies on action persistence (i.e., repeating an action over multiple steps). However, previous work exploiting action persistence either applies a fixed strategy or learns additional value functions (or a policy) for selecting the repetition number. In this paper, we propose a novel method to dynamically adjust the action persistence based on the current exploration status of the state space. In this way, our method does not require training additional value functions or a policy. Moreover, a smooth schedule for the repeat probability allows a more effective balance between exploration and exploitation. Furthermore, our method can be seamlessly integrated into various basic exploration strategies to incorporate temporal persistence. Finally, extensive experiments on different DMControl tasks demonstrate that our state-novelty guided action persistence method significantly improves sample efficiency.
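A minimal sketch of how such novelty-guided persistence could look in code. The abstract does not specify how novelty is measured or what the smooth schedule is, so the RND-style estimator, the tanh mapping, the direction of the mapping (more novelty leads to more repetition), and all names below (RNDNovelty, repeat_probability, select_action) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
import torch
import torch.nn as nn


class RNDNovelty:
    """Random-network-distillation style novelty: prediction error of a
    trainable network against a frozen, randomly initialized target."""

    def __init__(self, obs_dim, feat_dim=64, lr=1e-3):
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=lr)

    def novelty(self, obs):
        x = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
        err = ((self.predictor(x) - self.target(x)) ** 2).mean()
        # Train the predictor online so frequently visited states become "familiar".
        self.opt.zero_grad()
        err.backward()
        self.opt.step()
        return float(err.detach())


def repeat_probability(novelty, scale=1.0):
    # Smooth schedule mapping novelty to a repeat probability in [0, 1).
    # The direction (higher novelty -> more persistence) is an assumption here.
    return float(np.tanh(scale * novelty))


def select_action(base_policy, obs, prev_action, novelty_estimator, rng):
    """With probability given by the current state's novelty, repeat the
    previous action; otherwise query the base exploration policy."""
    p_repeat = repeat_probability(novelty_estimator.novelty(obs))
    if prev_action is not None and rng.random() < p_repeat:
        return prev_action
    return base_policy(obs)
```

Because select_action only wraps whatever base_policy returns, any underlying exploration scheme (e.g., noisy actions from an off-policy actor) could be combined with the persistence decision without training extra value functions or policies, which is the property the abstract emphasizes.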
Related papers
- Learning Uncertainty-Aware Temporally-Extended Actions [22.901453123868674]
We propose a novel algorithm named Uncertainty-aware Temporal Extension (UTE).
UTE employs ensemble methods to accurately measure uncertainty during action extension.
We demonstrate the effectiveness of UTE through experiments in Gridworld and Atari 2600 environments.
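An illustrative sketch of the ensemble-uncertainty idea summarized above; this is not the exact UTE algorithm (which learns values for temporally extended actions), and the network sizes, threshold, and repeat lengths are assumptions:

```python
import torch
import torch.nn as nn


def make_q_net(obs_dim, n_actions):
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                         nn.Linear(128, n_actions))


class EnsembleQ:
    """A small ensemble of Q-networks; the spread of their predictions is
    used as a proxy for epistemic uncertainty."""

    def __init__(self, obs_dim, n_actions, n_members=5):
        self.members = [make_q_net(obs_dim, n_actions) for _ in range(n_members)]

    def mean_and_std(self, obs):
        x = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
        qs = torch.stack([m(x) for m in self.members])   # (members, 1, actions)
        return qs.mean(dim=0).squeeze(0), qs.std(dim=0).squeeze(0)


def choose_extension(obs, action, ensemble, max_repeat=8, threshold=0.5):
    """Repeat the chosen action for longer when the ensemble disagrees about
    its value (uncertainty-seeking exploration), and for a shorter horizon
    otherwise. Threshold and repeat lengths are arbitrary illustrative values."""
    with torch.no_grad():
        _, std = ensemble.mean_and_std(obs)
    return max_repeat if float(std[action]) > threshold else max(1, max_repeat // 4)
```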
arXiv Detail & Related papers (2024-02-08T06:32:06Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
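A sketch of the reward relabeling implied by the summary above; the exact reward convention (a fixed penalty at intervention steps, zero elsewhere) and the Step container are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    obs: list
    action: list
    intervened: bool  # True if the human expert took over at this step


def relabel_with_intervention_rewards(trajectory: List[Step],
                                      intervention_penalty: float = -1.0
                                      ) -> List[Tuple[list, list, float]]:
    """Turn a mixed robot/human trajectory into (obs, action, reward) tuples
    for an off-policy RL learner: steps that triggered an intervention get a
    negative reward, all other steps get zero, and no task reward is needed."""
    return [(s.obs, s.action, intervention_penalty if s.intervened else 0.0)
            for s in trajectory]
```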
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Simultaneously Updating All Persistence Values in Reinforcement Learning [40.10326490326968]
In reinforcement learning, the performance of learning agents is sensitive to the choice of time discretization.
In this work, we derive a novel All-Persistence Bellman Operator, which allows an effective use of both the low-persistence experience and the high-persistence experience.
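A sketch of the general principle behind such an operator, not the paper's exact formulation: a segment generated by repeating one action k times also contains valid experience for every persistence j <= k, so one stored segment can yield TD targets at several persistence values at once (the bootstrap values passed in stand for whatever critic the agent maintains):

```python
import numpy as np


def multi_persistence_targets(rewards, bootstrap_values, gamma=0.99):
    """rewards[t] is the reward observed after repetition t+1 of the action;
    bootstrap_values[t] is a value estimate of the state reached after t+1
    repetitions. Returns a j-step TD target for every persistence j = 1..k,
    so a single segment provides learning signal at all shorter persistences."""
    targets, disc_return = [], 0.0
    for j, (r, v) in enumerate(zip(rewards, bootstrap_values), start=1):
        disc_return += gamma ** (j - 1) * r
        targets.append(disc_return + gamma ** j * v)
    return np.asarray(targets)
```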
arXiv Detail & Related papers (2022-11-21T16:22:57Z) - Deep Intrinsically Motivated Exploration in Continuous Control [0.0]
In continuous systems, exploration is often performed through undirected strategies in which parameters of the networks or selected actions are perturbed by random noise.
We adapt existing theories on animal motivational systems into the reinforcement learning paradigm and introduce a novel directed exploration strategy.
Our framework extends to larger and more diverse state spaces, dramatically improves the baselines, and outperforms the undirected strategies significantly.
arXiv Detail & Related papers (2022-10-01T14:52:16Z) - TempoRL: Temporal Priors for Exploration in Off-Policy Reinforcement Learning [33.512849582347734]
We propose to learn features from offline data that are shared by a more diverse range of tasks.
We introduce state-independent temporal priors, which directly model temporal consistency in demonstrated trajectories.
We also introduce a novel integration scheme for action priors in off-policy reinforcement learning.
arXiv Detail & Related papers (2022-05-26T17:49:12Z) - Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL [140.12803111221206]
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting.
We propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation.
We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments.
arXiv Detail & Related papers (2022-03-21T22:07:48Z) - Value-Based Reinforcement Learning for Continuous Control Robotic Manipulation in Multi-Task Sparse Reward Settings [15.198729819644795]
We show the potential of value-based reinforcement learning for learning continuous robotic manipulation tasks in sparse reward settings.
On robotic manipulation tasks, we empirically show that RBF-DQN converges faster than state-of-the-art algorithms such as TD3, SAC, and PPO.
We also perform ablation studies with RBF-DQN and show that enhancement techniques for vanilla deep Q-learning, such as Hindsight Experience Replay (HER) and Prioritized Experience Replay (PER), can also be applied to RBF-DQN.
arXiv Detail & Related papers (2021-07-28T13:40:08Z) - Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z) - Zeroth-Order Supervised Policy Improvement [94.0748002906652]
Policy gradient (PG) algorithms have been widely used in reinforcement learning (RL).
We propose Zeroth-Order Supervised Policy Improvement (ZOSPI).
ZOSPI exploits the estimated value function $Q$ globally while preserving the local exploitation of the PG methods.
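A sketch of this global-plus-local sampling idea. The sampling counts, noise scale, the q_net(states, actions) call signature, and the squared-error regression are assumptions, and this is only the policy update, not the full ZOSPI algorithm (which also trains the Q-function):

```python
import torch


def zospi_policy_step(policy, q_net, policy_opt, obs, n_global=16, n_local=16,
                      local_std=0.1, act_low=-1.0, act_high=1.0):
    """One supervised policy-improvement step: sample candidate actions both
    uniformly over the action space (global) and around the current policy
    output (local), pick the candidate with the highest estimated Q-value,
    and regress the policy toward it with a squared-error loss."""
    obs = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        pi_a = policy(obs)                                  # (1, act_dim)
        act_dim = pi_a.shape[-1]
        global_a = torch.empty(n_global, act_dim).uniform_(act_low, act_high)
        local_a = (pi_a + local_std * torch.randn(n_local, act_dim)).clamp(act_low, act_high)
        candidates = torch.cat([global_a, local_a], dim=0)  # (N, act_dim)
        q = q_net(obs.expand(candidates.shape[0], -1), candidates).squeeze(-1)
        best_action = candidates[q.argmax()]
    loss = ((policy(obs).squeeze(0) - best_action) ** 2).mean()
    policy_opt.zero_grad()
    loss.backward()
    policy_opt.step()
    return float(loss)
```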
arXiv Detail & Related papers (2020-06-11T16:49:23Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z) - Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
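A sketch of the reward-conditioned supervised-learning idea behind this last entry (continuous-action regression variant; the conditioning-by-concatenation and MSE loss are illustrative assumptions rather than the paper's exact formulation):

```python
import torch
import torch.nn as nn


def rcp_update(policy, policy_opt, states, actions, returns_to_go):
    """Reward-conditioned supervised update: treat each logged transition as
    a good choice *for the return its trajectory actually achieved*, and train
    a policy pi(a | s, target_return) by regressing (state, return) -> action.
    At deployment, conditioning on a high target return requests good behavior."""
    inp = torch.cat([states, returns_to_go.unsqueeze(-1)], dim=-1)
    pred_actions = policy(inp)
    loss = nn.functional.mse_loss(pred_actions, actions)
    policy_opt.zero_grad()
    loss.backward()
    policy_opt.step()
    return float(loss)
```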