Utilizing Skipped Frames in Action Repeats via Pseudo-Actions
- URL: http://arxiv.org/abs/2105.03041v1
- Date: Fri, 7 May 2021 02:43:44 GMT
- Title: Utilizing Skipped Frames in Action Repeats via Pseudo-Actions
- Authors: Taisei Hashimoto and Yoshimasa Tsuruoka
- Abstract summary: In many deep reinforcement learning settings, when an agent takes an action, it repeats the same action a predefined number of times without observing the states until the next action-decision point.
Since the amount of training data is inversely proportional to the action-repeat interval, longer repeats can have a negative impact on the sample efficiency of training.
We propose a simple but effective approach to alleviate this problem by introducing the concept of pseudo-actions.
- Score: 13.985534521589253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many deep reinforcement learning settings, when an agent takes an action,
it repeats the same action a predefined number of times without observing the
states until the next action-decision point. This technique of action
repetition has several merits in training the agent, but the data between
action-decision points (i.e., intermediate frames) are, in effect, discarded.
Since the amount of training data is inversely proportional to the action-repeat
interval, longer repeats can have a negative impact on the sample efficiency of
training. In this paper, we propose a simple but effective approach to
alleviate this problem by introducing the concept of pseudo-actions. The key
idea of our method is making the transition between action-decision points
usable as training data by considering pseudo-actions. Pseudo-actions for
continuous control tasks are obtained as the average of the action sequence
straddling an action-decision point. For discrete control tasks, pseudo-actions
are computed from learned action embeddings. This method can be combined with
any model-free reinforcement learning algorithm that involves the learning of
Q-functions. We demonstrate the effectiveness of our approach on both
continuous and discrete control tasks in OpenAI Gym.
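For the continuous-control case, the averaging step can be pictured with a short sketch. The code below is a minimal illustration, not the authors' implementation; the function names, the per-frame reward aggregation, and the choice of a `repeat`-frame window are all assumptions. It builds extra training tuples from the frames that would otherwise be discarded, using the average of the action sequence straddling the decision point as the pseudo-action.

```python
import numpy as np

def pseudo_action(a_prev, a_next, frames_under_prev, frames_under_next):
    # Average of the per-frame action sequence straddling a decision point
    # (continuous control): each executed frame contributes its action once.
    seq = ([np.asarray(a_prev)] * frames_under_prev
           + [np.asarray(a_next)] * frames_under_next)
    return np.mean(seq, axis=0)

def pseudo_transitions(states, rewards, a_prev, a_next, repeat, gamma=0.99):
    """Build extra training tuples from the intermediate frames of one action repeat.

    states  : s_t, s_{t+1}, ..., s_{t+2*repeat} observed around one decision point
    rewards : per-frame rewards aligned with states[:-1]
    a_prev  : action chosen at the decision point that starts this repeat
    a_next  : action chosen at the following decision point
    The names and the exact windowing here are illustrative assumptions.
    """
    extra = []
    for i in range(1, repeat):                 # intermediate (normally discarded) frames
        start, end = i, i + repeat             # a window of `repeat` frames per tuple
        a_pseudo = pseudo_action(a_prev, a_next, repeat - i, i)
        # discounted sum of the per-frame rewards inside the window
        ret = sum(gamma ** k * rewards[start + k] for k in range(repeat))
        extra.append((states[start], a_pseudo, ret, states[end]))
    return extra
```

The resulting tuples could be pushed into the same replay buffer as ordinary decision-point transitions and consumed by any Q-function-based learner; for discrete actions, the arithmetic mean would be replaced by an average in a learned action-embedding space, which is not sketched here.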
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking [7.590209768166108]
We introduce three continuous action masking methods for mapping the action space to the state-dependent set of relevant actions.
Our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications.
arXiv Detail & Related papers (2024-06-06T02:55:16Z)
- Unsupervised Learning of Effective Actions in Robotics [0.9374652839580183]
Current state-of-the-art action representations in robotics lack proper effect-driven learning of the robot's actions.
We propose an unsupervised algorithm to discretize a continuous motion space and generate "action prototypes".
We evaluate our method on a simulated stair-climbing reinforcement learning task.
arXiv Detail & Related papers (2024-04-03T13:28:52Z)
- PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control [55.81022882408587]
Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making.
We propose a novel view that treats inducing temporal action abstractions as a sequence compression problem.
We introduce an approach that combines continuous action quantization with byte pair encoding to learn powerful action abstractions.
arXiv Detail & Related papers (2024-02-16T04:55:09Z)
- Dynamic Interval Restrictions on Action Spaces in Deep Reinforcement Learning for Obstacle Avoidance [0.0]
In this thesis, we consider the problem of interval restrictions as they occur in pathfinding with dynamic obstacles.
Recent research learns under strong assumptions on the number of intervals and is limited to convex subsets.
We propose two approaches that are independent of the state of the environment by extending parameterized reinforcement learning and ConstraintNet to handle an arbitrary number of intervals.
arXiv Detail & Related papers (2023-06-13T09:13:13Z)
- ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z)
- Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
The Prototype-centered Attentive Learning (PAL) model is composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
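The variance-control idea in the last entry above can be pictured, very loosely, with a textbook-style expected baseline computed from critic estimates over the whole discrete action set. This sketch is a generic illustration under that assumption, not the algorithm proposed in that paper, and every name in it is hypothetical.

```python
import numpy as np

def advantage_weight(probs, q_values, action):
    # probs    : pi(a|s) for every discrete action, shape [A]
    # q_values : critic estimates Q(s, a) for every action, shape [A]
    # action   : index of the sampled action
    # The critic-based baseline V(s) = sum_a pi(a|s) Q(s, a) keeps the
    # score-function gradient unbiased while reducing its variance.
    baseline = float(np.dot(probs, q_values))
    return q_values[action] - baseline   # multiplies grad log pi(action|s)
```

In an on-policy update, this weight would scale the log-probability gradient of the sampled action in place of a Monte-Carlo return.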