Utilizing Skipped Frames in Action Repeats via Pseudo-Actions
- URL: http://arxiv.org/abs/2105.03041v1
- Date: Fri, 7 May 2021 02:43:44 GMT
- Title: Utilizing Skipped Frames in Action Repeats via Pseudo-Actions
- Authors: Taisei Hashimoto and Yoshimasa Tsuruoka
- Abstract summary: In many deep reinforcement learning settings, when an agent takes an action, it repeats the same action a predefined number of times without observing the states until the next action-decision point.
Since the amount of training data is inversely proportional to the interval of action repeats, long repeat intervals can have a negative impact on the sample efficiency of training.
We propose a simple but effective approach to alleviate this problem by introducing the concept of pseudo-actions.
- Score: 13.985534521589253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many deep reinforcement learning settings, when an agent takes an action,
it repeats the same action a predefined number of times without observing the
states until the next action-decision point. This technique of action
repetition has several merits in training the agent, but the data between
action-decision points (i.e., intermediate frames) are, in effect, discarded.
Since the amount of training data is inversely proportional to the interval of
action repeats, they can have a negative impact on the sample efficiency of
training. In this paper, we propose a simple but effective approach to
alleviate this problem by introducing the concept of pseudo-actions. The key
idea of our method is making the transition between action-decision points
usable as training data by considering pseudo-actions. Pseudo-actions for
continuous control tasks are obtained as the average of the action sequence
straddling an action-decision point. For discrete control tasks, pseudo-actions
are computed from learned action embeddings. This method can be combined with
any model-free reinforcement learning algorithm that involves the learning of
Q-functions. We demonstrate the effectiveness of our approach on both
continuous and discrete control tasks in OpenAI Gym.
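As a rough illustration of the continuous-control case, the sketch below averages the per-frame actions over a window that straddles an action-decision point and turns each intermediate frame into an extra training transition. Function names such as make_pseudo_transitions and the exact bookkeeping are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pseudo_action(per_frame_actions):
    """Pseudo-action for a transition that straddles an action-decision point:
    the average of the per-frame actions applied over that span
    (continuous-control case, as described in the abstract)."""
    return np.mean(np.asarray(per_frame_actions, dtype=float), axis=0)

def make_pseudo_transitions(frames, per_frame_actions, rewards, repeat):
    """Turn intermediate frames into usable training data (illustrative names).

    frames[i]            : observation at frame i             (length T + 1)
    per_frame_actions[i] : action actually applied at frame i (length T)
    rewards[i]           : per-frame reward                    (length T)
    repeat               : action-repeat interval k

    Windows that start at an action-decision point reproduce the ordinary
    transitions; windows that start at intermediate frames straddle a
    decision point and are labelled with the averaged (pseudo-)action.
    """
    transitions = []
    T = len(per_frame_actions)
    for start in range(T - repeat + 1):
        end = start + repeat
        a_hat = pseudo_action(per_frame_actions[start:end])
        r = float(np.sum(rewards[start:end]))
        transitions.append((frames[start], a_hat, r, frames[end]))
    return transitions
```

For discrete control, the abstract states that pseudo-actions are computed from learned action embeddings, so the averaging above would be carried out in the embedding space rather than directly on action values.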
Related papers
- Select before Act: Spatially Decoupled Action Repetition for Continuous Control [8.39061976254379]
Reinforcement Learning (RL) has achieved remarkable success in various continuous control tasks, such as robot manipulation and locomotion.
Recent studies have incorporated action repetition into RL, achieving enhanced action persistence with improved sample efficiency and superior performance.
Existing methods treat all action dimensions as a whole during repetition, ignoring variations among them.
We propose a novel repetition framework called SDAR, which implements closed-loop act-or-repeat selection for each action dimension individually.
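A minimal sketch of the per-dimension act-or-repeat idea, assuming a binary gate per action dimension; in SDAR the gate would come from a learned, closed-loop policy head, and the names here are illustrative only.

```python
import numpy as np

def apply_dimensionwise_repeat(prev_action, new_action, repeat_gate):
    """Per-dimension act-or-repeat: where the gate is 1, keep (repeat) the
    previous value of that dimension; where it is 0, use the newly selected
    value."""
    return np.where(np.asarray(repeat_gate, dtype=bool), prev_action, new_action)

# Example: repeat the first two of four action dimensions.
prev = np.array([0.5, -0.2, 0.1, 0.9])
new = np.array([0.0, 0.3, 0.4, -0.1])
print(apply_dimensionwise_repeat(prev, new, [1, 1, 0, 0]))  # [ 0.5 -0.2  0.4 -0.1]
```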
arXiv Detail & Related papers (2025-02-10T16:07:28Z) - Reducing Action Space for Deep Reinforcement Learning via Causal Effect Estimation [15.684669299728743]
We propose a method to improve exploration efficiency by estimating the causal effects of actions.
We first pre-train an inverse dynamics model to serve as prior knowledge of the environment.
We classify actions across the entire action space at each time step and estimate the causal effect of each action to suppress redundant actions.
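One plausible reading of the suppression step is to mask actions whose estimated causal effect falls below a threshold before acting greedily. The sketch below assumes per-action effect scores are already available (e.g. derived from the pre-trained inverse dynamics model) and is not the paper's exact procedure.

```python
import numpy as np

def act_with_effect_mask(q_values, effect_scores, threshold):
    """Suppress actions whose estimated causal effect is below a threshold,
    then act greedily over the remaining actions. `effect_scores` stands in
    for whatever per-action effect estimate the method produces."""
    masked = np.where(np.asarray(effect_scores) >= threshold, q_values, -np.inf)
    if not np.isfinite(masked).any():      # nothing passed the threshold
        return int(np.argmax(q_values))    # fall back to the unmasked choice
    return int(np.argmax(masked))
```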
arXiv Detail & Related papers (2025-01-24T14:47:33Z) - Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce Coarse-to-fine Q-Network with Action Sequence (CQN-AS), a novel value-based reinforcement learning algorithm.
We study our algorithm on 53 robotic tasks with sparse and dense rewards, as well as with and without demonstrations.
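The coarse-to-fine part can be illustrated for a single continuous action dimension: evaluate a few bin centres, zoom into the best one, and repeat. This sketch covers only the interval-refinement idea, not the action-sequence critic that CQN-AS adds.

```python
import numpy as np

def coarse_to_fine_argmax(q_fn, low, high, levels=3, bins=5):
    """At each level, evaluate Q on a few bin centres inside the current
    interval and zoom into the best bin."""
    best = (low + high) / 2.0
    for _ in range(levels):
        centers = np.linspace(low, high, bins)
        best = centers[int(np.argmax([q_fn(c) for c in centers]))]
        half_width = (high - low) / (2 * bins)
        low, high = best - half_width, best + half_width
    return best

# Example with a toy Q-function peaked at 0.3.
print(coarse_to_fine_argmax(lambda a: -(a - 0.3) ** 2, low=-1.0, high=1.0))
```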
arXiv Detail & Related papers (2024-11-19T01:23:52Z) - Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
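As a hedged illustration of closed-loop resampling, the sketch below samples several candidate action chunks and keeps the one most consistent with the previously committed chunk over their overlapping steps; BID's actual ranking combines additional criteria.

```python
import numpy as np

def select_chunk_by_overlap(candidate_chunks, prev_chunk, executed_steps):
    """Keep the candidate action chunk most consistent with the previously
    committed chunk over their overlapping steps (one possible closed-loop
    selection criterion, not BID's full ranking)."""
    remaining = np.asarray(prev_chunk)[executed_steps:]  # old plan still ahead
    best, best_cost = None, np.inf
    for chunk in candidate_chunks:
        chunk = np.asarray(chunk)
        overlap = min(len(remaining), len(chunk))
        cost = float(np.linalg.norm(chunk[:overlap] - remaining[:overlap]))
        if cost < best_cost:
            best, best_cost = chunk, cost
    return best
```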
arXiv Detail & Related papers (2024-08-30T15:39:34Z) - Unsupervised Learning of Effective Actions in Robotics [0.9374652839580183]
Current state-of-the-art action representations in robotics lack proper effect-driven learning of the robot's actions.
We propose an unsupervised algorithm to discretize a continuous motion space and generate "action prototypes".
We evaluate our method on a simulated stair-climbing reinforcement learning task.
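A simple way to picture the discretization step is clustering sampled motions (or their observed effects) and treating the cluster centres as action prototypes; KMeans here is an illustrative stand-in for the paper's effect-driven procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_action_prototypes(motion_samples, n_prototypes=8, seed=0):
    """Cluster sampled motions (or their observed effects) and use the cluster
    centres as a small set of discrete "action prototypes"."""
    km = KMeans(n_clusters=n_prototypes, random_state=seed, n_init=10)
    km.fit(np.asarray(motion_samples, dtype=float))
    return km.cluster_centers_
```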
arXiv Detail & Related papers (2024-04-03T13:28:52Z) - PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control [55.81022882408587]
Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making.
We propose a novel view that treats inducing temporal action abstractions as a sequence compression problem.
We introduce an approach that combines continuous action quantization with byte pair encoding to learn powerful action abstractions.
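The sequence-compression view can be sketched as byte pair encoding over already-quantized action tokens: repeatedly merge the most frequent adjacent pair into a new token, yielding temporally extended action abstractions. The quantization step is assumed done here, and this is not PRISE's exact algorithm.

```python
from collections import Counter

def most_frequent_pair(sequences):
    """Most frequent adjacent token pair across all quantized action sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None

def bpe_merge(sequences, n_merges, next_token):
    """BPE-style merging over discrete action tokens: each merge replaces the
    most frequent adjacent pair with a fresh token."""
    merges = {}
    for _ in range(n_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        merges[pair] = next_token
        merged = []
        for seq in sequences:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                    out.append(next_token)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            merged.append(out)
        sequences, next_token = merged, next_token + 1
    return sequences, merges
```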
arXiv Detail & Related papers (2024-02-16T04:55:09Z) - Dynamic Interval Restrictions on Action Spaces in Deep Reinforcement Learning for Obstacle Avoidance [0.0]
In this thesis, we consider the problem of interval restrictions as they occur in pathfinding with dynamic obstacles.
Recent research learns under strong assumptions on the number of intervals and is limited to convex subsets.
We propose two approaches that are independent of the state of the environment by extending parameterized reinforcement learning and ConstraintNet to handle an arbitrary number of intervals.
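One generic way to act under an arbitrary number of allowed intervals is to map an unconstrained action in [0, 1] onto the union of intervals in proportion to their lengths; the parameterized-RL and ConstraintNet variants in the thesis differ in the details, so treat this purely as an illustration.

```python
import numpy as np

def map_to_allowed_intervals(raw_action, intervals):
    """Map an unconstrained action in [0, 1] onto a union of allowed intervals,
    allocating the unit range to each interval in proportion to its length."""
    intervals = np.asarray(intervals, dtype=float)           # shape (n, 2): [low, high]
    lengths = intervals[:, 1] - intervals[:, 0]
    target = np.clip(raw_action, 0.0, 1.0) * lengths.sum()   # position along the union
    for (low, high), length in zip(intervals, lengths):
        if target <= length:
            return low + target
        target -= length
    return float(intervals[-1, 1])                           # numerical edge case

# Example: two allowed intervals of equal length.
print(map_to_allowed_intervals(0.75, [[-1.0, -0.5], [0.5, 1.0]]))  # 0.75
```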
arXiv Detail & Related papers (2023-06-13T09:13:13Z) - ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z) - Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
A Prototype-centered Attentive Learning (PAL) model is proposed, composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
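A generic sketch of a prototype-centered contrastive objective, in which each class prototype treats same-class queries as positives against all queries (the reverse of the usual query-centered view); PAL's exact loss may differ.

```python
import numpy as np

def prototype_centered_contrastive_loss(prototypes, queries, query_labels, temperature=0.1):
    """For each class prototype, same-class queries are positives and all
    queries form the softmax normalizer."""
    prototypes = np.asarray(prototypes, dtype=float)   # (C, d) class prototypes
    queries = np.asarray(queries, dtype=float)         # (N, d) query embeddings
    query_labels = np.asarray(query_labels)
    loss, n_terms = 0.0, 0
    for c, proto in enumerate(prototypes):
        sims = queries @ proto / temperature           # similarity to every query
        sims -= sims.max()                             # numerical stability
        positives = np.where(query_labels == c)[0]
        if len(positives) == 0:
            continue
        log_denominator = np.log(np.sum(np.exp(sims)))
        loss += float(np.mean(log_denominator - sims[positives]))
        n_terms += 1
    return loss / max(n_terms, 1)
```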
arXiv Detail & Related papers (2021-01-20T11:48:12Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine the critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
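The variance-control idea can be illustrated by using the policy-weighted average of the critic's per-action Q estimates as a baseline for the sampled action; this is a generic advantage computation, not the paper's exact estimator.

```python
import numpy as np

def advantage_from_action_value_critic(policy_probs, q_estimates, sampled_action):
    """Critic-based variance control for a discrete policy gradient: the
    policy-weighted mean of the per-action Q estimates acts as a baseline,
    and the sampled action's advantage scales grad log pi(a|s) in the update."""
    baseline = float(np.dot(policy_probs, q_estimates))    # V(s) = sum_a pi(a|s) Q(s,a)
    return float(q_estimates[sampled_action]) - baseline   # advantage for the update
```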
arXiv Detail & Related papers (2020-02-10T04:23:09Z)