Utilizing Skipped Frames in Action Repeats via Pseudo-Actions
- URL: http://arxiv.org/abs/2105.03041v1
- Date: Fri, 7 May 2021 02:43:44 GMT
- Title: Utilizing Skipped Frames in Action Repeats via Pseudo-Actions
- Authors: Taisei Hashimoto and Yoshimasa Tsuruoka
- Abstract summary: In many deep reinforcement learning settings, when an agent takes an action, it repeats the same action a predefined number of times without observing the states until the next action-decision point.
Since the amount of training data is inversely proportional to the interval of action repeats, they can have a negative impact on the sample efficiency of training.
We propose a simple but effective approach to alleviate this problem by introducing the concept of pseudo-actions.
- Score: 13.985534521589253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many deep reinforcement learning settings, when an agent takes an action,
it repeats the same action a predefined number of times without observing the
states until the next action-decision point. This technique of action
repetition has several merits in training the agent, but the data between
action-decision points (i.e., intermediate frames) are, in effect, discarded.
Since the amount of training data is inversely proportional to the interval of
action repeats, they can have a negative impact on the sample efficiency of
training. In this paper, we propose a simple but effective approach to
alleviate this problem by introducing the concept of pseudo-actions. The key
idea of our method is making the transition between action-decision points
usable as training data by considering pseudo-actions. Pseudo-actions for
continuous control tasks are obtained as the average of the action sequence
straddling an action-decision point. For discrete control tasks, pseudo-actions
are computed from learned action embeddings. This method can be combined with
any model-free reinforcement learning algorithm that involves the learning of
Q-functions. We demonstrate the effectiveness of our approach on both
continuous and discrete control tasks in OpenAI Gym.
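The continuous-control case described in the abstract can be illustrated with a small sketch. It assumes that a transition spanning `repeat` frames may start at any intermediate frame, and that its pseudo-action is the plain mean of the per-frame actions inside that window. The helper names `per_frame_actions` and `pseudo_action`, the toy one-dimensional actions, and the choice of `repeat = 4` are hypothetical and are not taken from the paper, whose exact formulation may differ.

```python
import numpy as np


def per_frame_actions(decisions, repeat):
    """Expand one action per decision point into one action per frame."""
    return np.repeat(np.asarray(decisions, dtype=float), repeat, axis=0)


def pseudo_action(frame_actions, start, repeat):
    """Average the `repeat` per-frame actions beginning at frame `start`.

    Assumption: a window that straddles a decision point mixes the two
    neighbouring actions in proportion to how many frames each covers.
    """
    window = frame_actions[start:start + repeat]
    if len(window) != repeat:
        raise ValueError("window runs past the end of the episode")
    return window.mean(axis=0)


# Toy episode: three decisions, each repeated for four frames.
decisions = [[1.0], [-1.0], [0.5]]
repeat = 4
frames = per_frame_actions(decisions, repeat)

# One candidate transition per starting frame, not only per decision point.
pseudo = [pseudo_action(frames, t, repeat)
          for t in range(len(frames) - repeat + 1)]
```

In this toy episode, sampling only at decision points gives three transitions, while allowing every starting frame gives nine. Windows that start exactly on a decision point reproduce the original action, so the ordinary transitions are a special case of the pseudo-action ones.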
Related papers
- Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z) - Unsupervised Learning of Effective Actions in Robotics [0.9374652839580183]
Current state-of-the-art action representations in robotics lack proper effect-driven learning of the robot's actions.
We propose an unsupervised algorithm to discretize a continuous motion space and generate "action prototypes".
We evaluate our method on a simulated stair-climbing reinforcement learning task.
arXiv Detail & Related papers (2024-04-03T13:28:52Z) - PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control [55.81022882408587]
Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making.
We propose a novel view that treats inducing temporal action abstractions as a sequence compression problem.
We introduce an approach that combines continuous action quantization with byte pair encoding to learn powerful action abstractions.
arXiv Detail & Related papers (2024-02-16T04:55:09Z) - Dynamic Interval Restrictions on Action Spaces in Deep Reinforcement Learning for Obstacle Avoidance [0.0]
In this thesis, we consider the problem of interval restrictions as they occur in pathfinding with dynamic obstacles.
Recent research learns with strong assumptions on the number of intervals and is limited to convex subsets.
We propose two approaches that are independent of the state of the environment by extending parameterized reinforcement learning and ConstraintNet to handle an arbitrary number of intervals.
arXiv Detail & Related papers (2023-06-13T09:13:13Z) - Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning [44.50394347326546]
Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning.
Off-policy bias is corrected in a per-decision manner, but once a trace has been fully cut, the effect cannot be reversed.
We propose a multistep operator that can express both per-decision and trajectory-aware methods.
arXiv Detail & Related papers (2023-01-26T18:57:41Z) - ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z) - Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z) - Goal-conditioned dual-action imitation learning for dexterous dual-arm robot manipulation [4.717749411286867]
Long-horizon dexterous robot manipulation of deformable objects, such as banana peeling, is a problematic task.
This paper presents a goal-conditioned dual-action deep imitation learning (DIL) approach that can learn dexterous manipulation skills.
arXiv Detail & Related papers (2022-03-18T05:17:00Z) - Learning Routines for Effective Off-Policy Reinforcement Learning [0.0]
We propose a novel framework for reinforcement learning that effectively lifts such constraints.
Within our framework, agents learn effective behavior over a routine space.
We show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode.
arXiv Detail & Related papers (2021-06-05T18:41:57Z) - Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
We propose a Prototype-centered Attentive Learning (PAL) model composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)