PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for
Reinforcement Learning
- URL: http://arxiv.org/abs/2106.04152v1
- Date: Tue, 8 Jun 2021 07:37:37 GMT
- Title: PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for
Reinforcement Learning
- Authors: Tao Yu, Cuiling Lan, Wenjun Zeng, Mingxiao Feng, Zhibo Chen
- Abstract summary: We propose a novel method, dubbed PlayVirtual, which augments cycle-consistent virtual trajectories to enhance the data efficiency for RL feature representation learning.
Our method outperforms the current state-of-the-art methods by a large margin on both benchmarks.
- Score: 84.30765628008207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning good feature representations is important for deep reinforcement
learning (RL). However, with limited experience, RL often suffers from data
inefficiency for training. For unexperienced or less-experienced trajectories
(i.e., state-action sequences), the lack of data limits their use for
better feature learning. In this work, we propose a novel method, dubbed
PlayVirtual, which augments cycle-consistent virtual trajectories to enhance
the data efficiency for RL feature representation learning. Specifically,
PlayVirtual predicts future states based on the current state and action by a
dynamics model and then predicts the previous states by a backward dynamics
model, which forms a trajectory cycle. Based on this, we augment the actions to
generate a large amount of virtual state-action trajectories. Being free of
ground-truth state supervision, we enforce each trajectory to meet the cycle
consistency constraint, which can significantly enhance the data efficiency. We
validate the effectiveness of our designs on the Atari and DeepMind Control
Suite benchmarks. Our method outperforms the current state-of-the-art methods
by a large margin on both benchmarks.
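A minimal sketch of the forward-backward trajectory cycle described in the abstract (PyTorch; the module structure, network sizes, and uniform action sampling are illustrative assumptions rather than the authors' implementation):

```python
# Minimal sketch of PlayVirtual-style cycle consistency; names and
# architecture are illustrative, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicsModel(nn.Module):
    """Predicts the next (or previous) latent state from a state-action pair."""
    def __init__(self, latent_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def cycle_consistency_loss(forward_model: DynamicsModel,
                           backward_model: DynamicsModel,
                           s0: torch.Tensor,
                           horizon: int,
                           action_dim: int) -> torch.Tensor:
    """Roll virtual (augmented) actions forward, then backward, and penalize
    the gap between the start state and its reconstruction. No ground-truth
    future states are needed."""
    # Virtual actions, here sampled uniformly in [-1, 1] (an assumption);
    # many such trajectories can be generated per real start state.
    actions = torch.rand(horizon, s0.size(0), action_dim) * 2 - 1

    # Forward pass through the dynamics model: s0 -> s1 -> ... -> sH
    s = s0
    for t in range(horizon):
        s = forward_model(s, actions[t])

    # Backward pass with the same actions in reverse: sH -> ... -> s0_hat
    for t in reversed(range(horizon)):
        s = backward_model(s, actions[t])

    # Cycle-consistency: the reconstructed start state should match s0
    return F.mse_loss(s, s0)
```

In the paper this cycle-consistency term serves as an auxiliary objective for feature representation learning; the sketch shows only that term, not the full RL training loop.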
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Reasoning with Latent Diffusion in Offline Reinforcement Learning [11.349356866928547]
Offline reinforcement learning holds promise as a means to learn high-reward policies from a static dataset.
A key challenge in offline RL lies in effectively stitching together portions of suboptimal trajectories from the static dataset.
We propose a novel approach that leverages the expressiveness of latent diffusion to model in-support trajectory sequences as compressed latent skills.
arXiv Detail & Related papers (2023-09-12T20:58:21Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- MSTFormer: Motion Inspired Spatial-temporal Transformer with Dynamic-aware Attention for long-term Vessel Trajectory Prediction [0.6451914896767135]
MSTFormer is a motion-inspired vessel trajectory prediction method based on the Transformer.
First, we propose a data augmentation method to describe the spatial and motion features of the trajectory.
Second, we propose a Multi-headed Dynamic-aware Self-attention mechanism to focus on trajectory points with frequent motion transformations.
Third, we construct a knowledge-inspired loss function to further boost the performance of the model.
arXiv Detail & Related papers (2023-03-21T02:11:37Z)
- Knowing the Past to Predict the Future: Reinforcement Virtual Learning [29.47688292868217]
Reinforcement Learning (RL)-based control systems have received considerable attention in recent decades.
In this paper, we present a cost-efficient framework in which the RL model can evolve by itself in a Virtual Space.
The proposed framework enables a step-by-step RL model to predict future states and select optimal actions for far-sighted decisions.
arXiv Detail & Related papers (2022-11-02T16:48:14Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning an imagined (model-predicted) state with the real state returned by the environment, VCR applies a $Q$-value head to both states and obtains two distributions of action values.
Our method achieves new state-of-the-art performance among search-free RL algorithms.
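A minimal sketch of this value-consistency comparison (hypothetical names; the discrete action space and the KL divergence between softmaxed Q-values are assumptions, not necessarily the paper's exact choices):

```python
# Sketch only: aligns action-value distributions computed from an imagined
# state and the real state, rather than aligning the states themselves.
import torch
import torch.nn as nn
import torch.nn.functional as F

def value_consistency_loss(q_head: nn.Module,
                           imagined_state: torch.Tensor,
                           real_state: torch.Tensor) -> torch.Tensor:
    q_imagined = q_head(imagined_state)        # [batch, num_actions]
    with torch.no_grad():
        q_real = q_head(real_state)            # treated as the target
    log_p = F.log_softmax(q_imagined, dim=-1)  # predicted action-value distribution
    target = F.softmax(q_real, dim=-1)         # target action-value distribution
    return F.kl_div(log_p, target, reduction="batchmean")
```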
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
- TRAIL: Near-Optimal Imitation Learning with Suboptimal Data [100.83688818427915]
We present training objectives that use offline datasets to learn a factored transition model.
Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning.
To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
arXiv Detail & Related papers (2021-10-27T21:05:00Z)
- Steadily Learn to Drive with Virtual Memory [11.67256846037979]
This paper proposes an algorithm called Learn to drive with Virtual Memory (LVM) to overcome these problems.
LVM compresses the high-dimensional information into compact latent states and learns a latent dynamic model to summarize the agent's experience.
The effectiveness of LVM is demonstrated on an image-input autonomous driving task.
arXiv Detail & Related papers (2021-02-16T10:46:52Z)
- Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z)