Steadily Learn to Drive with Virtual Memory
- URL: http://arxiv.org/abs/2102.08072v1
- Date: Tue, 16 Feb 2021 10:46:52 GMT
- Title: Steadily Learn to Drive with Virtual Memory
- Authors: Yuhang Zhang, Yao Mu, Yujie Yang, Yang Guan, Shengbo Eben Li, Qi Sun
and Jianyu Chen
- Abstract summary: This paper proposes an algorithm called Learn to drive with Virtual Memory (LVM) to overcome the low data efficiency and training oscillation that RL methods suffer from in high-dimensional driving tasks.
LVM compresses the high-dimensional information into compact latent states and learns a latent dynamic model to summarize the agent's experience.
The effectiveness of LVM is demonstrated by an image-input autonomous driving task.
- Score: 11.67256846037979
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning has shown great potential in developing high-level
autonomous driving. However, for high-dimensional tasks, current RL methods
suffer from low data efficiency and oscillation in the training process. This
paper proposes an algorithm called Learn to drive with Virtual Memory (LVM) to
overcome these problems. LVM compresses the high-dimensional information into
compact latent states and learns a latent dynamic model to summarize the
agent's experience. Various imagined latent trajectories are generated as
virtual memory by the latent dynamic model. The policy is learned by
propagating gradient through the learned latent model with the imagined latent
trajectories and thus leads to high data efficiency. Furthermore, a double
critic structure is designed to reduce the oscillation during the training
process. The effectiveness of LVM is demonstrated by an image-input autonomous
driving task, in which LVM outperforms the existing method in terms of data
efficiency, learning stability, and control performance.
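The abstract's core loop can be illustrated with a toy numpy sketch: an observation is compressed to a compact latent state, a latent dynamics model rolls out an imagined ("virtual memory") trajectory, and a double critic takes the minimum of two value estimates to damp overestimation. The linear maps below are illustrative stand-ins for the paper's learned neural networks, and all names and dimensions are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: high-dimensional observation -> compact latent state.
OBS_DIM, LATENT_DIM, HORIZON = 64, 8, 5

# Hypothetical linear stand-ins for the learned networks.
W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM)) * 0.1     # encoder
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.1  # latent dynamics
W_q1 = rng.normal(size=LATENT_DIM)                       # critic 1
W_q2 = rng.normal(size=LATENT_DIM)                       # critic 2

def encode(obs):
    """Compress a high-dimensional observation into a compact latent state."""
    return np.tanh(W_enc @ obs)

def imagine(z0, horizon):
    """Roll the latent dynamics forward to generate a virtual trajectory."""
    traj = [z0]
    for _ in range(horizon):
        traj.append(np.tanh(W_dyn @ traj[-1]))
    return np.stack(traj)

def double_critic_value(z):
    """Take the minimum of two critics to reduce value overestimation."""
    return min(W_q1 @ z, W_q2 @ z)

obs = rng.normal(size=OBS_DIM)
trajectory = imagine(encode(obs), HORIZON)       # imagined latent rollout
values = [double_critic_value(z) for z in trajectory]
```

In the paper the policy gradient is propagated back through the learned latent model along such imagined trajectories; here the rollout only demonstrates the data flow, not the gradient computation.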
Related papers
- Efficient Training of Large Vision Models via Advanced Automated Progressive Learning [96.71646528053651]
We present an advanced automated progressive learning (AutoProg) framework for efficient training of Large Vision Models (LVMs)
We introduce AutoProg-Zero, by enhancing the AutoProg framework with a novel zero-shot unfreezing schedule search.
Experiments show that AutoProg accelerates ViT pre-training by up to 1.85x on ImageNet and accelerates fine-tuning of diffusion models by up to 2.86x, with comparable or even higher performance.
arXiv Detail & Related papers (2024-09-06T16:24:24Z)
- Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control [1.5361702135159845]
This paper introduces a knowledge-informed model-based residual reinforcement learning framework.
It integrates traffic expert knowledge into a virtual environment model, employing the Intelligent Driver Model (IDM) for basic dynamics and neural networks for residual dynamics.
We propose a novel strategy that combines traditional control methods with residual RL, facilitating efficient learning and policy optimization without the need to learn from scratch.
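The IDM-plus-residual structure described above can be sketched in a few lines: a physics-based Intelligent Driver Model supplies the baseline car-following acceleration, and a learned residual term corrects its error. The IDM parameter values are typical textbook choices and the linear residual is a stand-in for that paper's neural network; neither is taken from the paper itself.

```python
import numpy as np

# IDM parameters (typical textbook values, not the paper's):
# desired speed, time headway, max accel, comfortable decel, jam distance.
V0, T, A_MAX, B, S0 = 30.0, 1.5, 1.0, 2.0, 2.0

def idm_accel(gap, v, dv):
    """Intelligent Driver Model: physics-based baseline acceleration,
    given gap to the leader, own speed v, and closing speed dv."""
    s_star = S0 + v * T + v * dv / (2.0 * np.sqrt(A_MAX * B))
    return A_MAX * (1.0 - (v / V0) ** 4 - (s_star / gap) ** 2)

def residual_accel(gap, v, dv, w):
    """Stand-in for the learned residual network (here a linear model)."""
    return float(w @ np.array([gap, v, dv]))

def combined_accel(gap, v, dv, w):
    """Knowledge-informed dynamics: IDM baseline plus learned residual."""
    return idm_accel(gap, v, dv) + residual_accel(gap, v, dv, w)
```

With the residual weights at zero the model reduces exactly to IDM, which is the point of the design: learning starts from expert knowledge rather than from scratch.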
arXiv Detail & Related papers (2024-08-30T16:16:57Z)
- Simplified Temporal Consistency Reinforcement Learning [19.814047499837084]
We show that a simple representation learning approach relying on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL.
Our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
arXiv Detail & Related papers (2023-06-15T19:37:43Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Gradient-Based Trajectory Optimization With Learned Dynamics [80.41791191022139]
We use machine learning techniques to learn a differentiable dynamics model of the system from data.
We show that a neural network can model highly nonlinear behaviors accurately for large time horizons.
In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot robot and a radio-controlled (RC) car.
arXiv Detail & Related papers (2022-04-09T22:07:34Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning [84.30765628008207]
We propose a novel method, dubbed PlayVirtual, which augments cycle-consistent virtual trajectories to enhance the data efficiency for RL feature representation learning.
Our method outperforms the current state-of-the-art methods by a large margin on both benchmarks.
arXiv Detail & Related papers (2021-06-08T07:37:37Z)
- Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation [91.05073136215886]
"Actor-Learner Distillation" transfers learning progress from a large-capacity learner model to a small-capacity actor model.
We demonstrate in several challenging memory environments that using Actor-Learner Distillation recovers the clear sample-efficiency gains of the transformer learner model.
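The distillation idea above reduces, in its simplest form, to pulling the small actor's action distribution toward the large learner's on the same observations. The sketch below uses linear policies and a single observation purely for illustration; the policy parameterization, loss, and learning rate are assumptions, not that paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical stand-ins: a "large" learner policy and a "small" actor policy,
# both linear over features for illustration.
FEAT, ACTIONS = 16, 4
W_learner = rng.normal(size=(ACTIONS, FEAT))
W_actor = np.zeros((ACTIONS, FEAT))  # actor starts from a uniform policy

def distill_step(W_actor, obs, lr=0.02):
    """One gradient step pulling the actor's action distribution toward
    the learner's; the gradient of KL(p || q) in actor logits is (q - p)."""
    p = softmax(W_learner @ obs)   # teacher distribution (frozen)
    q = softmax(W_actor @ obs)     # student distribution
    grad = np.outer(q - p, obs)
    return W_actor - lr * grad

obs = rng.normal(size=FEAT)
before = kl(softmax(W_learner @ obs), softmax(W_actor @ obs))
for _ in range(1000):
    W_actor = distill_step(W_actor, obs)
after = kl(softmax(W_learner @ obs), softmax(W_actor @ obs))
```

After distillation the actor's distribution is close to the learner's, so the cheap actor can act in the environment while the expensive learner trains off-policy.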
arXiv Detail & Related papers (2021-04-04T17:56:34Z)
- Learning hierarchical behavior and motion planning for autonomous driving [32.78069835190924]
We introduce hierarchical behavior and motion planning (HBMP) to explicitly model behavior in a learning-based solution.
We transform the HBMP problem by integrating a classical sampling-based motion planner.
In addition, we propose a sharable representation for input sensory data across simulation platforms and the real-world environment.
arXiv Detail & Related papers (2020-05-08T05:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.