Model-Based Imitation Learning for Urban Driving
- URL: http://arxiv.org/abs/2210.07729v1
- Date: Fri, 14 Oct 2022 11:59:46 GMT
- Title: Model-Based Imitation Learning for Urban Driving
- Authors: Anthony Hu and Gianluca Corrado and Nicolas Griffiths and Zak Murez
and Corina Gurau and Hudson Yeo and Alex Kendall and Roberto Cipolla and
Jamie Shotton
- Abstract summary: We present MILE: a Model-based Imitation LEarning approach to jointly learn a model of the world and a policy for autonomous driving.
Our model is trained on an offline corpus of urban driving data, without any online interaction with the environment.
Our approach is the first camera-only method that models static scene, dynamic scene, and ego-behaviour in an urban driving environment.
- Score: 26.782783239210087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An accurate model of the environment and the dynamic agents acting in it
offers great potential for improving motion planning. We present MILE: a
Model-based Imitation LEarning approach to jointly learn a model of the world
and a policy for autonomous driving. Our method leverages 3D geometry as an
inductive bias and learns a highly compact latent space directly from
high-resolution videos of expert demonstrations. Our model is trained on an
offline corpus of urban driving data, without any online interaction with the
environment. MILE improves upon prior state-of-the-art by 35% in driving score
on the CARLA simulator when deployed in a completely new town and new weather
conditions. Our model can predict diverse and plausible states and actions,
which can be interpretably decoded to bird's-eye view semantic segmentation.
Further, we demonstrate that it can execute complex driving manoeuvres from
plans entirely predicted in imagination. Our approach is the first camera-only
method that models static scene, dynamic scene, and ego-behaviour in an urban
driving environment. The code and model weights are available at
https://github.com/wayveai/mile.
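
To make the pattern above concrete, here is a minimal PyTorch sketch of a world-model-plus-policy in the style the abstract describes: camera observations are encoded into a compact probabilistic latent state, a recurrent transition model evolves that state, and decoders map it to bird's-eye-view semantics and driving actions, including rollouts "in imagination". Every module name and dimension is an illustrative assumption, not the released MILE code; see the repository above for the real implementation.

```python
import torch
import torch.nn as nn

class WorldModelPolicy(nn.Module):
    """Illustrative sketch (not the official MILE code): infer a compact
    latent state from camera input, evolve it with a recurrent transition
    model, and decode it to BEV segmentation and driving actions."""

    def __init__(self, latent_dim=512, n_classes=8, n_actions=2):
        super().__init__()
        self.encoder = nn.Sequential(            # stand-in image encoder
            nn.Conv2d(3, 32, 5, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 2 * latent_dim),       # Gaussian posterior params
        )
        self.transition = nn.GRUCell(latent_dim + n_actions, latent_dim)
        self.prior = nn.Linear(latent_dim, 2 * latent_dim)  # imagination prior
        self.bev_decoder = nn.Sequential(        # latent -> BEV semantics
            nn.Linear(latent_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, n_classes, 4, stride=4),
        )
        self.policy = nn.Linear(latent_dim, n_actions)  # e.g. steering, acceleration

    def observe(self, image):
        mu, log_sigma = self.encoder(image).chunk(2, dim=-1)
        return mu + log_sigma.exp() * torch.randn_like(mu)  # reparameterised sample

    def imagine(self, h, steps):
        """Roll forward without new observations ("in imagination")."""
        outputs = []
        for _ in range(steps):
            action = self.policy(h)
            mu, log_sigma = self.prior(h).chunk(2, dim=-1)
            z = mu + log_sigma.exp() * torch.randn_like(mu)
            h = self.transition(torch.cat([z, action], dim=-1), h)
            outputs.append((action, self.bev_decoder(h)))
        return outputs
```

A rollout in this sketch would be `h = model.observe(img)` followed by `model.imagine(h, steps=6)`, yielding actions together with decodable BEV frames, mirroring the "plans entirely predicted in imagination" claim.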
Related papers
- DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model [65.43473733967038]
We introduce DrivingDojo, the first dataset tailor-made for training interactive world models with complex driving dynamics.
Our dataset features video clips with a complete set of driving maneuvers, diverse multi-agent interplay, and rich open-world driving knowledge.
arXiv Detail & Related papers (2024-10-14T17:19:23Z)
- Solving Motion Planning Tasks with a Scalable Generative Model [15.858076912795621]
We present an efficient solution based on generative models that learn the dynamics of driving scenes.
Our design allows the model to operate in both fully autoregressive and partially autoregressive modes, as sketched after this entry.
We conclude that the proposed generative model may serve as a foundation for a variety of motion planning tasks.
arXiv Detail & Related papers (2024-07-03T03:57:05Z)
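
The two decoding modes in the entry above can be illustrated with a toy rollout: in the fully autoregressive mode every future step is sampled from the model, while in the partially autoregressive mode selected steps (e.g., agents replayed from a log) are pinned and only the remainder is generated. The one-step dynamics model below is a stand-in, not the paper's architecture.

```python
import torch
import torch.nn as nn

def rollout(model, history, horizon, fixed=None):
    """Toy autoregressive rollout over per-step states.

    history: (T, D) past states. fixed: optional {future_step: (D,) state}
    pinning chosen steps (partial-autoregressive mode); with fixed=None
    every step is sampled from the model (fully autoregressive mode)."""
    states = [s for s in history]
    preds = []
    for t in range(horizon):
        if fixed is not None and t in fixed:
            nxt = fixed[t]            # keep the externally provided state
        else:
            nxt = model(states[-1])   # predict from the latest state
        states.append(nxt)
        preds.append(nxt)
    return torch.stack(preds)

net = nn.Linear(4, 4)                 # stand-in one-step dynamics model
past = torch.randn(10, 4)
full = rollout(net, past, horizon=8)                       # fully autoregressive
part = rollout(net, past, horizon=8, fixed={3: past[-1]})  # partially autoregressive
```

- Urban Scene Diffusion through Semantic Occupancy Map [49.20779809250597]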
UrbanDiffusion is a 3D diffusion model conditioned on a Bird's-Eye View (BEV) map.
Our model learns the data distribution of scene-level structures within a latent space.
After training on real-world driving datasets, our model can generate a wide range of diverse urban scenes.
arXiv Detail & Related papers (2024-03-18T11:54:35Z)
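
As a rough illustration of "learning the data distribution of scene-level structures within a latent space", below is a generic conditional denoising-diffusion training step on scene latents with a BEV-map condition. The noise schedule, the denoiser signature, and the conditioning mechanism are assumptions for the sketch, not the paper's actual design.

```python
import torch
import torch.nn.functional as F

def diffusion_step(denoiser, z0, bev, alphas_cumprod, optimizer):
    """One DDPM-style training step on scene latents z0 (B, ...) conditioned
    on a BEV map; denoiser(z_t, t, bev) is assumed to predict the noise."""
    t = torch.randint(0, len(alphas_cumprod), (z0.shape[0],), device=z0.device)
    abar = alphas_cumprod[t].view(-1, *([1] * (z0.dim() - 1)))
    eps = torch.randn_like(z0)
    z_t = abar.sqrt() * z0 + (1 - abar).sqrt() * eps  # forward noising
    loss = F.mse_loss(denoiser(z_t, t, bev), eps)     # noise-prediction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

- Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting [32.59889755381453]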
Recent methods extend NeRF by incorporating tracked vehicle poses to animate vehicles, enabling photo-realistic view synthesis of dynamic urban street scenes.
We introduce Street Gaussians, a new explicit scene representation that tackles the limitations of these NeRF-based methods.
The proposed method consistently outperforms state-of-the-art methods across all datasets.
arXiv Detail & Related papers (2024-01-02T18:59:55Z)
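
Conceptually, the explicit representation in the entry above is a static background set of 3D Gaussians plus per-vehicle Gaussian sets that tracked poses place into world space at each frame. The sketch below shows only that composition step under assumed tensor layouts; the full method also rotates each Gaussian's covariance and models appearance.

```python
import torch

def compose_scene(bg_means, obj_means, obj_poses):
    """Merge static background Gaussian centres with per-object Gaussians
    moved into world space. bg_means: (Nb, 3); obj_means[i]: (Ni, 3) in the
    object's canonical frame; obj_poses[i]: (4, 4) tracked object-to-world
    transform for the current frame."""
    placed = [bg_means]
    for means, pose in zip(obj_means, obj_poses):
        R, t = pose[:3, :3], pose[:3, 3]
        placed.append(means @ R.T + t)  # rigid transform of object Gaussians
    return torch.cat(placed, dim=0)     # full scene for this frame
```

- Neural World Models for Computer Vision [2.741266294612776]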
We present a framework to train a world model and a policy, parameterised by deep neural networks.
We leverage important computer vision concepts such as geometry, semantics, and motion to scale world models to complex urban driving scenes.
Our model can jointly predict static scene, dynamic scene, and ego-behaviour in an urban driving environment.
arXiv Detail & Related papers (2023-06-15T14:58:21Z)
- Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From Aerial Images [14.689298253430568]
We propose an aerial image-based map (AIM) representation that requires minimal annotation and provides rich road context information for traffic agents like pedestrians and vehicles.
Our results demonstrate competitive multi-agent trajectory prediction performance, especially for pedestrians in the scene, when using our AIM representation.
arXiv Detail & Related papers (2023-05-19T17:48:01Z)
- TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction [149.5716746789134]
We show data-driven traffic simulation can be formulated as a world model.
We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving.
Experiments on the Waymo Open Motion Dataset show TrafficBots can simulate realistic multi-agent behaviors.
arXiv Detail & Related papers (2023-03-07T18:28:41Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework for policy pretraining in visuomotor driving.
We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes from large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns the driving policy representation by predicting future ego-motion, optimizing the photometric error based on the current visual observation only (the photometric objective is sketched after this entry).
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
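
Both stages above rest on the classic self-supervised photometric reprojection objective: warp a source frame into the target view with the predicted depth and relative pose, then minimise the photometric error. A hedged sketch of that objective follows, assuming known intrinsics K for simplicity (PPGeo itself works with uncalibrated video).

```python
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, pose, K):
    """Self-supervised reprojection objective (sketch): warp `source` into
    the `target` view using per-pixel `depth` (B,1,H,W) and relative camera
    `pose` (B,4,4); K: (B,3,3) intrinsics."""
    B, _, H, W = target.shape
    ys, xs = torch.meshgrid(torch.arange(H, device=target.device),
                            torch.arange(W, device=target.device), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float()        # (3,H,W)
    rays = torch.einsum("bij,jhw->bihw", K.inverse(), pix)          # back-project pixels
    pts = rays * depth                                              # 3D points, target camera frame
    pts_h = torch.cat([pts, torch.ones_like(depth)], 1).flatten(2)  # (B,4,H*W) homogeneous
    cam = (pose @ pts_h)[:, :3]                                     # move into source frame
    uvw = K @ cam                                                   # project
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)                    # perspective divide
    u = uv[:, 0].view(B, H, W) / (W - 1) * 2 - 1                    # normalise to [-1, 1]
    v = uv[:, 1].view(B, H, W) / (H - 1) * 2 - 1
    warped = F.grid_sample(source, torch.stack([u, v], dim=-1), align_corners=True)
    return F.l1_loss(warped, target)                                # photometric error
```

- End-to-end Interpretable Neural Motion Planner [78.69295676456085]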
We propose a neural motion planner (NMP) for learning to drive autonomously in complex urban scenarios.
We design a holistic model that takes raw LIDAR data and an HD map as input and produces interpretable intermediate representations.
We demonstrate the effectiveness of our approach in real-world driving data captured in several cities in North America.
arXiv Detail & Related papers (2021-01-17T14:16:12Z)
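
NMP's interpretable intermediate is a learned spatio-temporal cost volume, so planning reduces to sampling physically feasible trajectories and keeping the cheapest. A toy version of that scoring step (assuming a precomputed cost volume and integer BEV waypoints) is:

```python
import torch

def select_trajectory(cost_volume, trajectories):
    """Score candidates against a spatio-temporal cost volume and return
    the cheapest. cost_volume: (T, H, W) cost per BEV cell and timestep;
    trajectories: (K, T, 2) long tensor of in-bounds (row, col) waypoints."""
    t_idx = torch.arange(cost_volume.shape[0])
    costs = torch.stack([
        cost_volume[t_idx, traj[:, 0], traj[:, 1]].sum()  # cost along path
        for traj in trajectories
    ])
    return trajectories[costs.argmin()]                   # minimum-cost plan
```

- TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors [74.67698916175614]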
We propose TrafficSim, a multi-agent behavior model for realistic traffic simulation.
In particular, we leverage an implicit latent variable model to parameterize a joint actor policy.
We show TrafficSim generates significantly more realistic and diverse traffic scenarios as compared to a diverse set of baselines.
arXiv Detail & Related papers (2021-01-17T00:29:30Z)
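
The implicit latent variable model parameterising a joint actor policy can be pictured as follows: one scene-level latent sample is shared by every agent, so the decoded actions are correlated across the scene rather than independent. Names and dimensions below are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class JointActorPolicy(nn.Module):
    """Illustrative latent-variable joint policy: a shared scene latent
    couples all agents, so one sample yields one coherent scene rollout."""

    def __init__(self, obs_dim=32, latent_dim=16, act_dim=2):
        super().__init__()
        self.prior = nn.Linear(obs_dim, 2 * latent_dim)  # scene latent params
        self.actor = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim),
        )

    def forward(self, scene_ctx, agent_obs):
        # scene_ctx: (obs_dim,) pooled scene context; agent_obs: (N, obs_dim)
        mu, log_sigma = self.prior(scene_ctx).chunk(2, dim=-1)
        z = mu + log_sigma.exp() * torch.randn_like(mu)  # one shared sample
        z = z.expand(agent_obs.shape[0], -1)             # broadcast to agents
        return self.actor(torch.cat([agent_obs, z], dim=-1))

policy = JointActorPolicy()
actions = policy(torch.randn(32), torch.randn(7, 32))    # 7 agents, one scene sample
```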
This list is automatically generated from the titles and abstracts of the papers on this site.