Panacea: Panoramic and Controllable Video Generation for Autonomous
Driving
- URL: http://arxiv.org/abs/2311.16813v1
- Date: Tue, 28 Nov 2023 14:22:24 GMT
- Title: Panacea: Panoramic and Controllable Video Generation for Autonomous
Driving
- Authors: Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong
Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang
- Abstract summary: We propose Panacea, an innovative approach to generate panoramic and controllable videos in driving scenarios.
Panacea addresses two critical challenges: 'Consistency' and 'Controllability.'
- Score: 38.404935454784855
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The field of autonomous driving increasingly demands high-quality annotated
training data. In this paper, we propose Panacea, an innovative approach to
generate panoramic and controllable videos in driving scenarios, capable of
yielding an unlimited number of diverse, annotated samples pivotal for
autonomous driving advancements. Panacea addresses two critical challenges:
'Consistency' and 'Controllability.' Consistency ensures temporal and
cross-view coherence, while Controllability ensures the alignment of generated
content with corresponding annotations. Our approach integrates a novel 4D
attention and a two-stage generation pipeline to maintain coherence,
supplemented by the ControlNet framework for meticulous control via
Bird's-Eye-View (BEV) layouts. Extensive qualitative and quantitative
evaluations of Panacea on the nuScenes dataset prove its effectiveness in
generating high-quality multi-view driving-scene videos. This work notably
propels the field of autonomous driving by effectively augmenting the training
dataset used for advanced BEV perception techniques.
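
The abstract names two coherence mechanisms, a 4D attention and a two-stage generation pipeline, plus ControlNet-based conditioning on BEV layouts. As a loose illustration only (not the paper's actual code; all class and variable names are hypothetical), attention factorized over the spatial, cross-view, and temporal axes of a (batch, time, view, token, channel) tensor might look like this in PyTorch:

```python
import torch
import torch.nn as nn

class Factorized4DAttention(nn.Module):
    """Hypothetical sketch: self-attention applied along the intra-view
    spatial axis, the cross-view axis, and the temporal axis of a
    (batch, time, view, token, channel) tensor."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_view = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)

    @staticmethod
    def _attend(attn, x):
        out, _ = attn(x, x, x)  # self-attention
        return x + out          # residual connection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, V, N, C = x.shape
        # 1) Spatial: tokens within one view of one frame attend to each other.
        x = self._attend(self.spatial, x.reshape(B * T * V, N, C))
        # 2) Cross-view: the same token position attends across the V cameras.
        x = x.reshape(B, T, V, N, C).permute(0, 1, 3, 2, 4).reshape(B * T * N, V, C)
        x = self._attend(self.cross_view, x)
        # 3) Temporal: the same token position attends across the T frames.
        x = x.reshape(B, T, N, V, C).permute(0, 2, 3, 1, 4).reshape(B * N * V, T, C)
        x = self._attend(self.temporal, x)
        # Restore the original (B, T, V, N, C) layout.
        return x.reshape(B, N, V, T, C).permute(0, 3, 2, 1, 4)

tokens = torch.randn(1, 4, 6, 64, 128)  # 4 frames, 6 cameras, 64 tokens, dim 128
out = Factorized4DAttention(128)(tokens)
assert out.shape == tokens.shape
```

In a real diffusion backbone these passes would be interleaved with the usual convolution/transformer blocks, with the BEV layout signal injected through a parallel ControlNet branch.
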
Related papers
- XAI-based Feature Ensemble for Enhanced Anomaly Detection in Autonomous Driving Systems [1.3022753212679383]
This paper proposes a novel feature ensemble framework that integrates multiple Explainable AI (XAI) methods.
By fusing top features identified by these XAI methods across six diverse AI models, the framework creates a robust and comprehensive set of features critical for detecting anomalies.
Our technique demonstrates improved accuracy, robustness, and transparency of AI models, contributing to safer and more trustworthy autonomous driving systems.
arXiv Detail & Related papers (2024-10-20T14:34:48Z)
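
As a hedged sketch of the feature-fusion idea above (using scikit-learn's permutation importance as a stand-in for the paper's XAI methods, and three models instead of six; all names are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import permutation_importance

# Rank features per model with an importance measure, then keep the
# union of each model's top-k features as the fused feature set.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
models = [RandomForestClassifier(random_state=0),
          GradientBoostingClassifier(random_state=0),
          LogisticRegression(max_iter=1000)]

top_k, fused = 5, set()
for model in models:
    model.fit(X, y)
    imp = permutation_importance(model, X, y, n_repeats=5, random_state=0)
    fused |= set(np.argsort(imp.importances_mean)[-top_k:])

X_fused = X[:, sorted(fused)]  # reduced feature set for the anomaly detector
```
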
- DiVE: DiT-based Video Generation with Enhanced Control [23.63288169762629]
We propose the first DiT-based framework specifically designed for generating temporally and multi-view consistent videos.
Specifically, the proposed framework leverages a parameter-free spatial view-inflated attention mechanism to guarantee cross-view consistency.
arXiv Detail & Related papers (2024-09-03T04:29:59Z)
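
A plausible reading of "parameter-free view-inflated attention", sketched below with hypothetical shapes, is that a pretrained per-image self-attention layer is reused unchanged while the view axis is folded into the token axis, so tokens in one camera can attend to tokens in all cameras:

```python
import torch

def view_inflate(attn, x):
    """Apply a pretrained per-image self-attention module `attn`
    (any callable mapping (B, N, C) -> (B, N, C)) jointly across views.
    Parameter-free: only the reshape changes, not the weights."""
    B, V, N, C = x.shape            # batch, views, tokens per view, channels
    x = x.reshape(B, V * N, C)      # concatenate all views into one sequence
    x = attn(x)                     # cross-view attention with original weights
    return x.reshape(B, V, N, C)

attn = torch.nn.MultiheadAttention(64, 4, batch_first=True)
wrapped = lambda s: attn(s, s, s)[0]
tokens = torch.randn(2, 6, 100, 64)  # 6 camera views
out = view_inflate(wrapped, tokens)
assert out.shape == tokens.shape
```
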
- Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving [23.63374916271247]
We propose Panacea+, a powerful framework for generating video data in driving scenes.
Panacea+ adopts a multi-view appearance noise prior mechanism and a super-resolution module for enhanced consistency and increased resolution.
Experiments show that the generated video samples greatly benefit a wide range of tasks on different datasets.
arXiv Detail & Related papers (2024-08-14T15:10:13Z)
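
The abstract does not spell out the noise prior; one speculative reading, sketched below, is that the initial diffusion noise is made partially shared across camera views so that neighboring views start from correlated noise (`rho` and all names here are assumptions, not Panacea+'s published formulation):

```python
import torch

def multi_view_noise(B, V, shape, rho=0.5, generator=None):
    """Speculative sketch of a multi-view appearance noise prior:
    blend one noise sample shared by all V views with per-view noise,
    keeping unit variance, so the diffusion process starts from
    partially correlated noise across cameras."""
    shared = torch.randn(B, 1, *shape, generator=generator)   # same for all views
    private = torch.randn(B, V, *shape, generator=generator)  # view-specific
    return (rho ** 0.5) * shared + ((1 - rho) ** 0.5) * private

eps = multi_view_noise(B=2, V=6, shape=(4, 32, 32))
print(eps.shape)  # torch.Size([2, 6, 4, 32, 32])
```
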
- Enhancing End-to-End Autonomous Driving with Latent World Model [78.22157677787239]
We propose a novel self-supervised method to enhance end-to-end driving without the need for costly labels.
Our framework, LAW, uses a LAtent World model to predict future latent features based on the predicted ego actions and the latent feature of the current frame.
As a result, our approach achieves state-of-the-art performance on both open-loop and closed-loop benchmarks without costly annotations.
arXiv Detail & Related papers (2024-06-12T17:59:21Z)
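
A minimal sketch of that idea, with all module and variable names hypothetical: a small dynamics network regresses the next frame's latent from the current latent and the ego action, so the supervision signal comes from the encoder itself rather than manual labels:

```python
import torch
import torch.nn as nn

class LatentWorldModel(nn.Module):
    """Predict the next frame's latent feature from the current latent
    and the predicted ego action; trained with a self-supervised
    regression loss against the encoder's actual next-frame latent."""

    def __init__(self, latent_dim=256, action_dim=2):
        super().__init__()
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, z_t, a_t):
        return self.dynamics(torch.cat([z_t, a_t], dim=-1))

model = LatentWorldModel()
z_t, a_t = torch.randn(8, 256), torch.randn(8, 2)   # current latent, ego action
z_next_pred = model(z_t, a_t)
z_next = torch.randn(8, 256)                        # encoder latent of next frame
loss = nn.functional.mse_loss(z_next_pred, z_next)  # no manual labels needed
```
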
- 3D Object Visibility Prediction in Autonomous Driving [6.802572869909114]
We present a novel attribute and its corresponding algorithm: 3D object visibility.
Our proposal of this attribute and its computational strategy aims to expand the capabilities of downstream tasks.
arXiv Detail & Related papers (2024-03-06T13:07:42Z)

- Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving [56.381918362410175]
Drive-WM is the first driving world model compatible with existing end-to-end planning models.
Our model generates high-fidelity multiview videos in driving scenes.
arXiv Detail & Related papers (2023-11-29T18:59:47Z)

- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an end-to-end, open-set (any environment/scene) autonomous driving approach capable of producing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)

- End-to-end Autonomous Driving: Challenges and Frontiers [45.391430626264764]
We provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving.
We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others.
We discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework.
arXiv Detail & Related papers (2023-06-29T14:17:24Z)

- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z)
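
A toy sketch of how a bounding-box-plus-color-marker exemplar could be built (illustrative only; the paper's exact procedure may differ):

```python
import numpy as np

def make_visual_exemplar(image, box, color=(255, 0, 0), border=3):
    """Crop the target from its bounding box and draw a color-coded
    border, so the exemplar also encodes a task/category marker."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2].copy()
    crop[:border, :] = color   # top edge
    crop[-border:, :] = color  # bottom edge
    crop[:, :border] = color   # left edge
    crop[:, -border:] = color  # right edge
    return crop

img = np.zeros((480, 640, 3), dtype=np.uint8)
exemplar = make_visual_exemplar(img, box=(100, 100, 200, 180))
print(exemplar.shape)  # (80, 100, 3)
```
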
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward, fully self-supervised framework for policy pretraining in visuomotor driving.
We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes from large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
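
PPGeo's photometric objective is standard monodepth-style self-supervision; a condensed sketch follows (pinhole intrinsics K, a per-pixel depth map, and a 4x4 relative pose are assumed, and all names are hypothetical):

```python
import torch
import torch.nn.functional as F

def photometric_loss(I_t, I_s, depth_t, T_t2s, K):
    """Back-project frame t's pixels with predicted depth, move them
    with the predicted relative pose T_t2s, reproject into source
    frame s, sample it, and compare the reconstruction with frame t."""
    B, _, H, W = I_t.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().view(3, -1)
    cam = torch.linalg.inv(K) @ pix * depth_t.view(B, 1, -1)  # back-project
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W)], dim=1)  # homogeneous
    proj = K @ (T_t2s @ cam_h)[:, :3]                         # into source cam
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], -1)
    grid = grid.view(B, H, W, 2) * 2 - 1                      # to [-1, 1]
    I_recon = F.grid_sample(I_s, grid, align_corners=True)
    return (I_t - I_recon).abs().mean()                       # L1 photometric error

B, H, W = 1, 32, 48
K = torch.tensor([[40., 0., W / 2], [0., 40., H / 2], [0., 0., 1.]])
I_t, I_s = torch.rand(B, 3, H, W), torch.rand(B, 3, H, W)
depth = torch.ones(B, 1, H, W) * 5.0
T_rel = torch.eye(4).expand(B, 4, 4)  # identity pose for the demo
print(photometric_loss(I_t, I_s, depth, T_rel, K))
```

In PPGeo's second stage, only the current frame feeds the visual encoder, which predicts the ego-motion that is then scored with this kind of photometric error.
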