Panacea: Panoramic and Controllable Video Generation for Autonomous
Driving
- URL: http://arxiv.org/abs/2311.16813v1
- Date: Tue, 28 Nov 2023 14:22:24 GMT
- Title: Panacea: Panoramic and Controllable Video Generation for Autonomous
Driving
- Authors: Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong
Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang
- Abstract summary: We propose Panacea, an innovative approach to generate panoramic and controllable videos in driving scenarios.
Panacea addresses two critical challenges: 'Consistency' and 'Controllability'.
- Score: 38.404935454784855
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The field of autonomous driving increasingly demands high-quality annotated
training data. In this paper, we propose Panacea, an innovative approach to
generate panoramic and controllable videos in driving scenarios, capable of
yielding an unlimited number of diverse, annotated samples pivotal for
autonomous driving advancements. Panacea addresses two critical challenges:
'Consistency' and 'Controllability.' Consistency ensures temporal and
cross-view coherence, while Controllability ensures the alignment of generated
content with corresponding annotations. Our approach integrates a novel 4D
attention and a two-stage generation pipeline to maintain coherence,
supplemented by the ControlNet framework for meticulous control by the
Bird's-Eye-View (BEV) layouts. Extensive qualitative and quantitative
evaluations of Panacea on the nuScenes dataset prove its effectiveness in
generating high-quality multi-view driving-scene videos. This work notably
propels the field of autonomous driving by effectively augmenting the training
dataset used for advanced BEV perception techniques.
Related papers
- MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control [68.74166535159311]
We introduce MagicDriveDiT, a novel approach based on the DiT architecture.
By incorporating spatial-temporal conditional encoding, MagicDriveDiT achieves precise control over spatial-temporal latents.
Experiments show its superior performance in generating realistic street scene videos with higher resolution and more frames.
arXiv Detail & Related papers (2024-11-21T03:13:30Z)
- ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving [44.174489160967056]
Offboard perception aims to automatically generate high-quality 3D labels for autonomous driving scenes.
We propose a novel Zero-shot Offboard Panoptic Perception (ZOPP) framework for autonomous driving scenes.
ZOPP integrates the powerful zero-shot recognition capabilities of vision foundation models and 3D representations derived from point clouds.
arXiv Detail & Related papers (2024-11-08T03:52:32Z)
- XAI-based Feature Ensemble for Enhanced Anomaly Detection in Autonomous Driving Systems [1.3022753212679383]
This paper proposes a novel feature ensemble framework that integrates multiple Explainable AI (XAI) methods.
By fusing top features identified by these XAI methods across six diverse AI models, the framework creates a robust and comprehensive set of features critical for detecting anomalies.
Our technique demonstrates improved accuracy, robustness, and transparency of AI models, contributing to safer and more trustworthy autonomous driving systems.
arXiv Detail & Related papers (2024-10-20T14:34:48Z)
- DiVE: DiT-based Video Generation with Enhanced Control [23.63288169762629]
We propose the first DiT-based framework specifically designed for generating temporally and multi-view consistent videos.
Specifically, the proposed framework leverages a parameter-free spatial view-inflated attention mechanism to guarantee the cross-view consistency.
arXiv Detail & Related papers (2024-09-03T04:29:59Z)
- Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving [23.63374916271247]
We propose Panacea+, a powerful framework for generating video data in driving scenes.
Panacea+ adopts a multi-view appearance noise prior mechanism and a super-resolution module for enhanced consistency and increased resolution.
Experiments show that the generated video samples greatly benefit a wide range of tasks on different datasets.
arXiv Detail & Related papers (2024-08-14T15:10:13Z)
- Enhancing End-to-End Autonomous Driving with Latent World Model [78.22157677787239]
We propose a novel self-supervised method to enhance end-to-end driving without the need for costly labels.
Our framework LAW uses a LAtent World model to predict future latent features based on the predicted ego actions and the latent feature of the current frame.
As a result, our approach achieves state-of-the-art performance in both open-loop and closed-loop benchmarks without costly annotations.
arXiv Detail & Related papers (2024-06-12T17:59:21Z)
- 3D Object Visibility Prediction in Autonomous Driving [6.802572869909114]
We present a novel attribute and its corresponding algorithm: 3D object visibility.
Our proposal of this attribute and its computational strategy aims to expand the capabilities for downstream tasks.
arXiv Detail & Related papers (2024-03-06T13:07:42Z)
- Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving [56.381918362410175]
Drive-WM is the first driving world model compatible with existing end-to-end planning models.
Our model generates high-fidelity multiview videos in driving scenes.
arXiv Detail & Related papers (2023-11-29T18:59:47Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z)