WcDT: World-centric Diffusion Transformer for Traffic Scene Generation
- URL: http://arxiv.org/abs/2404.02082v3
- Date: Fri, 04 Oct 2024 01:10:29 GMT
- Title: WcDT: World-centric Diffusion Transformer for Traffic Scene Generation
- Authors: Chen Yang, Yangfan He, Aaron Xuxiang Tian, Dong Chen, Jianhui Wang, Tianyu Shi, Arsalan Heydarian,
- Abstract summary: We introduce a novel approach for autonomous driving trajectory generation by harnessing the complementary strengths of diffusion probabilistic models and transformers.
Our proposed framework, termed the "World-Centric Diffusion Transformer" (WcDT), optimizes the entire trajectory generation process.
Our results show that the proposed approach exhibits superior performance in generating both realistic and diverse trajectories.
- Score: 13.616763172038846
- Abstract: In this paper, we introduce a novel approach for autonomous driving trajectory generation by harnessing the complementary strengths of diffusion probabilistic models (a.k.a., diffusion models) and transformers. Our proposed framework, termed the "World-Centric Diffusion Transformer" (WcDT), optimizes the entire trajectory generation process, from feature extraction to model inference. To enhance the scene diversity and stochasticity, the historical trajectory data is first preprocessed into "Agent Move Statement" and encoded into latent space using Denoising Diffusion Probabilistic Models (DDPM) enhanced with Diffusion with Transformer (DiT) blocks. Then, the latent features, historical trajectories, HD map features, and historical traffic signal information are fused with various transformer-based encoders that are used to enhance the interaction of agents with other elements in the traffic scene. The encoded traffic scenes are then decoded by a trajectory decoder to generate multimodal future trajectories. Comprehensive experimental results show that the proposed approach exhibits superior performance in generating both realistic and diverse trajectories, showing its potential for integration into automatic driving simulation systems. Our code is available at \url{https://github.com/yangchen1997/WcDT}.
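The latent-encoding step the abstract describes (noising historical "Agent Move Statement" trajectories with a DDPM before transformer/DiT blocks denoise them) can be sketched minimally. The schedule, step count, and tensor shapes below are illustrative assumptions for a toy forward process, not the authors' released code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed linear noise schedule over T diffusion steps (standard DDPM choice).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # cumulative product \bar{alpha}_t

def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

# Toy "Agent Move Statement": 8 agents, 10 past timesteps, (x, y) offsets.
x0 = rng.normal(size=(8, 10, 2))
noise = rng.normal(size=x0.shape)

# At t = T-1 alpha_bar is tiny, so the latent is near-pure Gaussian noise,
# which is what the reverse (DiT-based) denoiser is trained to invert.
x_noisy = q_sample(x0, t=T - 1, noise=noise)
print(x_noisy.shape)  # prints (8, 10, 2)
```

In the full model, the reverse process would be parameterized by the DiT blocks, and the resulting latent features fused with map and traffic-signal encodings before decoding multimodal futures.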
Related papers
- DragTraffic: Interactive and Controllable Traffic Scene Generation for Autonomous Driving [10.90477019946728]
DragTraffic is a general, interactive, and controllable traffic scene generation framework based on conditional diffusion.
We employ a regression model to provide a general initial solution and a refinement process based on the conditional diffusion model to ensure diversity.
Experiments on a real-world driving dataset show that DragTraffic outperforms existing methods in terms of authenticity, diversity, and freedom.
arXiv Detail & Related papers (2024-04-19T04:49:28Z) - SceneDM: Scene-level Multi-agent Trajectory Generation with Consistent Diffusion Models [10.057312592344507]
We propose a novel framework based on diffusion models, called SceneDM, to generate joint and consistent future motions of all the agents in a scene.
SceneDM achieves state-of-the-art results on the Sim Agents Benchmark.
arXiv Detail & Related papers (2023-11-27T11:39:27Z) - A Diffusion-Model of Joint Interactive Navigation [14.689298253430568]
We present DJINN - a diffusion based method of generating traffic scenarios.
Our approach jointly diffuses the trajectories of all agents, conditioned on a flexible set of state observations from the past, present, or future.
We show how DJINN flexibly enables direct test-time sampling from a variety of valuable conditional distributions.
arXiv Detail & Related papers (2023-09-21T22:10:20Z) - Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
arXiv Detail & Related papers (2023-07-17T07:12:29Z) - Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion [88.45326906116165]
We present a new framework to formulate the trajectory prediction task as a reverse process of motion indeterminacy diffusion (MID).
We encode the history behavior information and the social interactions as a state embedding and devise a Transformer-based diffusion model to capture the temporal dependencies of trajectories.
Experiments on the human trajectory prediction benchmarks including the Stanford Drone and ETH/UCY datasets demonstrate the superiority of our method.
arXiv Detail & Related papers (2022-03-25T16:59:08Z) - Domain Generalization for Vision-based Driving Trajectory Generation [9.490923738117772]
We propose a domain generalization method for vision-based driving trajectory generation for autonomous vehicles in urban environments.
We leverage an adversarial learning approach to train a trajectory generator as the decoder.
We compare our proposed method with the state-of-the-art trajectory generation method and some recent domain generalization methods on both datasets and simulation.
arXiv Detail & Related papers (2021-09-22T07:49:07Z) - PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
Recent transformer-based image recognition models show consistent efficiency gains.
arXiv Detail & Related papers (2021-09-15T01:10:30Z) - Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z) - TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking [74.82415271960315]
We propose a solution named TransMOT to efficiently model the spatial and temporal interactions among objects in a video.
TransMOT is not only more computationally efficient than the traditional Transformer, but it also achieves better tracking accuracy.
The proposed method is evaluated on multiple benchmark datasets including MOT15, MOT16, MOT17, and MOT20.
arXiv Detail & Related papers (2021-04-01T01:49:05Z) - Spatial-Channel Transformer Network for Trajectory Prediction on the Traffic Scenes [2.7955111755177695]
We present a Spatial-Channel Transformer Network for trajectory prediction with attention functions.
A channel-wise module is inserted to measure the social interaction between agents.
We find that the network achieves promising results on real-world trajectory prediction datasets on the traffic scenes.
arXiv Detail & Related papers (2021-01-27T15:03:42Z) - Haar Wavelet based Block Autoregressive Flows for Trajectories [129.37479472754083]
Prediction of trajectories such as that of pedestrians is crucial to the performance of autonomous agents.
We introduce a novel Haar wavelet based block autoregressive model leveraging split couplings.
We illustrate the advantages of our approach for generating diverse and accurate trajectories on two real-world datasets.
arXiv Detail & Related papers (2020-09-21T13:57:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.