Related papers: TransDiffuser: End-to-end Trajectory Generation with Decorrelated Multi-modal Representation for Autonomous Driving

URL: http://arxiv.org/abs/2505.09315v1
Date: Wed, 14 May 2025 12:10:41 GMT
Title: TransDiffuser: End-to-end Trajectory Generation with Decorrelated Multi-modal Representation for Autonomous Driving
Authors: Xuefeng Jiang, Yuan Ma, Pengxiang Li, Leimeng Xu, Xin Wen, Kun Zhan, Zhongpu Xia, Peng Jia, XianPeng Lang, Sheng Sun,
Abstract summary: We propose TransDiffuser, an encoder-decoder based generative trajectory planning model for end-to-end autonomous driving. TransDiffuser achieves PDMS of 94.85 on the NAVSIM benchmark, surpassing previous state-of-the-art methods without any anchor-based prior trajectories.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, diffusion models have shown their potential across diverse domains, from vision generation to language modeling. Transferring their capabilities to modern autonomous driving systems has also emerged as a promising direction. In this work, we propose TransDiffuser, an encoder-decoder based generative trajectory planning model for end-to-end autonomous driving. The encoded scene information serves as the multi-modal conditional input of the denoising decoder. To tackle the mode collapse dilemma in generating high-quality, diverse trajectories, we introduce a simple yet effective multi-modal representation decorrelation optimization mechanism during training. TransDiffuser achieves a PDMS of 94.85 on the NAVSIM benchmark, surpassing previous state-of-the-art methods without any anchor-based prior trajectories.
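The abstract does not give the exact form of the decorrelation objective. As a minimal sketch, assuming it penalizes correlation between feature dimensions of the fused multi-modal representation (an off-diagonal penalty on the feature correlation matrix, in the spirit of Barlow Twins / VICReg-style regularizers), such a term could look like the following. The function name `decorrelation_loss`, the `(batch, dim)` input layout, and the normalization are hypothetical and not taken from the paper.

```python
import numpy as np


def decorrelation_loss(z: np.ndarray, eps: float = 1e-6) -> float:
    """Hypothetical decorrelation penalty on a batch of representations.

    z: array of shape (batch, dim), e.g. fused scene features (assumption).
    Returns the mean squared off-diagonal entry of the feature correlation
    matrix: near 0 for decorrelated features, near 1 for fully redundant ones.
    """
    if z.ndim != 2 or z.shape[0] < 2 or z.shape[1] < 2:
        raise ValueError("expected a (batch, dim) array with batch >= 2 and dim >= 2")
    # Standardize each feature dimension across the batch.
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + eps)
    # Empirical correlation matrix of shape (dim, dim).
    corr = z.T @ z / z.shape[0]
    # Drop the diagonal: only cross-dimension correlations are penalized.
    off_diag = corr - np.diag(np.diag(corr))
    dim = corr.shape[0]
    return float((off_diag ** 2).sum() / (dim * (dim - 1)))
```

Added to the denoising loss with a small weight, a term like this discourages the representation from collapsing onto a few redundant directions; whether TransDiffuser uses exactly this form, and at what weight, should be checked against the paper.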

Related papers

Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback
We introduce TrajHF, a human feedback-driven finetuning framework for generative trajectory models. TrajHF refines multi-modal trajectory generation beyond conventional imitation learning. It achieves PDMS of 93.95 on the NAVSIM benchmark, significantly exceeding other methods.
Date: 2025-03-13T14:56:17Z
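Both TransDiffuser (94.85) and TrajHF (93.95) report their results as PDMS on NAVSIM. To our understanding of the NAVSIM (v1) definition, the per-scenario PDM Score multiplies two penalty terms, no at-fault collision (NC) and drivable area compliance (DAC), by a weighted average of time-to-collision (TTC, weight 5), comfort (C, weight 2), and ego progress (EP, weight 5); the benchmark number is the mean over scenarios, reported as a percentage. The sketch below encodes that reading; `pdm_score` and `benchmark_pdms` are hypothetical helper names, not the NAVSIM API, and the weights should be verified against the benchmark documentation.

```python
from typing import Sequence


def pdm_score(nc: float, dac: float, ttc: float, comfort: float, ep: float) -> float:
    """Per-scenario PDM Score under our reading of NAVSIM v1 (assumption).

    nc, dac: multiplicative penalties in [0, 1].
    ttc, comfort, ep: sub-scores in [0, 1], combined with weights 5, 2, 5.
    """
    subscores = {"nc": nc, "dac": dac, "ttc": ttc, "comfort": comfort, "ep": ep}
    for name, value in subscores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    weighted = (5.0 * ttc + 2.0 * comfort + 5.0 * ep) / 12.0
    return nc * dac * weighted


def benchmark_pdms(per_scenario: Sequence[float]) -> float:
    """Mean per-scenario score, scaled to a percentage (e.g. 94.85)."""
    if not per_scenario:
        raise ValueError("need at least one scenario score")
    return 100.0 * sum(per_scenario) / len(per_scenario)
```

Because NC and DAC multiply the whole score, a single at-fault collision or drivable-area violation zeroes that scenario regardless of progress or comfort.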
Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers
This paper introduces FUTURIST, a method for multimodal future semantic prediction that uses a unified and efficient visual sequence transformer architecture. We propose a VAE-free hierarchical tokenization process, which reduces computational complexity, streamlines the training pipeline, and enables end-to-end training with high-resolution, multimodal inputs. We validate FUTURIST on the Cityscapes dataset, demonstrating state-of-the-art performance in future semantic segmentation for both short- and mid-term forecasting.
Date: 2025-01-14T18:34:14Z
DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction, and an iterative motion planner. Experiments conducted on the nuScenes and Bench2Drive datasets demonstrate the superior planning performance and high efficiency of DiFSD.
Date: 2024-09-15T15:55:24Z
WcDT: World-centric Diffusion Transformer for Traffic Scene Generation
We introduce a novel approach to autonomous driving trajectory generation that harnesses the complementary strengths of diffusion probabilistic models and transformers. Our proposed framework, termed the "World-Centric Diffusion Transformer" (WcDT), optimizes the entire trajectory generation process. Our results show that the proposed approach exhibits superior performance in generating both realistic and diverse trajectories.
Date: 2024-04-02T16:28:41Z
Trajeglish: Traffic Modeling as Next-Token Prediction
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs. We apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios. Our model tops the Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%.
Date: 2023-12-07T18:53:27Z
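Discrete sequence modeling of motion requires first turning continuous trajectories into tokens. The sketch below shows one generic way to do that, uniform grid quantization of per-step (dx, dy) displacements, so that a next-token model can be trained on the resulting integer sequence. This is an illustrative assumption, not Trajeglish's actual tokenizer; `tokenize_trajectory`, `detokenize`, `max_step`, and `bins` are hypothetical names and defaults.

```python
import numpy as np


def tokenize_trajectory(xy: np.ndarray, max_step: float = 2.0, bins: int = 21) -> np.ndarray:
    """Map a (T, 2) trajectory to T-1 integer tokens (hypothetical scheme).

    Each per-step displacement is clipped to [-max_step, max_step] on each axis
    and snapped to a uniform grid with `bins` levels per axis, giving
    bins * bins possible tokens.
    """
    steps = np.clip(np.diff(xy, axis=0), -max_step, max_step)
    # Per-axis grid index in [0, bins - 1].
    idx = np.rint((steps + max_step) / (2 * max_step) * (bins - 1)).astype(int)
    return idx[:, 0] * bins + idx[:, 1]


def detokenize(tokens: np.ndarray, start: np.ndarray,
               max_step: float = 2.0, bins: int = 21) -> np.ndarray:
    """Invert the tokenization by decoding displacements and summing them."""
    ix, iy = np.divmod(tokens, bins)
    grid = np.stack([ix, iy], axis=1) / (bins - 1) * (2 * max_step) - max_step
    return np.vstack([start, start + np.cumsum(grid, axis=0)])
```

The grid trades range against resolution: with the defaults, each step is encoded to within 0.1 units per axis, and quantization error can accumulate when the decoded displacements are summed over a long rollout.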
Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models
We present an approach to end-to-end, open-set (any environment/scene) autonomous driving that provides driving decisions from representations queryable by image and text. Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
Date: 2023-10-26T17:56:35Z
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
We present MotionLM, a language model for multi-agent motion prediction. Our approach bypasses post-hoc interactions where individual agent trajectory generation is conducted prior to interactive scoring. The model's sequential factorization enables temporally causal conditional rollouts.
Date: 2023-09-28T15:46:25Z
Unified Discrete Diffusion for Simultaneous Vision-Language Generation
We present a unified multimodal generation model that can conduct both the "modality translation" and "multi-modality generation" tasks. Specifically, we unify the discrete diffusion process for multimodal signals by proposing a unified transition matrix. Our proposed method can perform comparably to the state-of-the-art solutions in various generation tasks.
Date: 2022-11-27T14:46:01Z
Domain Generalization for Vision-based Driving Trajectory Generation
We propose a domain generalization method for vision-based driving trajectory generation for autonomous vehicles in urban environments. We leverage an adversarial learning approach to train a trajectory generator as the decoder. We compare our proposed method with the state-of-the-art trajectory generation method and some recent domain generalization methods on both datasets and simulation.
Date: 2021-09-22T07:49:07Z
A Driving Behavior Recognition Model with Bi-LSTM and Multi-Scale CNN
We propose a neural network model based on trajectory information for driving behavior recognition. We evaluate the proposed model on the public BLVD dataset, achieving satisfactory performance.
Date: 2021-03-01T06:47:29Z
Haar Wavelet based Block Autoregressive Flows for Trajectories
Prediction of trajectories, such as those of pedestrians, is crucial to the performance of autonomous agents. We introduce a novel Haar wavelet based block autoregressive model leveraging split couplings. We illustrate the advantages of our approach for generating diverse and accurate trajectories on two real-world datasets.
Date: 2020-09-21T13:57:10Z

This list is automatically generated from the titles and abstracts of the papers on this site.