Related papers: Efficient Virtuoso: A Latent Diffusion Transformer Model for Goal-Conditioned Trajectory Planning

Efficient Virtuoso: A Latent Diffusion Transformer Model for Goal-Conditioned Trajectory Planning

URL: http://arxiv.org/abs/2509.03658v2
Date: Sat, 06 Sep 2025 15:10:15 GMT
Title: Efficient Virtuoso: A Latent Diffusion Transformer Model for Goal-Conditioned Trajectory Planning
Authors: Antonio Guillen-Perez,
Abstract summary: We present the Efficient Virtuoso, a conditional latent diffusion model for goal-conditioned trajectory planning.<n>We demonstrate that our method achieves state-of-the-art performance on the Open Motion dataset, achieving a minimum Average Displacement Error (minADE) of 0.25.<n>We provide a key insight: while a single goal can resolve strategic ambiguity, a richer, multi-step sparse route is essential for enabling the precise, high-fidelity tactical execution that mirrors nuanced human driving behavior.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The ability to generate a diverse and plausible distribution of future trajectories is a critical capability for autonomous vehicle planning systems. While recent generative models have shown promise, achieving high fidelity, computational efficiency, and precise control remains a significant challenge. In this paper, we present the Efficient Virtuoso, a conditional latent diffusion model for goal-conditioned trajectory planning. Our approach introduces a novel two-stage normalization pipeline that first scales trajectories to preserve their geometric aspect ratio and then normalizes the resulting PCA latent space to ensure a stable training target. The denoising process is performed efficiently in this low-dimensional latent space by a simple MLP denoiser, which is conditioned on a rich scene context fused by a powerful Transformer-based StateEncoder. We demonstrate that our method achieves state-of-the-art performance on the Waymo Open Motion Dataset, achieving a minimum Average Displacement Error (minADE) of 0.25. Furthermore, through a rigorous ablation study on goal representation, we provide a key insight: while a single endpoint goal can resolve strategic ambiguity, a richer, multi-step sparse route is essential for enabling the precise, high-fidelity tactical execution that mirrors nuanced human driving behavior.

Related papers

Self-Correcting VLA: Online Action Refinement via Sparse World Imagination [55.982504915794514]
We propose Self-Correcting VLA (SC-VLA), which achieve self-improvement by intrinsically guiding action refinement through sparse imagination.<n>SC-VLA achieve state-of-the-art performance, yielding the highest task throughput with 16% fewer steps and a 9% higher success rate than the best-performing baselines.
arXiv Detail & Related papers (2026-02-25T06:58:06Z)
Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics [6.208369829942616]
We present Unified Latent Dynamics (ULD), a novel reinforcement learning algorithm.<n>ULD unifies the efficiency of model-free methods with the representational strengths of model-based approaches.<n> evaluated on 80 environments spanning Gym locomotion, DeepMind Control (proprioceptive and visual), and Atari.
arXiv Detail & Related papers (2026-02-13T06:06:56Z)
Dual-End Consistency Model [41.982957134224904]
Slow iterative sampling is a major bottleneck for the practical deployment of diffusion and flow-based generative models.<n>We propose a Dual-End Consistency Model (DE-CM) that selects vital sub-trajectory clusters to achieve stable and effective training.<n>Our method achieves a state-of-the-art FID score of 1.70 in one-step generation on the ImageNet 256x256 dataset, outperforming existing CM-based one-step approaches.
arXiv Detail & Related papers (2026-02-11T11:51:01Z)
Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities.<n>Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark.<n>We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z)
ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework.<n>It reframes the learning task to predict the residual deviation from an inertial reference.<n>On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z)
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion.<n>Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient.<n>Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
arXiv Detail & Related papers (2025-09-24T13:35:15Z)
Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance [63.33213516925946]
We introduce textbfAlign-Then-stEer (textttATE), a novel, data-efficient, and plug-and-play adaptation framework.<n>Our work presents a general and lightweight solution that greatly enhances the practicality of deploying VLA models to new robotic platforms and tasks.
arXiv Detail & Related papers (2025-09-02T07:51:59Z)
Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics [34.570579623171476]
"First Reasoning, Then Forecasting" is a strategy that explicitly incorporates behavior intentions as spatial guidance for trajectory prediction.<n>We introduce an interpretable, reward-driven intention reasoner grounded in a novel query-centric Inverse Reinforcement Learning scheme.<n>Our approach significantly enhances trajectory prediction confidence, achieving highly competitive performance relative to state-of-the-art methods.
arXiv Detail & Related papers (2025-07-16T09:46:17Z)
Aligning Latent Spaces with Flow Priors [72.24305287508474]
This paper presents a novel framework for aligning learnable latent spaces to arbitrary target distributions by leveraging flow-based generative models as priors.<n> Notably, the proposed method eliminates computationally expensive likelihood evaluations and avoids ODE solving during optimization.
arXiv Detail & Related papers (2025-06-05T16:59:53Z)
Deep Active Inference Agents for Delayed and Long-Horizon Environments [1.693200946453174]
AIF agents rely on accurate immediate predictions and exhaustive planning, a limitation that is exacerbated in delayed environments.<n>We propose a generative-policy architecture featuring a multi-step latent transition that lets the generative model predict an entire horizon in a single look-ahead.<n>We evaluate our agent in an environment that mimics a realistic industrial scenario with delayed and long-horizon settings.
arXiv Detail & Related papers (2025-05-26T11:50:22Z)
Safety-Critical Traffic Simulation with Guided Latent Diffusion Model [8.011306318131458]
Safety-critical traffic simulation plays a crucial role in evaluating autonomous driving systems.<n>We propose a guided latent diffusion model (LDM) capable of generating physically realistic and adversarial scenarios.<n>Our work provides an effective tool for realistic safety-critical scenario simulation, paving the way for more robust evaluation of autonomous driving systems.
arXiv Detail & Related papers (2025-05-01T13:33:34Z)
Predictive Planner for Autonomous Driving with Consistency Models [5.966385886363771]
Trajectory prediction and planning are essential for autonomous vehicles to navigate safely and efficiently in dynamic environments.<n>Recent diffusion-based generative models have shown promise in multi-agent trajectory generation, but their slow sampling is less suitable for high-frequency planning tasks.<n>We leverage the consistency model to build a predictive planner that samples from a joint distribution of ego and surrounding agents, conditioned on the ego vehicle's navigational goal.
arXiv Detail & Related papers (2025-02-12T00:26:01Z)
DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.<n>Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner.<n>Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
Model Checking for Closed-Loop Robot Reactive Planning [0.0]
We show how model checking can be used to create multistep plans for a differential drive wheeled robot so that it can avoid immediate danger. Using a small, purpose built model checking algorithm in situ we generate plans in real-time in a way that reflects the egocentric reactive response of simple biological agents.
arXiv Detail & Related papers (2023-11-16T11:02:29Z)
Generative Modeling with Phase Stochastic Bridges [49.4474628881673]
Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs. We introduce a novel generative modeling framework grounded in textbfphase space dynamics Our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation.
arXiv Detail & Related papers (2023-10-11T18:38:28Z)
The Importance of Prior Knowledge in Precise Multimodal Prediction [71.74884391209955]
Roads have well defined geometries, topologies, and traffic rules. In this paper we propose to incorporate structured priors as a loss function. We demonstrate the effectiveness of our approach on real-world self-driving datasets.
arXiv Detail & Related papers (2020-06-04T03:56:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.