Related papers: Generalized Trajectory Scoring for End-to-end Multimodal Planning

URL: http://arxiv.org/abs/2506.06664v1
Date: Sat, 07 Jun 2025 05:06:05 GMT
Title: Generalized Trajectory Scoring for End-to-end Multimodal Planning
Authors: Zhenxin Li, Wenhao Yao, Zi Wang, Xinglong Sun, Joshua Chen, Nadine Chang, Maying Shen, Zuxuan Wu, Shiyi Lan, Jose M. Alvarez
Abstract summary: Generalized Trajectory Scoring (GTRS) is a unified framework for end-to-end multi-modal planning. GTRS consists of three complementary innovations: (1) a diffusion-based trajectory generator that produces diverse fine-grained proposals; (2) a vocabulary generalization technique that trains a scorer on super-dense trajectory sets with dropout regularization; and (3) a sensor augmentation strategy that enhances out-of-domain generalization. As the winning solution of the Navsim v2 Challenge, GTRS demonstrates superior performance even with sub-optimal sensor inputs.
Score: 42.38746285135693
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: End-to-end multi-modal planning is a promising paradigm in autonomous driving, enabling decision-making with diverse trajectory candidates. A key component is a robust trajectory scorer capable of selecting the optimal trajectory from these candidates. While recent trajectory scorers focus on scoring either large sets of static trajectories or small sets of dynamically generated ones, both approaches face significant limitations in generalization. Static vocabularies provide effective coarse discretization but struggle to make fine-grained adaptation, while dynamic proposals offer detailed precision but fail to capture broader trajectory distributions. To overcome these challenges, we propose GTRS (Generalized Trajectory Scoring), a unified framework for end-to-end multi-modal planning that combines coarse and fine-grained trajectory evaluation. GTRS consists of three complementary innovations: (1) a diffusion-based trajectory generator that produces diverse fine-grained proposals; (2) a vocabulary generalization technique that trains a scorer on super-dense trajectory sets with dropout regularization, enabling its robust inference on smaller subsets; and (3) a sensor augmentation strategy that enhances out-of-domain generalization while incorporating refinement training for critical trajectory discrimination. As the winning solution of the Navsim v2 Challenge, GTRS demonstrates superior performance even with sub-optimal sensor inputs, approaching privileged methods that rely on ground-truth perception. Code will be available at https://github.com/NVlabs/GTRS.
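The combination of a dropout-regularized static vocabulary with dynamically generated proposals, as described in the abstract, can be illustrated with a toy sketch. This is not the GTRS implementation: the function names (`dropout_vocabulary`, `select_best`, `toy_score`), the keep ratios, the goal-distance scorer, and the hand-written proposals standing in for diffusion outputs are all hypothetical choices made for illustration.

```python
import random


def dropout_vocabulary(vocabulary, keep_ratio, rng):
    """Return a random subset of the trajectory vocabulary (at least one kept)."""
    keep = max(1, int(len(vocabulary) * keep_ratio))
    return rng.sample(vocabulary, keep)


def select_best(candidates, score_fn):
    """Pick the candidate trajectory with the highest score."""
    return max(candidates, key=score_fn)


def toy_score(trajectory, goal=(10.0, 0.0)):
    """Hypothetical scorer: negative squared distance from the endpoint to a goal."""
    x, y = trajectory[-1]
    return -((x - goal[0]) ** 2 + (y - goal[1]) ** 2)


rng = random.Random(0)

# Toy "super-dense" vocabulary: 1000 two-waypoint trajectories with varied endpoints.
vocabulary = [
    [(0.0, 0.0), (rng.uniform(0, 20), rng.uniform(-5, 5))] for _ in range(1000)
]

# Training time (assumed): the scorer only sees a random subset at each step.
train_subset = dropout_vocabulary(vocabulary, keep_ratio=0.5, rng=rng)

# Inference time (assumed): a smaller static subset plus a few fine-grained
# proposals, here hard-coded as stand-ins for a learned generator's output.
static_subset = dropout_vocabulary(vocabulary, keep_ratio=0.1, rng=rng)
dynamic_proposals = [[(0.0, 0.0), (10.0, 0.1)], [(0.0, 0.0), (9.5, -0.2)]]
candidates = static_subset + dynamic_proposals
best = select_best(candidates, toy_score)
```

In this sketch the same scoring function ranks both coarse vocabulary entries and fine-grained proposals, which is the sense in which a single scorer can be applied to candidate sets of different sizes and origins.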

Related papers

OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL [63.388513841293616]
Existing forgery detection methods fail to handle the interleaved text, images, and videos prevalent in real-world misinformation. To bridge this gap, this paper aims to develop a unified framework for omnibus vision-language forgery detection and grounding. We propose OmniVL-Guard, a balanced reinforcement learning framework for omnibus vision-language forgery detection and grounding.
arXiv Detail & Related papers (2026-02-11T09:41:36Z)
AnchDrive: Bootstrapping Diffusion Policies with Hybrid Trajectory Anchors for End-to-End Driving [19.724857120152944]
AnchDrive is a framework for end-to-end driving. It bootstraps a diffusion policy to mitigate the high computational cost of traditional generative models. Experiments on the NAVSIM benchmark confirm that AnchDrive sets a new state-of-the-art.
arXiv Detail & Related papers (2025-09-24T15:38:41Z)
Discrete-Guided Diffusion for Scalable and Safe Multi-Robot Motion Planning [56.240199425429445]
Multi-Robot Motion Planning (MRMP) involves generating trajectories for multiple robots operating in a shared continuous workspace. While discrete multi-agent path finding (MAPF) methods are broadly adopted due to their scalability, their coarse discretization limits trajectory quality. This paper tackles the limitations of the two approaches by combining discrete MAPF solvers with constrained generative diffusion models.
arXiv Detail & Related papers (2025-08-27T17:59:36Z)
Optimizing Multi-Modal Trackers via Sensitivity-aware Regularized Tuning [112.12667472919723]
This paper tackles the challenge of optimizing multi-modal trackers by effectively adapting the pre-trained models for RGB data. Existing fine-tuning paradigms oscillate between excessive freedom and over-restriction, leading to a suboptimal plasticity-stability trade-off. We propose a novel sensitivity-aware regularized tuning framework, which delicately refines the learning process by incorporating intrinsic parameter sensitivities.
arXiv Detail & Related papers (2025-08-24T18:42:47Z)
EvaDrive: Evolutionary Adversarial Policy Optimization for End-to-End Autonomous Driving [17.57364638932072]
EvaDrive is a novel reinforcement learning framework for autonomous driving. It provides a closed-loop adversarial framework for human-like iterative decision-making. Extensive experiments on NAVSIM and Bench2Drive benchmarks demonstrate SOTA performance.
arXiv Detail & Related papers (2025-08-05T11:26:28Z)
DELTAv2: Accelerating Dense 3D Tracking [79.63990337419514]
We propose a novel algorithm for accelerating dense long-term 3D point tracking in videos. We introduce a coarse-to-fine strategy that begins tracking with a small subset of points and progressively expands the set of tracked trajectories. The newly added trajectories are initialized using a learnable module, which is trained end-to-end alongside the tracking network.
arXiv Detail & Related papers (2025-08-02T03:15:47Z)
DriveSuprim: Towards Precise Trajectory Selection for End-to-End Planning [43.284391163049236]
DriveSuprim is a selection-based paradigm for trajectory selection in autonomous vehicles. It achieves state-of-the-art performance, including collision avoidance and compliance with rules. It maintains high trajectory quality in various driving scenarios.
arXiv Detail & Related papers (2025-06-07T04:39:06Z)
GUIDE-CoT: Goal-driven and User-Informed Dynamic Estimation for Pedestrian Trajectory using Chain-of-Thought [9.572859785331307]
We propose Goal-driven and User-Informed Dynamic Estimation for pedestrian trajectory using Chain-of-Thought (GUIDE-CoT). Our approach integrates two innovative modules: (1) a goal-oriented visual prompt, which enhances goal prediction accuracy by combining visual prompts with a pretrained visual encoder, and (2) a chain-of-thought (CoT) LLM for trajectory generation, which generates realistic trajectories toward the predicted goal. Our method achieves state-of-the-art performance, delivering both high accuracy and greater adaptability in pedestrian trajectory prediction.
arXiv Detail & Related papers (2025-03-10T01:39:24Z)
Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking. DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget. Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
An Effective Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds [50.19288542498838]
3D single object tracking in LiDAR point clouds (LiDAR SOT) plays a crucial role in autonomous driving. Current approaches all follow the Siamese paradigm based on appearance matching. We introduce a motion-centric paradigm to handle LiDAR SOT from a new perspective.
arXiv Detail & Related papers (2023-03-21T17:28:44Z)
Motion Transformer with Global Intention Localization and Local Movement Refinement [103.75625476231401]
Motion TRansformer (MTR) models motion prediction as the joint optimization of global intention localization and local movement refinement. MTR achieves state-of-the-art performance on both the marginal and joint motion prediction challenges.
arXiv Detail & Related papers (2022-09-27T16:23:14Z)
Trajectory Prediction with Graph-based Dual-scale Context Fusion [43.51107329748957]
We present a graph-based trajectory prediction network named the Dual Scale Predictor. It encodes both the static and dynamical driving context in a hierarchical manner. Thanks to the proposed dual-scale context fusion network, our DSP is able to generate accurate and human-like multi-modal trajectories.
arXiv Detail & Related papers (2021-11-02T13:42:16Z)
Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction [71.97877759413272]
Trajectory prediction is a safety-critical tool for autonomous vehicles to plan and execute actions. Recent methods have achieved strong performance using Multi-Choice Learning objectives like winner-takes-all (WTA) or best-of-many. Our work addresses two key challenges in trajectory prediction: learning diverse outputs and improving predictions by imposing constraints using driving knowledge.
arXiv Detail & Related papers (2021-04-16T17:58:56Z)
Autonomous Drone Racing with Deep Reinforcement Learning [39.757652701917166]
In many robotic tasks, such as drone racing, the goal is to travel through a set of waypoints as fast as possible. A key challenge is planning the minimum-time trajectory, which is typically solved by assuming perfect knowledge of the waypoints to pass in advance. In this work, a new approach to minimum-time trajectory generation for quadrotors is presented.
arXiv Detail & Related papers (2021-03-15T18:05:49Z)
Improving Movement Predictions of Traffic Actors in Bird's-Eye View Models using GANs and Differentiable Trajectory Rasterization [12.652210024012374]
One of the most critical pieces of the self-driving puzzle is the task of predicting future movement of surrounding traffic actors. Methods based on top-down scene rasterization on one side and Generative Adversarial Networks (GANs) on the other have been shown to be particularly successful. In this paper we build upon these two directions and propose a raster-based conditional GAN architecture. We evaluate the proposed method on a large-scale, real-world data set, showing that it outperforms state-of-the-art GAN-based baselines.
arXiv Detail & Related papers (2020-04-14T00:41:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.