Related papers: EvaDrive: Evolutionary Adversarial Policy Optimization for End-to-End Autonomous Driving

URL: http://arxiv.org/abs/2508.09158v2
Date: Thu, 14 Aug 2025 07:22:36 GMT
Title: EvaDrive: Evolutionary Adversarial Policy Optimization for End-to-End Autonomous Driving
Authors: Siwen Jiao, Kangan Qian, Hao Ye, Yang Zhong, Ziang Luo, Sicong Jiang, Zilin Huang, Yangyi Fang, Jinyu Miao, Zheng Fu, Yunlong Wang, Kun Jiang, Diange Yang, Rui Fan, Baoyun Peng
Abstract summary: EvaDrive is a novel reinforcement learning framework for autonomous driving. It provides a closed-loop adversarial framework for human-like iterative decision-making. Extensive experiments on NAVSIM and Bench2Drive benchmarks demonstrate SOTA performance.
Score: 17.57364638932072
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autonomous driving faces significant challenges in achieving human-like iterative decision-making, which continuously generates, evaluates, and refines trajectory proposals. Current generation-evaluation frameworks isolate trajectory generation from quality assessment, preventing iterative refinement essential for planning, while reinforcement learning methods collapse multi-dimensional preferences into scalar rewards, obscuring critical trade-offs and yielding scalarization bias. To overcome these issues, we present EvaDrive, a novel multi-objective reinforcement learning framework that establishes genuine closed-loop co-evolution between trajectory generation and evaluation via adversarial optimization. EvaDrive frames trajectory planning as a multi-round adversarial game. In this game, a hierarchical generator continuously proposes candidate paths by combining autoregressive intent modeling for temporal causality with diffusion-based refinement for spatial flexibility. These proposals are then rigorously assessed by a trainable multi-objective critic that explicitly preserves diverse preference structures without collapsing them into a single scalarization bias. This adversarial interplay, guided by a Pareto frontier selection mechanism, enables iterative multi-round refinement, effectively escaping local optima while preserving trajectory diversity. Extensive experiments on NAVSIM and Bench2Drive benchmarks demonstrate SOTA performance, achieving 94.9 PDMS on NAVSIM v1 (surpassing DiffusionDrive by 6.8, DriveSuprim by 5.0, and TrajHF by 0.9) and 64.96 Driving Score on Bench2Drive. EvaDrive generates diverse driving styles via dynamic weighting without external preference data, introducing a closed-loop adversarial framework for human-like iterative decision-making, offering a novel scalarization-free trajectory optimization approach.
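
Illustrative note: the abstract mentions a Pareto frontier selection mechanism over multi-objective scores but gives no implementation details. The sketch below shows only the generic idea of Pareto (non-dominated) filtering, not EvaDrive's actual method. The objective names (safety, comfort, progress), the score values, and the function names `dominates` and `pareto_front` are all hypothetical, and higher scores are assumed to be better.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (higher is better)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical per-trajectory scores: (safety, comfort, progress).
candidates = [
    (0.9, 0.5, 0.6),  # dominated by the last candidate
    (0.7, 0.8, 0.6),  # trades safety for comfort; not dominated
    (0.6, 0.4, 0.5),  # dominated by the first candidate
    (0.9, 0.5, 0.7),  # not dominated
]

front = pareto_front(candidates)
print(front)
```

Because no weights are applied, candidates that trade one objective against another both stay on the front, which is the sense in which such a filter avoids committing to a single scalar reward.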

Related papers

DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving [65.7087560656003]
Generative diffusion models for end-to-end autonomous driving often suffer from mode collapse. We propose DiffusionDriveV2, which leverages reinforcement learning to constrain low-quality modes and explore for superior trajectories. This significantly enhances the overall output quality while preserving the inherent multimodality of its core Gaussian Mixture Model.
arXiv Detail & Related papers (2025-12-08T17:29:52Z)
Optimization-Guided Diffusion for Interactive Scene Generation [52.23368750264419]
We present OMEGA, an optimization-guided, training-free framework that enforces structural consistency and interaction awareness during diffusion-based sampling. We show that OMEGA improves generation realism, consistency, and controllability, increasing the ratio of physically and behaviorally valid scenes. Our approach can also generate $5\times$ more near-collision frames with a time-to-collision under three seconds.
arXiv Detail & Related papers (2025-12-08T15:56:18Z)
ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework. It reframes the learning task to predict the residual deviation from an inertial reference. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z)
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion. Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient. Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
arXiv Detail & Related papers (2025-09-24T13:35:15Z)
Steerable Adversarial Scenario Generation through Test-Time Preference Alignment [58.37104890690234]
Adversarial scenario generation is a cost-effective approach for safety assessment of autonomous driving systems. We introduce a new framework named Steerable Adversarial scenario GEnerator (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining.
arXiv Detail & Related papers (2025-09-24T13:27:35Z)
DriveSuprim: Towards Precise Trajectory Selection for End-to-End Planning [43.284391163049236]
DriveSuprim is a selection-based paradigm for trajectory selection in autonomous vehicles. It achieves state-of-the-art performance, including collision avoidance and compliance with rules. It maintains high trajectory quality in various driving scenarios.
arXiv Detail & Related papers (2025-06-07T04:39:06Z)
HMAD: Advancing E2E Driving with Anchored Offset Proposals and Simulation-Supervised Multi-target Scoring [7.564094719956086]
We introduce HMAD, a framework integrating a distinctive Bird's-Eye-View (BEV) based trajectory proposal mechanism with learned multi-criteria scoring. A key innovation, our simulation-supervised scorer module, then evaluates these proposals against critical metrics including no at-fault collisions, drivable area compliance, comfortableness, and overall driving quality. Demonstrating its efficacy, HMAD achieves a 44.5% driving score on the CVPR 2025 private test set.
arXiv Detail & Related papers (2025-05-29T05:59:24Z)
Preference-Guided Diffusion for Multi-Objective Offline Optimization [64.08326521234228]
We propose a preference-guided diffusion model for offline multi-objective optimization. Our guidance is a preference model trained to predict the probability that one design dominates another. Our results highlight the effectiveness of classifier-guided diffusion models in generating diverse and high-quality solutions.
arXiv Detail & Related papers (2025-03-21T16:49:38Z)
Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback [33.09982089166203]
We introduce TrajHF, a human feedback-driven finetuning framework for generative trajectory models. TrajHF refines multi-modal trajectory generation beyond conventional imitation learning. It achieves PDMS of 93.95 on NavSim benchmark, significantly exceeding other methods.
arXiv Detail & Related papers (2025-03-13T14:56:17Z)
Predictive Planner for Autonomous Driving with Consistency Models [5.966385886363771]
Trajectory prediction and planning are essential for autonomous vehicles to navigate safely and efficiently in dynamic environments. Recent diffusion-based generative models have shown promise in multi-agent trajectory generation, but their slow sampling is less suitable for high-frequency planning tasks. We leverage the consistency model to build a predictive planner that samples from a joint distribution of ego and surrounding agents, conditioned on the ego vehicle's navigational goal.
arXiv Detail & Related papers (2025-02-12T00:26:01Z)
DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner. Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
Integrating Higher-Order Dynamics and Roadway-Compliance into Constrained ILQR-based Trajectory Planning for Autonomous Vehicles [3.200238632208686]
Trajectory planning aims to produce a globally optimal route for Autonomous Passenger Vehicles. Existing implementations utilizing the vehicle bicycle kinematic model may not guarantee controllable trajectories. We augment this model by higher-order terms, including the first and second-order derivatives of curvature and longitudinal jerk.
arXiv Detail & Related papers (2023-09-25T22:30:18Z)
Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction [71.97877759413272]
Trajectory prediction is a safety-critical tool for autonomous vehicles to plan and execute actions. Recent methods have achieved strong performances using Multi-Choice Learning objectives like winner-takes-all (WTA) or best-of-many. Our work addresses two key challenges in trajectory prediction, learning outputs, and better predictions by imposing constraints using driving knowledge.
arXiv Detail & Related papers (2021-04-16T17:58:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.