Related papers: PlanT 2.0: Exposing Biases and Structural Flaws in Closed-Loop Driving

URL: http://arxiv.org/abs/2511.07292v1
Date: Mon, 10 Nov 2025 16:41:47 GMT
Title: PlanT 2.0: Exposing Biases and Structural Flaws in Closed-Loop Driving
Authors: Simon Gerstenecker, Andreas Geiger, Katrin Renz
Abstract summary: We introduce PlanT 2.0, a lightweight, object-centric planning transformer designed for autonomous driving research in CARLA. To tackle the scenarios newly introduced by the challenging CARLA Leaderboard 2.0, we introduce multiple upgrades to PlanT. We argue for a shift toward data-centric development, with a focus on richer, more robust, and less biased datasets.
Score: 24.431701691830046
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Most recent work in autonomous driving has prioritized benchmark performance and methodological innovation over in-depth analysis of model failures, biases, and shortcut learning. This has led to incremental improvements without a deep understanding of the current failures. While it is straightforward to look at situations where the model fails, it is hard to understand the underlying reason. This motivates us to conduct a systematic study, where inputs to the model are perturbed and the predictions observed. We introduce PlanT 2.0, a lightweight, object-centric planning transformer designed for autonomous driving research in CARLA. The object-level representation enables controlled analysis, as the input can be easily perturbed (e.g., by changing the location or adding or removing certain objects), in contrast to sensor-based models. To tackle the scenarios newly introduced by the challenging CARLA Leaderboard 2.0, we introduce multiple upgrades to PlanT, achieving state-of-the-art performance on Longest6 v2, Bench2Drive, and the CARLA validation routes. Our analysis exposes insightful failures, such as a lack of scene understanding caused by low obstacle diversity, rigid expert behaviors leading to exploitable shortcuts, and overfitting to a fixed set of expert trajectories. Based on these findings, we argue for a shift toward data-centric development, with a focus on richer, more robust, and less biased datasets. We open-source our code and model at https://github.com/autonomousvision/plant2.

Related papers

PILOT: Planning via Internalized Latent Optimization Trajectories for Large Language Models [51.43746425777865]
Large Language Models (LLMs) often lack the capacity to formulate global strategies, leading to error propagation in long-horizon tasks. We propose PILOT, a framework designed to internalize the strategic oversight of large models into intrinsic Latent Guidance.
arXiv Detail & Related papers (2026-01-07T12:38:56Z)
Guardian: Detecting Robotic Planning and Execution Errors with Vision-Language Models [53.20969621498248]
We propose an automatic robot failure synthesis approach that procedurally perturbs successful trajectories to generate diverse planning and execution failures. We construct three new failure detection benchmarks: RLBench-Fail, BridgeDataV2-Fail, and UR5-Fail. We then train Guardian, a VLM with multi-view images for detailed failure reasoning and detection.
arXiv Detail & Related papers (2025-12-01T17:57:27Z)
ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework. It reframes the learning task to predict the residual deviation from an inertial reference. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z)
Real-Time Model Checking for Closed-Loop Robot Reactive Planning [0.0]
We present a new application of model checking which achieves real-time multi-step planning and obstacle avoidance on a real autonomous robot. We have developed a small, purpose-built model checking algorithm which generates plans in situ based on "core" knowledge and attention as found in biological agents. Our approach is based on chaining temporary control systems which are spawned to counteract disturbances in the local environment.
arXiv Detail & Related papers (2025-08-26T16:49:30Z)
Hidden Biases of End-to-End Driving Datasets [25.931831743383782]
We make a first attempt at end-to-end driving for CARLA Leaderboard 2.0. We systematically analyze the training dataset, leading to new insights. Our model ranks first and second respectively on the map and sensors tracks of the 2024 CARLA Challenge.
arXiv Detail & Related papers (2024-12-12T18:59:13Z)
Real-Time Anomaly Detection and Reactive Planning with Large Language Models [18.57162998677491]
Foundation models, e.g., large language models (LLMs), trained on internet-scale data possess zero-shot capabilities. We present a two-stage reasoning framework that incorporates the judgement regarding potential anomalies into a safe control framework. This enables our monitor to improve the trustworthiness of dynamic robotic systems, such as quadrotors or autonomous vehicles.
arXiv Detail & Related papers (2024-07-11T17:59:22Z)
DeTra: A Unified Model for Object Detection and Trajectory Forecasting [68.85128937305697]
Our approach formulates the union of the two tasks as a trajectory refinement problem. To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects. In our experiments, we observe that our model outperforms the state-of-the-art on Argoverse 2 Sensor and Open dataset.
arXiv Detail & Related papers (2024-06-06T18:12:04Z)
A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series [17.08674819906415]
We introduce HILAD, a novel framework designed to foster a dynamic and bidirectional collaboration between humans and AI. Through our visual interface, HILAD empowers domain experts to detect, interpret, and correct unexpected model behaviors at scale.
arXiv Detail & Related papers (2024-05-06T07:44:07Z)
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving? [84.17711168595311]
End-to-end autonomous driving has emerged as a promising research direction to target autonomy from a full-stack perspective. The nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models. We introduce a new metric to evaluate whether the predicted trajectories adhere to the road.
arXiv Detail & Related papers (2023-12-05T11:32:31Z)
Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes [38.43491956142818]
The planning task involves predicting the trajectory of the ego vehicle based on inputs from both internal intention and the external environment. Most existing works evaluate their performance on the nuScenes dataset using the L2 error and collision rate between the predicted trajectories and the ground truth. In this paper, we reevaluate these existing evaluation metrics and explore whether they accurately measure the superiority of different methods. Our simple method achieves end-to-end planning performance on the nuScenes dataset similar to that of other perception-based methods, reducing the average L2 error by about 20%.
arXiv Detail & Related papers (2023-05-17T17:59:11Z)
DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles [75.43355868143209]
We present DiffStack, a differentiable and modular stack for prediction, planning, and control. Our results on the nuScenes dataset indicate that end-to-end training with DiffStack yields substantial improvements in open-loop and closed-loop planning metrics.
arXiv Detail & Related papers (2022-12-13T09:05:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.