Fugu-MT Paper Translation (Abstract): Do Open-Loop Metrics Predict Closed-Loop Driving? A Cross-Benchmark Correlation Study of NAVSIM and Bench2Drive

Paper summary: Do Open-Loop Metrics Predict Closed-Loop Driving? A Cross-Benchmark Correlation Study of NAVSIM and Bench2Drive

arxiv url: http://arxiv.org/abs/2605.00066v1
Date: Thu, 30 Apr 2026 09:27:57 GMT
Status: Translation complete
Last updated in system: 2026-05-04 17:43:28.667782
Title: Do Open-Loop Metrics Predict Closed-Loop Driving? A Cross-Benchmark Correlation Study of NAVSIM and Bench2Drive
Title (reference translation): Do Open-Loop Metrics Predict Closed-Loop Driving? A Cross-Comparison Study of NAVSIM and Bench2Drive
Authors: Yiru Wang, Anqing Jiang, Shuo Wang, Yuwen Heng, Hai Yang, Yang Chen, Hao Sun
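Abstract summary: Open-loop evaluation offers fast, reproducible assessment of autonomous driving planners. Traditional open-loop metrics show no reliable correlation with the closed-loop Driving Score.
Reference score (independently computed attention level): 19.12252168142987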
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open-loop evaluation offers fast, reproducible assessment of autonomous driving planners, but its ability to predict real closed-loop driving performance remains questionable. Prior work has shown that traditional open-loop metrics such as Average Displacement Error (ADE) and Final Displacement Error (FDE) exhibit no reliable correlation with closed-loop Driving Score. In this paper, we ask whether the more recent, safety-aware open-loop metrics introduced by NAVSIM v2 can bridge this gap. By systematically cross-referencing published results from 15 state-of-the-art methods across NAVSIM (open-loop) and Bench2Drive (closed-loop), we compile a paired dataset of open-loop sub-metrics and closed-loop performance, yielding 8 methods with complete paired data. Our analysis reveals three key findings: (1) the aggregate NAVSIM PDM Score shows a strong positive but non-monotonic correlation with Bench2Drive Driving Score, with clear ranking inversions; (2) among individual NAVSIM sub-metrics, Ego Progress (EP) is the strongest single predictor of closed-loop success, substantially exceeding the safety-critical collision metric NC; (3) the safety-progress trade-off manifests differently in open-loop and closed-loop: methods that maximize safety at the expense of progress rank highly in NAVSIM but underperform in closed-loop due to timeout and slow-driving penalties. We further demonstrate that a much simpler 3-metric formula matches the predictive power of the full 5-metric PDMS at the same Spearman $\rho = 0.90$ on our paired sample of $n = 8$ methods, suggesting that within current state-of-the-art methods, where TTC and Comfort approach saturation, these two sub-metrics add little marginal information for closed-loop ranking. Additionally, we identify the snowball effect, in which small open-loop deviations compound into closed-loop failures, as a candidate mechanism for the residual gap.
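Abstract (reference translation): Open-loop evaluation provides fast and reproducible assessment of autonomous driving planners, but whether it can predict actual closed-loop driving performance is in doubt. Traditional open-loop metrics such as Average Displacement Error (ADE) and Final Displacement Error (FDE) have been shown to have no reliable correlation with the closed-loop Driving Score. This paper asks whether the newer, safety-aware open-loop metrics introduced by NAVSIM v2 can close this gap. By systematically cross-referencing the published results of 15 state-of-the-art methods across NAVSIM (open-loop) and Bench2Drive (closed-loop), the authors compile a paired dataset of open-loop sub-metrics and closed-loop performance, which yields 8 methods with complete paired data. The analysis gives three findings: (1) the NAVSIM PDM Score correlates strongly and positively with the Bench2Drive Driving Score, but with clear ranking inversions; (2) among the NAVSIM sub-metrics, Ego Progress (EP) is the strongest single predictor of closed-loop success, far ahead of the collision metric NC; (3) the safety-progress trade-off appears differently in open-loop and closed-loop evaluation, since methods that maximize safety at the cost of progress rank highly in NAVSIM but underperform in closed-loop because of timeout and slow-driving penalties. In addition, a simpler 3-metric formula matches the predictive power of the 5-metric PDMS at the same Spearman $\rho = 0.90$ on the paired sample of $n = 8$ methods, which suggests that among current state-of-the-art methods, where TTC and Comfort are close to saturation, these two sub-metrics add little marginal information for closed-loop ranking. Finally, the snowball effect, in which small open-loop deviations compound into closed-loop failures, is identified as a candidate mechanism for the remaining gap.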
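
The rank-correlation comparison described in the abstract can be illustrated with a minimal Python sketch. It computes Spearman's rank correlation between an open-loop score and a closed-loop score for a set of paired methods. The function names (`average_ranks`, `pearson`, `spearman_rho`) are introduced here for illustration, and the score values are synthetic placeholders, not results from the paper or from either benchmark.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic placeholder scores for 8 hypothetical methods (NOT paper data).
open_loop = [80.0, 82.5, 84.0, 85.5, 86.0, 87.5, 88.0, 90.0]
closed_loop = [40.0, 45.0, 55.0, 50.0, 60.0, 70.0, 65.0, 75.0]

rho = spearman_rho(open_loop, closed_loop)
print(f"Spearman rho = {rho:.3f}")  # prints "Spearman rho = 0.952"
```

In this synthetic example two adjacent pairs of methods swap places between the two rankings, so the correlation stays high while the ordering is not preserved, which is the kind of ranking inversion the abstract refers to. With only 8 paired methods, a single swap moves the coefficient noticeably, so any such estimate should be read with that sample size in mind.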
