Fugu-MT Paper Translation (Summary): Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations

Paper Summary: Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations

arxiv url: http://arxiv.org/abs/2603.11417v1
Date: Thu, 12 Mar 2026 01:19:32 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:25.802452
Title: Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations
Title (reference translation): Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised and Supervised Representations
Authors: Fatemeh Naeinian, Ali Hamza, Haoran Zhu, Anna Choromanska
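Abstract summary: We study zero-shot cross-city generalization in end-to-end trajectory planning. Self-supervised visual representations improve transfer between cities. These results establish zero-shot geographic transfer as a necessary test for evaluating end-to-end autonomous driving systems.
Reference score (independently calculated attention score): 9.18632648031395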
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: End-to-end autonomous driving models are typically trained on multi-city datasets using supervised ImageNet-pretrained backbones, yet their ability to generalize to unseen cities remains largely unexamined. When training and evaluation data are geographically mixed, models may implicitly rely on city-specific cues, masking failure modes that would occur under real domain shifts when generalizing to new locations. In this work we investigate zero-shot cross-city generalization in end-to-end trajectory planning and ask whether self-supervised visual representations improve transfer across cities. We conduct a comprehensive study by integrating self-supervised backbones (I-JEPA, DINOv2, and MAE) into planning frameworks. We evaluate performance under strict geographic splits on nuScenes in the open-loop setting and on NAVSIM in the closed-loop evaluation protocol. Our experiments reveal a substantial generalization gap when transferring models relying on traditional supervised backbones across cities with different road topologies and driving conventions, particularly when transferring from right-side to left-side driving environments. Self-supervised representation learning reduces this gap. In open-loop evaluation, a supervised backbone exhibits severe inflation when transferring from Boston to Singapore (L2 displacement ratio 9.77x, collision ratio 19.43x), whereas domain-specific self-supervised pretraining reduces this to 1.20x and 0.75x respectively. In closed-loop evaluation, self-supervised pretraining improves PDMS by up to 4 percent for all single-city training cities. These results show that representation learning strongly influences the robustness of cross-city planning and establish zero-shot geographic transfer as a necessary test for evaluating end-to-end autonomous driving systems.
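Abstract (reference translation): End-to-end autonomous driving models are typically trained on multi-city datasets using backbones pretrained on ImageNet with supervision, yet their ability to generalize to unseen cities remains largely unexamined. When training and evaluation data are geographically mixed, models may implicitly rely on city-specific cues, which masks the failure modes that would occur under real domain shift when generalizing to new locations. In this work, we study zero-shot cross-city generalization in end-to-end trajectory planning and ask whether self-supervised visual representations improve transfer between cities. We integrate self-supervised backbones (I-JEPA, DINOv2, and MAE) into planning frameworks and conduct a comprehensive study. We evaluate performance under strict geographic splits on nuScenes in the open-loop setting and on NAVSIM under the closed-loop evaluation protocol. Our experiments reveal a large generalization gap when models that rely on conventional supervised backbones are transferred between cities with different road topologies and driving conventions, particularly from right-hand-traffic to left-hand-traffic environments. Self-supervised representation learning narrows this gap. In open-loop evaluation, a supervised backbone shows severe metric inflation when transferred from Boston to Singapore (L2 displacement ratio 9.77x, collision ratio 19.43x), whereas domain-specific self-supervised pretraining reduces these ratios to 1.20x and 0.75x, respectively. In closed-loop evaluation, self-supervised pretraining improves PDMS by up to 4% for every single-city training setting. These results indicate that representation learning strongly affects the robustness of cross-city planning, and they establish zero-shot geographic transfer as a necessary test for evaluating end-to-end autonomous driving systems.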
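The abstract reports cross-city degradation as ratios (for example, an L2 displacement ratio of 9.77x when transferring from Boston to Singapore). The Python sketch below shows one way such ratios could be computed from per-scene results under a strict geographic split. It assumes, without confirmation from the abstract, that each ratio is the mean metric on the unseen city divided by the mean metric on held-out scenes from the training city; the paper may normalise differently (for instance against a model trained on the target city). `SceneResult`, `split_by_city`, and `inflation_ratios` are hypothetical names, and the example numbers are synthetic, not results from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SceneResult:
    """Hypothetical per-scene evaluation record (field names are illustrative)."""
    scene_id: str
    city: str              # city the scene was recorded in, e.g. "boston"
    l2_error_m: float      # L2 displacement error of the planned trajectory (metres)
    collision_rate: float  # collision rate measured on this scene


def split_by_city(results, source_city):
    """Strict geographic split of evaluation scenes.

    Scenes from the training (source) city form the in-city set; every
    scene from any other city is treated as unseen (cross-city).
    """
    in_city = [r for r in results if r.city == source_city]
    cross_city = [r for r in results if r.city != source_city]
    if not in_city or not cross_city:
        raise ValueError("need scenes from both the source city and another city")
    return in_city, cross_city


def _ratio(cross_value, in_value):
    # A ratio above 1.0 means the metric is larger in the unseen city.
    if in_value == 0:
        raise ValueError("in-city metric is zero; ratio is undefined")
    return cross_value / in_value


def inflation_ratios(results, source_city):
    """Return (L2 ratio, collision ratio) as cross-city mean / in-city mean (assumed definition)."""
    in_city, cross_city = split_by_city(results, source_city)
    l2_ratio = _ratio(mean(r.l2_error_m for r in cross_city),
                      mean(r.l2_error_m for r in in_city))
    collision_ratio = _ratio(mean(r.collision_rate for r in cross_city),
                             mean(r.collision_rate for r in in_city))
    return l2_ratio, collision_ratio


# Synthetic numbers for illustration only; they are not results from the paper.
example = [
    SceneResult("b1", "boston", 1.0, 0.25),
    SceneResult("b2", "boston", 1.0, 0.25),
    SceneResult("s1", "singapore", 2.0, 0.5),
    SceneResult("s2", "singapore", 3.0, 0.0),
]
l2_ratio, collision_ratio = inflation_ratios(example, "boston")
print(f"L2 ratio {l2_ratio:.2f}x, collision ratio {collision_ratio:.2f}x")
```

Because lower is better for both metrics, a ratio above 1.0 signals degradation in the unseen city, while a value below 1.0, like the 0.75x collision ratio quoted in the abstract, means the metric was lower in the new city than in the training city.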
