Fugu-MT Paper Translation (Summary): GeoDrive-Bench: Benchmarking Region-Specific Multimodal Reasoning in Autonomous Driving

Paper summary: GeoDrive-Bench: Benchmarking Region-Specific Multimodal Reasoning in Autonomous Driving

arxiv url: http://arxiv.org/abs/2606.02774v1
Date: Mon, 01 Jun 2026 18:36:46 GMT
Status: Translation complete
System update date: 2026-06-04 10:57:21.701368
Title: GeoDrive-Bench: Benchmarking Region-Specific Multimodal Reasoning in Autonomous Driving
Authors: Yingzi Ma, Chaowei Xiao, Ming Jiang
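Abstract summary: Vision-language models (VLMs) for autonomous driving have shown promising performance, but their ability to handle region-specific traffic rules remains underexplored. This paper introduces GeoDrive-Bench, a new benchmark that enables the systematic investigation of VLMs' geo-culturally grounded driving reasoning.
Reference score (independently computed attention measure): 43.04860654830679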
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-language models (VLMs) for autonomous driving have shown promising performance, but their ability to handle region-specific traffic rules remains underexplored, raising uncertainties about their deployment across diverse global settings. We therefore introduce GeoDrive-Bench, a novel benchmark that enables the systematic investigation of VLMs' geo-culturally grounded driving reasoning. We curated 5,053 human-validated multiple-choice QA pairs across six countries covering diverse driving cultures. Specifically, we emphasize four driving tasks: perception, prediction, planning, and region reasoning. Each question requires models to infer the correct driving behavior from visual evidence and local traffic conventions without explicit country labels. Beyond evaluation, we further design a distillation algorithm that injects region-specific traffic-rule knowledge into the internal representations of VLMs, enabling models to better align visual scene understanding with local driving policies. Experiments on nine state-of-the-art VLMs show substantial performance variations across geo-driving cultures for each task, while our proposed baseline models exhibit improved geo-cultural reasoning across regions. These results suggest that current VLMs still lack robust region-aware driving intelligence and highlight GeoDrive-Bench as a diagnostic and training-oriented testbed for deployable autonomous driving foundation models.
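The abstract reports performance variations across countries for each of the four tasks. A minimal sketch of how such a per-country, per-task accuracy breakdown could be computed for multiple-choice QA is shown below. The record fields (`country`, `task`, `answer`, `prediction`) and the toy records are hypothetical and are not taken from the GeoDrive-Bench release; the country field is used here only to group results, since the questions themselves carry no explicit country label.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {(country, task): accuracy} for multiple-choice predictions.

    Each record is a dict with hypothetical keys:
    'country', 'task', 'answer' (gold option), 'prediction' (model option).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        key = (r["country"], r["task"])
        total[key] += 1
        if r["prediction"] == r["answer"]:
            correct[key] += 1
    # Every key in total has at least one record, so no division by zero.
    return {key: correct[key] / total[key] for key in total}


# Toy, made-up records purely for illustration (not data from the paper).
records = [
    {"country": "A", "task": "perception", "answer": "B", "prediction": "B"},
    {"country": "A", "task": "perception", "answer": "C", "prediction": "A"},
    {"country": "B", "task": "planning", "answer": "D", "prediction": "D"},
]
scores = accuracy_by_group(records)
```

Grouping by the (country, task) pair rather than by task alone is what would expose region-specific gaps that an aggregate accuracy could hide.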

Related papers