Fugu-MT Paper Translation (Summary): DriveQA: Passing the Driving Knowledge Test

Paper summary: DriveQA: Passing the Driving Knowledge Test

arxiv url: http://arxiv.org/abs/2508.21824v1
Date: Fri, 29 Aug 2025 17:59:53 GMT
Status: Translation complete
Last updated in system: 2025-09-01 19:45:11.142663
Title: DriveQA: Passing the Driving Knowledge Test
Title (reference translation): DriveQA: Passing the Driving Knowledge Test
Authors: Maolin Wei, Wanzhou Liu, Eshed Ohn-Bar
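Abstract summary: We introduce DriveQA, an extensive open-source text and vision-based benchmark that exhaustively covers traffic regulations and scenarios. We show that state-of-the-art LLMs and Multimodal LLMs (MLLMs) perform well on basic traffic rules but exhibit significant weaknesses in numerical reasoning and complex right-of-way scenarios. We also demonstrate that models can internalize text and synthetic traffic knowledge to generalize effectively across downstream QA tasks.
Reference score (independently computed attention metric): 13.569275971952154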
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: If a Large Language Model (LLM) were to take a driving knowledge test today, would it pass? Beyond standard spatial and visual question-answering (QA) tasks on current autonomous driving benchmarks, driving knowledge tests require a complete understanding of all traffic rules, signage, and right-of-way principles. To pass this test, human drivers must discern various edge cases that rarely appear in real-world datasets. In this work, we present DriveQA, an extensive open-source text and vision-based benchmark that exhaustively covers traffic regulations and scenarios. Through our experiments using DriveQA, we show that (1) state-of-the-art LLMs and Multimodal LLMs (MLLMs) perform well on basic traffic rules but exhibit significant weaknesses in numerical reasoning and complex right-of-way scenarios, traffic sign variations, and spatial layouts, (2) fine-tuning on DriveQA improves accuracy across multiple categories, particularly in regulatory sign recognition and intersection decision-making, (3) controlled variations in DriveQA-V provide insights into model sensitivity to environmental factors such as lighting, perspective, distance, and weather conditions, and (4) pretraining on DriveQA enhances downstream driving task performance, leading to improved results on real-world datasets such as nuScenes and BDD, while also demonstrating that models can internalize text and synthetic traffic knowledge to generalize effectively across downstream QA tasks.
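Abstract (reference translation): If a Large Language Model (LLM) were to take a driving knowledge test today, would it pass? Beyond the standard spatial and visual question-answering (QA) tasks in current autonomous driving benchmarks, a driving knowledge test requires a complete understanding of all traffic rules, signage, and right-of-way principles. To pass this test, human drivers must discern various edge cases that rarely appear in real-world datasets. In this work, we introduce DriveQA, an extensive open-source text and vision-based benchmark that exhaustively covers traffic regulations and scenarios. Experiments with DriveQA show that (1) state-of-the-art LLMs and Multimodal LLMs (MLLMs) perform well on basic traffic rules but exhibit significant weaknesses in numerical reasoning, complex right-of-way scenarios, traffic sign variations, and spatial layouts; (2) fine-tuning on DriveQA improves accuracy across multiple categories, particularly in regulatory sign recognition and intersection decision-making; (3) controlled variations in DriveQA-V provide insights into model sensitivity to environmental factors such as lighting, perspective, distance, and weather conditions; and (4) pretraining on DriveQA enhances downstream driving task performance and improves results on real-world datasets such as nuScenes and BDD, while also demonstrating that models can internalize text and synthetic traffic knowledge to generalize effectively across downstream QA tasks.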

Related papers list