Fugu-MT Paper Translation (Abstract): BATON: A Multimodal Benchmark for Bidirectional Automation Transition Observation in Naturalistic Driving

Paper overview: BATON: A Multimodal Benchmark for Bidirectional Automation Transition Observation in Naturalistic Driving

arxiv url: http://arxiv.org/abs/2604.07263v1
Date: Wed, 08 Apr 2026 16:29:24 GMT
Status: Translation complete
Last updated in system: 2026-04-09 17:30:51.632925
Title: BATON: A Multimodal Benchmark for Bidirectional Automation Transition Observation in Naturalistic Driving
Title (reference translation): BATON: A Multimodal Benchmark for Observing Bidirectional Automation Transitions in Naturalistic Driving
Authors: Yuhang Wang, Yiyao Xu, Chaoyun Yang, Lingyao Li, Jingran Sun, Hao Zhou,
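Abstract summary: Existing driving automation (DA) systems rely on human drivers to decide when to engage DA. BATON is a naturalistic dataset capturing real-world DA usage across 127 drivers and 136.6 hours of driving. The dataset synchronizes front-view video, in-cabin video, decoded CAN bus signals, radar-based lead-vehicle interaction, and GPS-derived route context. Results show that front-view video captures road context but not driver state, while in-cabin video reflects driver readiness but not the external scene.
Reference score (independently computed attention score): 18.15118596168445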
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing driving automation (DA) systems on production vehicles rely on human drivers to decide when to engage DA while requiring them to remain continuously attentive and ready to intervene. This design demands substantial situational judgment and imposes significant cognitive load, leading to steep learning curves, suboptimal user experience, and safety risks from both over-reliance and delayed takeover. Predicting when drivers hand over control to DA and when they take it back is therefore critical for designing proactive, context-aware HMI, yet existing datasets rarely capture the multimodal context, including road scene, driver state, vehicle dynamics, and route environment. To fill this gap, we introduce BATON, a large-scale naturalistic dataset capturing real-world DA usage across 127 drivers, and 136.6 hours of driving. The dataset synchronizes front-view video, in-cabin video, decoded CAN bus signals, radar-based lead-vehicle interaction, and GPS-derived route context, forming a closed-loop multimodal record around each control transition. We define three benchmark tasks: driving action understanding, handover prediction, and takeover prediction, and evaluate baselines spanning sequence models, classical classifiers, and zero-shot VLMs. Results show that visual input alone is insufficient for reliable transition prediction: front-view video captures road context but not driver state, while in-cabin video reflects driver readiness but not the external scene. Incorporating CAN and route-context signals substantially improves performance over video-only settings, indicating strong complementarity across modalities. We further find takeover events develop more gradually and benefit from longer prediction horizons, whereas handover events depend more on immediate contextual cues, revealing an asymmetry with direct implications for HMI design in assisted driving systems.
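Abstract (reference translation): Existing driving automation (DA) systems rely on human drivers to decide when to engage DA, while requiring them to stay continuously attentive and ready to intervene. This design demands substantial situational judgment and imposes a heavy cognitive load, leading to steep learning curves, a suboptimal user experience, and safety risks from both over-reliance and delayed takeover. Predicting when drivers hand control over to DA and when they take it back is therefore important for designing proactive, context-aware HMI, yet existing datasets rarely capture the multimodal context, including road scene, driver state, vehicle dynamics, and route environment. To fill this gap, BATON provides a large-scale naturalistic dataset capturing real-world DA usage across 127 drivers and 136.6 hours of driving. The dataset synchronizes front-view video, in-cabin video, decoded CAN bus signals, radar-based lead-vehicle interaction, and GPS-derived route context, forming a closed-loop multimodal record around each control transition. Three benchmark tasks are defined (driving action understanding, handover prediction, and takeover prediction), and baselines spanning sequence models, classical classifiers, and zero-shot VLMs are evaluated. Results show that visual input alone is insufficient for reliable transition prediction: front-view video captures road context but not driver state, while in-cabin video reflects driver readiness but not the external scene. Incorporating CAN and route-context signals substantially improves performance over video-only settings, indicating strong complementarity across modalities. Furthermore, takeover events develop more gradually and benefit from longer prediction horizons, whereas handover events depend more on immediate contextual cues, revealing an asymmetry with direct implications for HMI design in assisted driving systems.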

Related papers