Related papers: Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention

Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention

URL: http://arxiv.org/abs/2304.06370v1
Date: Thu, 13 Apr 2023 09:50:32 GMT
Title: Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention
Authors: Yiming Ma, Victor Sanchez, Soodeh Nikan, Devesh Upadhyay, Bhushan Atote, Tanaya Guha
Abstract summary: We propose a novel multiview multimodal driver monitoring system based on feature-level fusion through multi-head self-attention (MHSA) We demonstrate its effectiveness by comparing it against four alternative fusion strategies (Sum, Convarity, SE, and AFF) Experiments on this enhanced database demonstrate that 1) the proposed MHSA-based fusion method (AUC-ROC: 97.0%) outperforms all baselines and previous approaches, and 2) training MHSA with patch masking can improve its robustness against modality/view collapses.
Score: 28.18784311981388
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Driver Monitoring Systems (DMSs) are crucial for safe hand-over actions in Level-2+ self-driving vehicles. State-of-the-art DMSs leverage multiple sensors mounted at different locations to monitor the driver and the vehicle's interior scene and employ decision-level fusion to integrate these heterogenous data. However, this fusion method may not fully utilize the complementarity of different data sources and may overlook their relative importance. To address these limitations, we propose a novel multiview multimodal driver monitoring system based on feature-level fusion through multi-head self-attention (MHSA). We demonstrate its effectiveness by comparing it against four alternative fusion strategies (Sum, Conv, SE, and AFF). We also present a novel GPU-friendly supervised contrastive learning framework SuMoCo to learn better representations. Furthermore, We fine-grained the test split of the DAD dataset to enable the multi-class recognition of drivers' activities. Experiments on this enhanced database demonstrate that 1) the proposed MHSA-based fusion method (AUC-ROC: 97.0\%) outperforms all baselines and previous approaches, and 2) training MHSA with patch masking can improve its robustness against modality/view collapses. The code and annotations are publicly available.

Related papers

SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
Multimodal Large Language Models (MLLMs) can process both visual and textual data. We propose SafeAuto, a novel framework that enhances MLLM-based autonomous driving systems by incorporating both unstructured and structured knowledge.
arXiv Detail & Related papers (2025-02-28T21:53:47Z)
Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving [3.770103075126785]
We introduce a novel approach to multi-modal sensor fusion, focusing on developing a graph-based state representation. We present a Sensor-Agnostic Graph-Aware Kalman Filter, the first online state estimation technique designed to fuse multi-modal graphs. We validate the effectiveness of our proposed framework through extensive experiments conducted on both synthetic and real-world driving datasets.
arXiv Detail & Related papers (2024-11-06T06:58:17Z)
Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation [61.91492500828508]
Few-shot 3D point cloud segmentation (FS-PCS) aims at generalizing models to segment novel categories with minimal support samples. We introduce a cost-free multimodal FS-PCS setup, utilizing textual labels and the potentially available 2D image modality. We propose a simple yet effective Test-time Adaptive Cross-modal Seg (TACC) technique to mitigate training bias.
arXiv Detail & Related papers (2024-10-29T19:28:41Z)
MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition [10.060717595852271]
We propose a novel multimodal fusion transformer, named MultiFuser. It identifies cross-modal interrelations and interactions among multimodal car cabin videos. Extensive experiments are conducted on Drive&Act dataset.
arXiv Detail & Related papers (2024-08-03T12:33:21Z)
M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving [11.36165122994834]
We propose a Multi-Modal fusion transformer incorporating Driver Attention (M2DA) for autonomous driving. By incorporating driver attention, we empower the human-like scene understanding ability to autonomous vehicles to identify crucial areas precisely and ensure safety.
arXiv Detail & Related papers (2024-03-19T08:54:52Z)
Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems. This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
arXiv Detail & Related papers (2024-03-15T17:23:38Z)
G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving [71.9040410238973]
We focus on inferring the ego trajectory of a driver's vehicle using their gaze data. Next, we develop G-MEMP, a novel multimodal ego-trajectory prediction network that combines GPS and video input with gaze data. The results show that G-MEMP significantly outperforms state-of-the-art methods in both benchmarks.
arXiv Detail & Related papers (2023-12-13T23:06:30Z)
Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text. Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception [26.84439405241999]
We present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle. AIDE facilitates holistic driver monitoring through three distinctive characteristics. Two fusion strategies are introduced to give new insights into learning effective multi-stream/modal representations.
arXiv Detail & Related papers (2023-07-26T03:12:05Z)
Towards Robust On-Ramp Merging via Augmented Multimodal Reinforcement Learning [9.48157144651867]
We present a novel approach for Robust on-ramp merge of CAVs via Augmented and Multi-modal Reinforcement Learning. Specifically, we formulate the on-ramp merging problem as a Markov decision process (MDP) by taking driving safety, comfort driving behavior, and traffic efficiency into account. To provide reliable merging maneuvers, we simultaneously leverage BSM and surveillance images for multi-modal observation.
arXiv Detail & Related papers (2022-07-21T16:34:57Z)
Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination. Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities. We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z)
DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention and Alertness Analysis [54.198237164152786]
Vision is the richest and most cost-effective technology for Driver Monitoring Systems (DMS) The lack of sufficiently large and comprehensive datasets is currently a bottleneck for the progress of DMS development. In this paper, we introduce the Driver Monitoring dataset (DMD), an extensive dataset which includes real and simulated driving scenarios.
arXiv Detail & Related papers (2020-08-27T12:33:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.