DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention
and Alertness Analysis
- URL: http://arxiv.org/abs/2008.12085v1
- Date: Thu, 27 Aug 2020 12:33:54 GMT
- Title: DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention
and Alertness Analysis
- Authors: Juan Diego Ortega, Neslihan Kose, Paola Cañas, Min-An Chao,
Alexander Unnervik, Marcos Nieto, Oihana Otaegui, Luis Salgado
- Abstract summary: Vision is the richest and most cost-effective technology for Driver Monitoring Systems (DMS).
The lack of sufficiently large and comprehensive datasets is currently a bottleneck for the progress of DMS development.
In this paper, we introduce the Driver Monitoring Dataset (DMD), an extensive dataset which includes real and simulated driving scenarios.
- Score: 54.198237164152786
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vision is the richest and most cost-effective technology for Driver
Monitoring Systems (DMS), especially after the recent success of Deep Learning
(DL) methods. The lack of sufficiently large and comprehensive datasets is
currently a bottleneck for the progress of DMS development, crucial for the
transition of automated driving from SAE Level-2 to SAE Level-3. In this paper,
we introduce the Driver Monitoring Dataset (DMD), an extensive dataset which
includes real and simulated driving scenarios: distraction, gaze allocation,
drowsiness, hands-wheel interaction and context data, in 41 hours of RGB, depth
and IR videos from 3 cameras capturing face, body and hands of 37 drivers. A
comparison with existing similar datasets is included, which shows the DMD is
more extensive, diverse, and multi-purpose. The usage of the DMD is illustrated
by extracting a subset of it, the dBehaviourMD dataset, containing 13
distraction activities, prepared to be used in DL training processes.
Furthermore, we propose a robust and real-time driver behaviour recognition
system targeting a real-world application that can run on cost-efficient
CPU-only platforms, based on the dBehaviourMD. Its performance is evaluated
with different types of fusion strategies, all of which improve accuracy while
still providing real-time response.
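As a rough illustration of the CPU-friendly, multi-stream setup described in the abstract (three camera views, 13 dBehaviourMD activities, fusion of per-stream results), the sketch below averages per-stream class probabilities at the score level; the fusion weights, function names, and dummy inputs are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical late-fusion sketch (not the authors' code): combine per-stream
# class probabilities from face/body/hands classifiers for the 13 activities.
import numpy as np

NUM_CLASSES = 13  # dBehaviourMD defines 13 distraction activities

def late_fusion(stream_probs, weights=None):
    """Weighted average of per-stream probability vectors; the weights are a
    free design choice in this sketch."""
    probs = np.stack(stream_probs, axis=0)            # (n_streams, NUM_CLASSES)
    if weights is None:
        weights = np.full(len(stream_probs), 1.0 / len(stream_probs))
    fused = (np.asarray(weights)[:, None] * probs).sum(axis=0)
    return fused / fused.sum()

# Dummy scores standing in for the three camera streams.
rng = np.random.default_rng(0)
face, body, hands = (rng.dirichlet(np.ones(NUM_CLASSES)) for _ in range(3))
fused = late_fusion([face, body, hands])
print("predicted activity id:", int(fused.argmax()))
```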
Related papers
- Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving [12.765198683804094]
Road safety remains a critical challenge worldwide, with approximately 1.35 million fatalities annually attributed to traffic accidents.
We propose a novel multi-task DMS, termed VDMoE, which leverages RGB video input to monitor driver states non-invasively.
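The summary does not detail VDMoE's architecture; the minimal sketch below only illustrates the general mixture-of-experts, multi-task idea it names. The expert count, feature dimensions, and the two task heads (driver-state classification and a physiological regression) are hypothetical.

```python
# Illustrative mixture-of-experts multi-task head; dimensions, expert count and
# the two task heads are assumptions, not the paper's actual design.
import torch
import torch.nn as nn

class TinyMoEHead(nn.Module):
    def __init__(self, feat_dim=256, n_experts=4, task_dims=(3, 1)):
        super().__init__()
        self.gate = nn.Linear(feat_dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
            for _ in range(n_experts)
        )
        self.heads = nn.ModuleList(nn.Linear(feat_dim, d) for d in task_dims)

    def forward(self, feats):                        # feats: (B, feat_dim) from a video backbone
        w = torch.softmax(self.gate(feats), dim=-1)  # (B, n_experts) gating weights
        expert_out = torch.stack([e(feats) for e in self.experts], dim=1)
        mixed = (w.unsqueeze(-1) * expert_out).sum(dim=1)
        return [head(mixed) for head in self.heads]

state_logits, physio_pred = TinyMoEHead()(torch.randn(2, 256))
```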
arXiv Detail & Related papers (2024-10-28T14:49:18Z)
- ODM3D: Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection [15.204935788297226]
ODM3D framework entails cross-modal knowledge distillation at various levels to inject LiDAR-domain knowledge into a monocular detector during training.
By identifying foreground sparsity as the main culprit behind existing methods' suboptimal training, we exploit the precise localisation information embedded in LiDAR points.
Our method ranks 1st in both KITTI validation and test benchmarks, significantly surpassing all existing monocular methods, supervised or semi-supervised.
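A minimal sketch of the cross-modal, foreground-weighted feature distillation idea named above: a monocular student's features are pulled toward a LiDAR-informed teacher's, with extra weight on foreground cells. The feature shapes, weighting rule, and loss form are assumptions rather than ODM3D's actual formulation.

```python
# Schematic cross-modal distillation loss; not the paper's exact objective.
import torch
import torch.nn.functional as F

def distill_loss(student_feat, teacher_feat, fg_mask, fg_weight=5.0):
    """student_feat, teacher_feat: (B, C, H, W); fg_mask: (B, 1, H, W) binary."""
    per_cell = F.mse_loss(student_feat, teacher_feat, reduction="none").mean(dim=1, keepdim=True)
    weights = 1.0 + (fg_weight - 1.0) * fg_mask      # foreground cells count fg_weight times
    return (weights * per_cell).sum() / weights.sum()

loss = distill_loss(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32),
                    (torch.rand(2, 1, 32, 32) > 0.9).float())
```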
arXiv Detail & Related papers (2023-10-28T07:12:09Z)
- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [7.137567622606353]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to transfer semantic features for improved object detection selectively.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
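The SWAG module itself is not specified in this summary; the toy sketch below only illustrates the general idea of gating detection features with semantic confidence maps, using an assumed 1x1-convolution gate.

```python
# Toy semantic-weighting gate: segmentation confidences modulate detection features.
import torch
import torch.nn as nn

class SemanticWeighting(nn.Module):
    def __init__(self, n_sem_classes=8, feat_ch=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(n_sem_classes, feat_ch, kernel_size=1),
                                  nn.Sigmoid())

    def forward(self, det_feats, sem_logits):
        # det_feats: (B, feat_ch, H, W); sem_logits: (B, n_sem_classes, H, W)
        weights = self.gate(torch.softmax(sem_logits, dim=1))
        return det_feats * weights                   # selectively emphasised features

out = SemanticWeighting()(torch.randn(1, 64, 50, 50), torch.randn(1, 8, 50, 50))
```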
arXiv Detail & Related papers (2023-07-17T21:22:17Z)
- HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving [95.42203932627102]
3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians.
Our method efficiently makes use of these complementary signals, in a semi-supervised fashion and outperforms existing methods with a large margin.
Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages.
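A compact sketch of pixel-aligned LiDAR features refined by a Transformer, as the summary describes; the projection step, feature dimensions, and keypoint head below are placeholders, not HUM3DIL's actual pipeline.

```python
# Pixel-aligned LiDAR features refined by a small Transformer, then a pose head.
import torch
import torch.nn as nn

def pixel_aligned_features(img_feats, uv):
    """img_feats: (C, H, W); uv: (N, 2) integer pixel coords of projected points."""
    return img_feats[:, uv[:, 1], uv[:, 0]].t()      # (N, C) image features per point

class KeypointRefiner(nn.Module):
    def __init__(self, in_dim=67, d_model=128, n_joints=15):
        super().__init__()
        self.n_joints = n_joints
        self.proj = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_joints * 3)

    def forward(self, point_feats):                  # (B, N, in_dim)
        tokens = self.encoder(self.proj(point_feats))
        return self.head(tokens.mean(dim=1)).view(-1, self.n_joints, 3)

img_feats = torch.randn(64, 120, 160)                # backbone feature map
uv = torch.randint(0, 120, (200, 2))                 # fake projected LiDAR pixels
pts_xyz = torch.randn(200, 3)                        # raw 3D coordinates
feats = torch.cat([pixel_aligned_features(img_feats, uv), pts_xyz], dim=-1)
pose_3d = KeypointRefiner()(feats.unsqueeze(0))      # (1, 15, 3) joint locations
```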
arXiv Detail & Related papers (2022-12-15T11:15:14Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
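A bare-bones outline of a multi-modality-in, multi-task-out model in the spirit of the MMISM summary; the encoders, channel sizes, and head definitions are stand-ins rather than the actual architecture.

```python
# Skeleton of a multi-modal, multi-task model; all modules here are placeholders.
import torch
import torch.nn as nn

class MultiModalMultiTask(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.rgb_enc = nn.Conv2d(3, ch, 3, padding=1)      # RGB image branch
        self.lidar_enc = nn.Conv2d(1, ch, 3, padding=1)    # sparse LiDAR depth branch
        self.heads = nn.ModuleDict({
            "detection": nn.Conv2d(2 * ch, 7, 1),          # e.g. box parameters per cell
            "depth_completion": nn.Conv2d(2 * ch, 1, 1),
            "human_pose": nn.Conv2d(2 * ch, 17, 1),        # keypoint heatmaps
            "semantic_seg": nn.Conv2d(2 * ch, 20, 1),
        })

    def forward(self, rgb, sparse_depth):
        shared = torch.cat([self.rgb_enc(rgb), self.lidar_enc(sparse_depth)], dim=1)
        return {name: head(shared) for name, head in self.heads.items()}

outputs = MultiModalMultiTask()(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
```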
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
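One way to read "benchmarking robustness" in practice is to re-evaluate a fusion model with each modality degraded in turn; the corruption types and the model interface in the sketch below are illustrative assumptions, not the benchmark's protocol.

```python
# Sketch of a per-modality degradation check for a camera-LiDAR fusion model.
import torch

def evaluate(model, batches, corrupt=None):
    correct = total = 0
    with torch.no_grad():
        for img, pts, label in batches:
            if corrupt == "lidar_drop":
                pts = pts[:, : pts.shape[1] // 10]   # keep only 10% of the points
            elif corrupt == "camera_dark":
                img = img * 0.1                       # severe under-exposure
            pred = model(img, pts).argmax(dim=-1)     # hypothetical model interface
            correct += (pred == label).sum().item()
            total += label.numel()
    return correct / max(total, 1)

# for mode in (None, "lidar_drop", "camera_dark"):
#     print(mode, evaluate(fusion_model, val_batches, corrupt=mode))
```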
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario.
The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available.
We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
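A generic pseudo-labelling round of the kind such semi-supervised evaluations typically involve; the detector API, batch format, and confidence threshold below are assumptions, not the ONCE baselines themselves.

```python
# Generic pseudo-labelling pass over unlabeled LiDAR scenes (illustrative only).
import torch

def pseudo_label_round(model, unlabeled_batches, score_thresh=0.8):
    """Run the current detector on unlabeled scenes and keep confident
    detections as pseudo-labels for the next training round."""
    pseudo = []
    model.eval()
    with torch.no_grad():
        for points in unlabeled_batches:             # each: (N, 4) LiDAR points
            boxes, scores = model(points)            # hypothetical detector output
            keep = scores >= score_thresh
            pseudo.append((points, boxes[keep]))
    return pseudo
```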
arXiv Detail & Related papers (2021-06-21T12:28:08Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)