Related papers: Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving

URL: http://arxiv.org/abs/2410.21086v1
Date: Mon, 28 Oct 2024 14:49:18 GMT
Title: Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving
Authors: Jiyao Wang, Xiao Yang, Zhenyu Wang, Ximeng Wei, Ange Wang, Dengbo He, Kaishun Wu
Abstract summary: Road safety remains a critical challenge worldwide, with approximately 1.35 million fatalities annually attributed to traffic accidents. We propose a novel multi-task DMS, termed VDMoE, which leverages RGB video input to monitor driver states non-invasively.
Score: 12.765198683804094
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Road safety remains a critical challenge worldwide, with approximately 1.35 million fatalities annually attributed to traffic accidents, often due to human errors. As we advance towards higher levels of vehicle automation, challenges still exist, as driving with automation can cognitively over-demand drivers if they engage in non-driving-related tasks (NDRTs), or lead to drowsiness if driving was the sole task. This calls for the urgent need for an effective Driver Monitoring System (DMS) that can evaluate cognitive load and drowsiness in SAE Level-2/3 autonomous driving contexts. In this study, we propose a novel multi-task DMS, termed VDMoE, which leverages RGB video input to monitor driver states non-invasively. By utilizing key facial features to minimize computational load and integrating remote Photoplethysmography (rPPG) for physiological insights, our approach enhances detection accuracy while maintaining efficiency. Additionally, we optimize the Mixture-of-Experts (MoE) framework to accommodate multi-modal inputs and improve performance across different tasks. A novel prior-inclusive regularization method is introduced to align model outputs with statistical priors, thus accelerating convergence and mitigating overfitting risks. We validate our method with the creation of a new dataset (MCDD), which comprises RGB video and physiological indicators from 42 participants, and two public datasets. Our findings demonstrate the effectiveness of VDMoE in monitoring driver states, contributing to safer autonomous driving systems. The code and data will be released.

Related papers

Driver-Net: Multi-Camera Fusion for Assessing Driver Take-Over Readiness in Automated Vehicles [3.637162892228131]
Driver-Net is a novel deep learning framework that fuses multi-camera inputs to estimate driver take-over readiness. It captures synchronised visual cues from the driver's head, hands, and body posture through a triple-camera setup. The proposed method achieves an accuracy of up to 95.8% in driver readiness classification.
arXiv Detail & Related papers (2025-07-05T19:27:03Z)
Predicting Multitasking in Manual and Automated Driving with Optimal Supervisory Control [2.0794380287086214]
This paper presents a computational cognitive model that simulates human multitasking while driving. Based on optimal supervisory control theory, the model predicts how multitasking adapts to variations in driving demands, interactive tasks, and automation levels.
arXiv Detail & Related papers (2025-03-23T08:56:53Z)
SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
Multimodal Large Language Models (MLLMs) can process both visual and textual data. We propose SafeAuto, a novel framework that enhances MLLM-based autonomous driving systems by incorporating both unstructured and structured knowledge.
arXiv Detail & Related papers (2025-02-28T21:53:47Z)
Minds on the Move: Decoding Trajectory Prediction in Autonomous Driving with Cognitive Insights [18.92479778025183]
In driving scenarios, a vehicle's trajectory is determined by the decision-making process of human drivers. Previous models fail to capture the true intentions of human drivers, leading to suboptimal performance in long-term trajectory prediction. We introduce a Cognitive-Informed Transformer (CITF) that incorporates a cognitive concept, Perceived Safety, to interpret drivers' decision-making mechanisms.
arXiv Detail & Related papers (2025-02-27T13:43:17Z)
Driver Assistance System Based on Multimodal Data Hazard Detection [0.0]
This paper proposes a multimodal driver assistance detection system. It integrates road condition video, driver facial video, and audio data to enhance incident recognition accuracy.
arXiv Detail & Related papers (2025-02-05T09:02:39Z)
G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving [71.9040410238973]
We focus on inferring the ego trajectory of a driver's vehicle using their gaze data. Next, we develop G-MEMP, a novel multimodal ego-trajectory prediction network that combines GPS and video input with gaze data. The results show that G-MEMP significantly outperforms state-of-the-art methods in both benchmarks.
arXiv Detail & Related papers (2023-12-13T23:06:30Z)
Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments. Our approach enhances LiDAR-based detection models using spatial quantized historical features. Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
Towards Safe Autonomy in Hybrid Traffic: Detecting Unpredictable Abnormal Behaviors of Human Drivers via Information Sharing [21.979007506007733]
We show that our proposed algorithm has great detection performance in both highway and urban traffic. The best performance achieves a detection rate of 97.3%, an average detection delay of 1.2 s, and zero false alarms.
arXiv Detail & Related papers (2023-08-23T18:24:28Z)
A Novel Driver Distraction Behavior Detection Method Based on Self-supervised Learning with Masked Image Modeling [5.1680226874942985]
Driver distraction causes a significant number of traffic accidents every year, resulting in economic losses and casualties. Driver distraction detection primarily relies on traditional convolutional neural networks (CNN) and supervised learning methods. This paper proposes a new self-supervised learning method based on masked image modeling for driver distraction behavior detection.
arXiv Detail & Related papers (2023-06-01T10:53:32Z)
Generative AI-empowered Simulation for Autonomous Driving in Vehicular Mixed Reality Metaverses [130.15554653948897]
In the vehicular mixed reality (MR) Metaverse, the distance between physical and virtual entities can be overcome. Large-scale traffic and driving simulation via realistic data collection and fusion from the physical world is difficult and costly. We propose an autonomous driving architecture, where generative AI is leveraged to synthesize unlimited conditioned traffic and driving data in simulations.
arXiv Detail & Related papers (2023-02-16T16:54:10Z)
FBLNet: FeedBack Loop Network for Driver Attention Prediction [75.83518507463226]
Nonobjective driving experience is difficult to model. In this paper, we propose a FeedBack Loop Network (FBLNet) which attempts to model the driving experience accumulation procedure. Under the guidance of the incremental knowledge, our model fuses the CNN feature and Transformer feature that are extracted from the input image to predict driver attention.
arXiv Detail & Related papers (2022-12-05T08:25:09Z)
Augmented Driver Behavior Models for High-Fidelity Simulation Study of Crash Detection Algorithms [2.064612766965483]
We present a simulation platform for a hybrid transportation system that includes both human-driven and automated vehicles. We decompose the human driving task and offer a modular approach to simulating a large-scale traffic scenario. We analyze a large driving dataset to extract expressive parameters that would best describe different driving characteristics.
arXiv Detail & Related papers (2022-08-10T19:59:16Z)
Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention. Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention and Alertness Analysis [54.198237164152786]
Vision is the richest and most cost-effective technology for Driver Monitoring Systems (DMS). The lack of sufficiently large and comprehensive datasets is currently a bottleneck for the progress of DMS development. In this paper, we introduce the Driver Monitoring dataset (DMD), an extensive dataset which includes real and simulated driving scenarios.
arXiv Detail & Related papers (2020-08-27T12:33:54Z)
Deep Reinforcement Learning for Human-Like Driving Policies in Collision Avoidance Tasks of Self-Driving Cars [1.160208922584163]
We introduce a model-free, deep reinforcement learning approach to generate automated human-like driving policies. We study a static obstacle avoidance task on a two-lane highway road in simulation. We demonstrate that our approach leads to human-like driving policies.
arXiv Detail & Related papers (2020-06-07T18:20:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.