Cross-Camera Distracted Driver Classification through Feature Disentanglement and Contrastive Learning
- URL: http://arxiv.org/abs/2411.13181v2
- Date: Sat, 21 Jun 2025 16:43:18 GMT
- Title: Cross-Camera Distracted Driver Classification through Feature Disentanglement and Contrastive Learning
- Authors: Simone Bianco, Luigi Celona, Paolo Napoletano
- Abstract summary: Driver Behavior Monitoring Network (DBMNet) relies on a lightweight backbone and integrates a disentanglement module to discard camera view information. DBMNet achieves an improvement of 7% in Top-1 accuracy compared to existing approaches.
- Score: 13.613407983544427
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The classification of distracted drivers is pivotal for ensuring safe driving. Previous studies demonstrated the effectiveness of neural networks in automatically predicting driver distraction, fatigue, and potential hazards. However, recent research has uncovered a significant loss of accuracy in these models when applied to samples acquired under conditions that differ from the training data. In this paper, we introduce a robust model designed to withstand changes in camera position within the vehicle. Our Driver Behavior Monitoring Network (DBMNet) relies on a lightweight backbone and integrates a disentanglement module to discard camera view information from features, coupled with contrastive learning to enhance the encoding of various driver actions. Experiments conducted using a leave-one-camera-out protocol on the daytime and nighttime subsets of the 100-Driver dataset validate the effectiveness of our approach. Cross-dataset and cross-camera experiments conducted on three benchmark datasets, namely AUCDD-V1, EZZ2021 and SFD, demonstrate the superior generalization capabilities of the proposed method. Overall, DBMNet achieves an improvement of 7% in Top-1 accuracy compared to existing approaches. Moreover, a quantized version of DBMNet and all considered methods has been deployed on a Coral Dev Board. In this deployment scenario, DBMNet outperforms alternatives, achieving the lowest average error while maintaining a compact model size, low memory footprint, fast inference time, and minimal power consumption.
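The leave-one-camera-out protocol used in the evaluation above can be sketched in a few lines. The tuple layout and field names below are illustrative assumptions for clarity, not taken from the DBMNet code:

```python
def leave_one_camera_out_splits(samples):
    """Yield (held_out_camera, train, test) splits in which each camera
    view is excluded from training and used only for testing.

    `samples` is a list of (sample_id, action_label, camera_id) tuples;
    this field layout is a hypothetical simplification.
    """
    cameras = sorted({cam for _, _, cam in samples})
    for held_out in cameras:
        train = [s for s in samples if s[2] != held_out]
        test = [s for s in samples if s[2] == held_out]
        yield held_out, train, test
```

Each split trains on all but one camera view and tests on the unseen one, which is what makes the protocol a measure of cross-camera generalization.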
Related papers
- Driver-Net: Multi-Camera Fusion for Assessing Driver Take-Over Readiness in Automated Vehicles [3.637162892228131]
Driver-Net is a novel deep learning framework that fuses multi-camera inputs to estimate driver take-over readiness. It captures synchronised visual cues from the driver's head, hands, and body posture through a triple-camera setup. The proposed method achieves an accuracy of up to 95.8% in driver readiness classification.
arXiv Detail & Related papers (2025-07-05T19:27:03Z) - CodeMerge: Codebook-Guided Model Merging for Robust Test-Time Adaptation in Autonomous Driving [28.022501313260648]
Existing test-time adaptation methods often fail in high-variance tasks like 3D object detection due to unstable optimization and sharp minima. We introduce CodeMerge, a scalable model merging framework that bypasses these limitations by operating in a compact latent space. Our method achieves strong performance across challenging benchmarks, improving end-to-end 3D detection by 14.9% NDS on nuScenes-C and LiDAR-based detection by over 7.6% mAP.
arXiv Detail & Related papers (2025-05-22T11:09:15Z) - TrajSSL: Trajectory-Enhanced Semi-Supervised 3D Object Detection [59.498894868956306]
Pseudo-labeling approaches to semi-supervised learning adopt a teacher-student framework.
We leverage pre-trained motion-forecasting models to generate object trajectories on pseudo-labeled data.
Our approach improves pseudo-label quality in two distinct manners.
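Teacher-student pseudo-labeling frameworks like the one referenced above typically maintain the teacher as an exponential moving average (EMA) of the student. A minimal sketch of that update, with weights as plain float lists for illustration (TrajSSL's actual implementation is not reproduced here):

```python
def ema_update(teacher_weights, student_weights, decay=0.999):
    """One EMA step: teacher <- decay * teacher + (1 - decay) * student.

    Weight vectors are plain lists of floats here; a real model would
    apply this per-parameter-tensor.
    """
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]
```

The slowly moving teacher produces more stable pseudo-labels than the rapidly updating student, which is the rationale for the EMA design.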
arXiv Detail & Related papers (2024-09-17T05:35:00Z) - Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling [18.071748815365005]
We introduce an innovative framework, AccNet, which significantly advances the prediction capabilities beyond the current state-of-the-art (SOTA) 2D-based methods.
We propose the Binary Adaptive Loss for Early Anticipation (BA-LEA) to address the prevalent challenge of skewed data distribution in traffic accident datasets.
arXiv Detail & Related papers (2024-09-02T13:46:25Z) - Federated Learning for Drowsiness Detection in Connected Vehicles [0.19116784879310028]
Driver monitoring systems can assist in determining the driver's state.
Driver drowsiness detection presents a potential solution.
However, transmitting the data to a central machine for model training is impractical due to the large data size and privacy concerns.
We propose a federated learning framework for drowsiness detection within a vehicular network, leveraging the YawDD dataset.
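Federated learning frameworks of this kind commonly aggregate client updates with FedAvg-style weighted averaging. A minimal sketch under that assumption (not necessarily the paper's exact aggregation rule), with weight vectors as plain float lists:

```python
def federated_average(client_weights, client_sizes):
    """FedAvg aggregation: average client model weights, weighted by the
    number of local training samples each client holds.

    client_weights: list of weight vectors (lists of floats).
    client_sizes: local dataset size per client, same order.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(n_params)]
```

Only model weights leave each vehicle, never the raw camera frames, which is what addresses the privacy and bandwidth concerns above.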
arXiv Detail & Related papers (2024-05-06T09:39:13Z) - PoseViNet: Distracted Driver Action Recognition Framework Using
Multi-View Pose Estimation and Vision Transformer [1.319058156672392]
This paper introduces a novel method for detection of driver distraction using multi-view driver action images.
The proposed method is a vision transformer-based framework with pose estimation and action inference, namely PoseViNet.
The PoseViNet achieves 97.55% validation accuracy and 90.92% testing accuracy with the challenging dataset.
arXiv Detail & Related papers (2023-12-22T10:13:10Z) - Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction [69.29802752614677]
RouteFormer is a novel ego-trajectory prediction network combining GPS data, environmental context, and the driver's field-of-view.
To tackle data scarcity and enhance diversity, we introduce GEM, a dataset of urban driving scenarios enriched with synchronized driver field-of-view and gaze data.
arXiv Detail & Related papers (2023-12-13T23:06:30Z) - Unsupervised Domain Adaptation for Self-Driving from Past Traversal
Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatial quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z) - CCSPNet-Joint: Efficient Joint Training Method for Traffic Sign
Detection Under Extreme Conditions [3.6190463374643795]
CCSPNet is an efficient feature extraction module based on Contextual Transformer and CNN.
We propose a joint training model, CCSPNet-Joint, to improve data efficiency and generalization.
Experiments have shown that CCSPNet achieves state-of-the-art performance in traffic sign detection under extreme conditions.
arXiv Detail & Related papers (2023-09-13T12:00:33Z) - A Novel Driver Distraction Behavior Detection Method Based on
Self-supervised Learning with Masked Image Modeling [5.1680226874942985]
Driver distraction causes a significant number of traffic accidents every year, resulting in economic losses and casualties.
Driver distraction detection primarily relies on traditional convolutional neural networks (CNN) and supervised learning methods.
This paper proposes a new self-supervised learning method based on masked image modeling for driver distraction behavior detection.
arXiv Detail & Related papers (2023-06-01T10:53:32Z) - FBLNet: FeedBack Loop Network for Driver Attention Prediction [75.83518507463226]
Non-objective driving experience is difficult to model.
In this paper, we propose a FeedBack Loop Network (FBLNet) which attempts to model the driving experience accumulation procedure.
Under the guidance of the incremental knowledge, our model fuses the CNN feature and Transformer feature that are extracted from the input image to predict driver attention.
arXiv Detail & Related papers (2022-12-05T08:25:09Z) - Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object
Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z) - A High-Accuracy Unsupervised Person Re-identification Method Using
Auxiliary Information Mined from Datasets [53.047542904329866]
We make use of auxiliary information mined from datasets for multi-modal feature learning.
This paper proposes three effective training tricks, including Restricted Label Smoothing Cross Entropy Loss (RLSCE), Weight Adaptive Triplet Loss (WATL), and Dynamic Training Iterations (DTI).
arXiv Detail & Related papers (2022-05-06T10:16:18Z) - Modified Supervised Contrastive Learning for Detecting Anomalous Driving
Behaviours [1.4544109317472054]
We formulate this problem as a supervised contrastive learning approach to learn a visual representation to detect normal, and seen and unseen anomalous driving behaviours.
We show our results on a Driver Anomaly Detection dataset that contains 783 minutes of video recordings of normal and anomalous driving behaviours of 31 drivers.
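Supervised contrastive learning, as used above, pulls embeddings of the same class together relative to all other samples in the batch. A plain-Python sketch of the standard SupCon loss (Khosla et al.) over L2-normalized embeddings; the paper's modified variant and any batching details are not reproduced:

```python
import math

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of L2-normalized
    embeddings, written with plain lists for clarity (a real
    implementation would be vectorized)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    n = len(embeddings)
    loss = 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        denom = sum(math.exp(dot(embeddings[i], embeddings[k]) / temperature)
                    for k in range(n) if k != i)
        loss += -sum(math.log(math.exp(dot(embeddings[i], embeddings[j]) / temperature) / denom)
                     for j in positives) / len(positives)
    return loss / n
```

The loss is low when same-label embeddings are aligned and high when a positive pair is pulled apart, which is the property exploited to separate normal from anomalous driving.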
arXiv Detail & Related papers (2021-09-09T03:50:19Z) - One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario.
The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available.
We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z) - A Deep Value-network Based Approach for Multi-Driver Order Dispatching [55.36656442934531]
We propose a deep reinforcement learning based solution for order dispatching.
We conduct large scale online A/B tests on DiDi's ride-dispatching platform.
Results show that CVNet consistently outperforms other recently proposed dispatching methods.
arXiv Detail & Related papers (2021-06-08T16:27:04Z) - Efficient and Robust LiDAR-Based End-to-End Navigation [132.52661670308606]
We present an efficient and robust LiDAR-based end-to-end navigation framework.
We propose Fast-LiDARNet that is based on sparse convolution kernel optimization and hardware-aware model design.
We then propose Hybrid Evidential Fusion that directly estimates the uncertainty of the prediction from only a single forward pass.
arXiv Detail & Related papers (2021-05-20T17:52:37Z) - Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for
Unsupervised Person Re-Identification [60.36551512902312]
Unsupervised person re-identification (re-ID) aims to learn discriminative models with unlabeled data.
One popular method is to obtain pseudo-label by clustering and use them to optimize the model.
In this paper, we propose a unified framework to solve both problems.
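The clustering-based pseudo-labeling step described above reduces, at its core, to assigning each unlabeled feature its nearest cluster centroid. A minimal sketch with hypothetical list-based features (the paper's actual clustering pipeline is not shown):

```python
def assign_pseudo_labels(features, centroids):
    """Give each unlabeled feature vector the index of its nearest
    centroid (squared Euclidean distance) as a pseudo-label, which the
    model is then optimized against."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(centroids)), key=lambda k: sqdist(f, centroids[k]))
            for f in features]
```

Noisy assignments near cluster boundaries are exactly the pseudo-label noise that the joint noise-tolerant learning in this paper is designed to handle.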
arXiv Detail & Related papers (2021-03-08T09:13:06Z) - Driver2vec: Driver Identification from Automotive Data [44.84876493736275]
Driver2vec is able to accurately identify the driver from a short 10-second interval of sensor data.
Driver2vec is trained on a dataset of 51 drivers provided by Nervtech.
arXiv Detail & Related papers (2021-02-10T03:09:13Z) - Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
arXiv Detail & Related papers (2020-06-04T08:59:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.