Applying Spatiotemporal Attention to Identify Distracted and Drowsy
Driving with Vision Transformers
- URL: http://arxiv.org/abs/2207.12148v1
- Date: Fri, 22 Jul 2022 16:36:48 GMT
- Title: Applying Spatiotemporal Attention to Identify Distracted and Drowsy
Driving with Vision Transformers
- Authors: Samay Lakhani
- Abstract summary: A 20% rise in car crashes in 2021 compared to 2020 has been observed as a result of increased distraction and drowsiness.
Drowsy and distracted driving together account for 45% of all car crashes.
This work investigated whether vision transformers can surpass the state-of-the-art accuracy of 3D-CNNs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A 20% rise in car crashes in 2021 compared to 2020 has been observed as a
result of increased distraction and drowsiness. Drowsy and distracted driving
together account for 45% of all car crashes. As a means to reduce drowsy and
distracted driving, computer-vision-based detection methods can be designed to
be low-cost, accurate, and minimally invasive. This work investigated whether
vision transformers can surpass the state-of-the-art accuracy of 3D-CNNs.
Two separate transformers were trained for drowsiness and distractedness. The
drowsiness model was a Video Swin Transformer trained on the National
Tsing-Hua University Drowsy Driving Dataset (NTHU-DDD) for 10 epochs on two
classes, drowsy and non-drowsy, drawn from 10.5 hours of simulated driving
video. The distraction model was a Video Swin Transformer trained on the
Driver Monitoring Dataset (DMD) for 50 epochs over 9 distraction-related
classes.
The drowsiness model reached only 44% accuracy and a high loss on the test
set, indicating overfitting and poor performance; the overfitting suggests the
training data were too limited and the applied architecture lacked the
capacity to learn the task. The distraction model outperformed
state-of-the-art models on DMD, reaching 97.5% accuracy and indicating that,
with sufficient data and a strong architecture, transformers are suitable for
unfit-driving detection. Future research should use newer and stronger models
such as TokenLearner to achieve higher accuracy and efficiency, merge existing
datasets to expand detection to drunk driving and road rage as part of a
comprehensive crash-prevention solution, and deploy a functioning prototype to
revolutionize the automotive safety industry.
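To make the reported setup concrete, below is a minimal training and evaluation sketch under stated assumptions: it uses torchvision's Swin3D-T as a stand-in for the paper's Video Swin Transformer, and the learning rate, clip shape, and data loaders (nthu_ddd_train, dmd_train) are hypothetical placeholders, since the abstract does not publish its exact configuration.
```python
# Minimal sketch (not the paper's released code): fine-tuning a Video Swin
# Transformer for the two tasks in the abstract. torchvision's Swin3D-T is
# assumed as a stand-in for the paper's model; the learning rate, clip
# shape, and DataLoaders are illustrative, not the paper's values.
import torch
import torch.nn as nn
from torchvision.models.video import swin3d_t, Swin3D_T_Weights

def build_model(num_classes: int) -> nn.Module:
    # Start from Kinetics-400 pretrained weights, then replace the head with
    # a task-specific classifier (2 classes: drowsy/non-drowsy; 9 classes:
    # DMD distraction categories).
    model = swin3d_t(weights=Swin3D_T_Weights.KINETICS400_V1)
    model.head = nn.Linear(model.head.in_features, num_classes)
    return model

def train(model: nn.Module, loader, epochs: int, device: str = "cuda") -> None:
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for clips, labels in loader:  # clips: (B, 3, T, H, W) float tensor
            clips, labels = clips.to(device), labels.to(device)
            optimizer.zero_grad()
            criterion(model(clips), labels).backward()
            optimizer.step()

@torch.no_grad()
def test_accuracy(model: nn.Module, loader, device: str = "cuda") -> float:
    # Held-out accuracy; a low value alongside a high loss (as reported for
    # the drowsiness model) points to overfitting.
    model.to(device).eval()
    correct = total = 0
    for clips, labels in loader:
        preds = model(clips.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

# Mirroring the abstract's setups (loaders are hypothetical placeholders):
# drowsy = build_model(2);     train(drowsy, nthu_ddd_train, epochs=10)
# distracted = build_model(9); train(distracted, dmd_train, epochs=50)
```
Under a setup of this kind, the two runs described in the abstract differ only in the head size, the epoch count, and the dataset fed to the loader.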
Related papers
- Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models [60.87795376541144]
A world model is a neural network capable of predicting an agent's next state given past states and actions.
During end-to-end training, our policy learns how to recover from errors by aligning with states observed in human demonstrations.
We present qualitative and quantitative results, demonstrating significant improvements upon prior state of the art in closed-loop testing.
arXiv Detail & Related papers (2024-09-25T06:48:25Z) - DRUformer: Enhancing the driving scene Important object detection with
driving relationship self-understanding [50.81809690183755]
Traffic accidents frequently lead to fatal injuries, contributing to over 50 million deaths through 2023.
Previous research primarily assessed the importance of individual participants, treating them as independent entities.
We introduce the Driving scene Relationship self-Understanding transformer (DRUformer) to enhance the important object detection task.
arXiv Detail & Related papers (2023-11-11T07:26:47Z) - Fixating on Attention: Integrating Human Eye Tracking into Vision
Transformers [5.221681407166792]
This work demonstrates how human visual input, specifically fixations collected from an eye-tracking device, can be integrated into transformer models to improve accuracy across multiple driving situations and datasets.
We establish the significance of fixation regions in left-right driving decisions, as observed in both human subjects and a Vision Transformer (ViT).
We incorporate information from the driving scene with fixation data, employing a "joint space-fixation" (JSF) attention setup. Lastly, we propose a "fixation-attention intersection" (FAX) loss to train the ViT model to attend to the same regions that humans fixated on.
arXiv Detail & Related papers (2023-08-26T22:48:06Z) - Robustness Benchmark of Road User Trajectory Prediction Models for
Automated Driving [0.0]
We benchmark machine learning models against perturbations that simulate functional insufficiencies observed during model deployment in a vehicle.
Training the models with similar perturbations effectively reduces the degradation, limiting error increases to at most +87.5%.
We argue that, despite being an effective mitigation strategy, data augmentation with perturbations during training does not guarantee robustness to unforeseen perturbations.
arXiv Detail & Related papers (2023-04-04T15:47:42Z) - Learning Self-Regularized Adversarial Views for Self-Supervised Vision
Transformers [105.89564687747134]
We propose a self-regularized AutoAugment method to learn views for self-supervised vision transformers.
First, we reduce the search cost of AutoView to nearly zero by learning views and network parameters simultaneously.
We also present a curated augmentation policy search space for self-supervised learning.
arXiv Detail & Related papers (2022-10-16T06:20:44Z) - Vision Transformers and YoloV5 based Driver Drowsiness Detection
Framework [0.0]
This paper introduces a novel framework based on vision transformers and YoloV5 architectures for driver drowsiness recognition.
A custom YoloV5 pre-trained architecture is proposed for face extraction, with the aim of isolating the Region of Interest (ROI).
For further evaluation, the proposed framework was tested on a custom dataset of 39 participants under various lighting conditions, achieving 95.5% accuracy.
arXiv Detail & Related papers (2022-09-03T11:37:41Z) - One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario.
The data are selected from 144 driving hours, 20 times longer than the largest previously available 3D autonomous driving dataset.
We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z) - Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data
Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z) - In-the-wild Drowsiness Detection from Facial Expressions [6.569756709977793]
Driving in a state of drowsiness is a major cause of road accidents, resulting in tremendous damage to life and property.
We propose a data collection protocol that involves outfitting vehicles of overnight shift workers with camera kits that record their faces while driving.
We experiment with different convolutional and temporal neural network architectures to predict drowsiness states from pose, expression and emotion-based representation of the input video of the driver's face.
arXiv Detail & Related papers (2020-10-21T17:28:56Z) - Towards Evaluating Driver Fatigue with Robust Deep Learning Models [0.0]
Drowsy driving results in approximately 72,000 crashes and 44,000 injuries every year in the US.
We propose a framework to detect eye closedness in a captured camera frame as a gateway for detecting drowsiness.
arXiv Detail & Related papers (2020-07-16T16:44:49Z) - Learning Accurate and Human-Like Driving using Semantic Maps and
Attention [152.48143666881418]
This paper investigates how end-to-end driving models can be improved to drive more accurately and human-like.
We exploit semantic and visual maps from HERE Technologies and augment the existing Drive360 dataset with them.
Our models are trained and evaluated on the Drive360 + HERE dataset, which features 60 hours and 3000 km of real-world driving data.
arXiv Detail & Related papers (2020-07-10T22:25:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.