Applying Spatiotemporal Attention to Identify Distracted and Drowsy
Driving with Vision Transformers
- URL: http://arxiv.org/abs/2207.12148v1
- Date: Fri, 22 Jul 2022 16:36:48 GMT
- Title: Applying Spatiotemporal Attention to Identify Distracted and Drowsy
Driving with Vision Transformers
- Authors: Samay Lakhani
- Abstract summary: A 20% rise in car crashes in 2021 compared to 2020 has been observed as a result of increased distraction and drowsiness.
Drowsy and distracted driving together account for 45% of all car crashes.
This work investigated whether vision transformers can surpass the state-of-the-art accuracy of 3D-CNNs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A 20% rise in car crashes in 2021 compared to 2020 has been observed as a
result of increased distraction and drowsiness. Drowsy and distracted driving
together account for 45% of all car crashes. As a means to reduce drowsy and
distracted driving, computer-vision-based detection methods can be designed to
be low-cost, accurate, and minimally invasive. This work investigated whether
vision transformers can surpass the state-of-the-art accuracy of 3D-CNNs.
Two separate transformers were trained for drowsiness and distractedness. The
drowsiness model was a Video Swin Transformer trained on the National
Tsing-Hua University Drowsy Driving Dataset (NTHU-DDD) for 10 epochs on two
classes, drowsy and non-drowsy, drawn from 10.5 hours of simulated driving
video. The distraction model was a Video Swin Transformer trained on the
Driver Monitoring Dataset (DMD) for 50 epochs over 9 distraction-related
classes.
The drowsiness model reached only 44% accuracy and a high loss on the test
set, indicating overfitting and poor performance; the overfitting suggests the
training data were too limited and the applied architecture lacked the
capacity to learn the task. The distraction model outperformed
state-of-the-art models on DMD, reaching 97.5% accuracy and indicating that,
with sufficient data and a strong architecture, transformers are suitable for
unfit-driving detection. Future research should use newer and stronger models
such as TokenLearner to achieve higher accuracy and efficiency, merge existing
datasets to expand detection to drunk driving and road rage as part of a
comprehensive crash-prevention solution, and deploy a functioning prototype to
revolutionize the automotive safety industry.
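To make the reported setup concrete, below is a minimal training and evaluation sketch under stated assumptions: it uses torchvision's Swin3D-T as a stand-in for the paper's Video Swin Transformer, and the learning rate, clip shape, and data loaders (nthu_ddd_train, dmd_train) are hypothetical placeholders, since the abstract does not publish its exact configuration.
```python
# Minimal sketch (not the paper's released code): fine-tuning a Video Swin
# Transformer for the two tasks in the abstract. torchvision's Swin3D-T is
# assumed as a stand-in for the paper's model; the learning rate, clip
# shape, and DataLoaders are illustrative, not the paper's values.
import torch
import torch.nn as nn
from torchvision.models.video import swin3d_t, Swin3D_T_Weights

def build_model(num_classes: int) -> nn.Module:
    # Start from Kinetics-400 pretrained weights, then replace the head with
    # a task-specific classifier (2 classes: drowsy/non-drowsy; 9 classes:
    # DMD distraction categories).
    model = swin3d_t(weights=Swin3D_T_Weights.KINETICS400_V1)
    model.head = nn.Linear(model.head.in_features, num_classes)
    return model

def train(model: nn.Module, loader, epochs: int, device: str = "cuda") -> None:
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for clips, labels in loader:  # clips: (B, 3, T, H, W) float tensor
            clips, labels = clips.to(device), labels.to(device)
            optimizer.zero_grad()
            criterion(model(clips), labels).backward()
            optimizer.step()

@torch.no_grad()
def test_accuracy(model: nn.Module, loader, device: str = "cuda") -> float:
    # Held-out accuracy; a low value alongside a high loss (as reported for
    # the drowsiness model) points to overfitting.
    model.to(device).eval()
    correct = total = 0
    for clips, labels in loader:
        preds = model(clips.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

# Mirroring the abstract's setups (loaders are hypothetical placeholders):
# drowsy = build_model(2);     train(drowsy, nthu_ddd_train, epochs=10)
# distracted = build_model(9); train(distracted, dmd_train, epochs=50)
```
Under a setup of this kind, the two runs described in the abstract differ only in the head size, the epoch count, and the dataset fed to the loader.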
Related papers
- Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models [60.87795376541144]
A world model is a neural network capable of predicting an agent's next state given past states and actions.
During end-to-end training, our policy learns how to recover from errors by aligning with states observed in human demonstrations.
We present qualitative and quantitative results, demonstrating significant improvements upon prior state of the art in closed-loop testing.
arXiv Detail & Related papers (2024-09-25T06:48:25Z) - DRUformer: Enhancing the driving scene Important object detection with
driving relationship self-understanding [50.81809690183755]
Traffic accidents frequently lead to fatal injuries, contributing to over 50 million deaths through 2023.
Previous research primarily assessed the importance of individual participants, treating them as independent entities.
We introduce the Driving scene Relationship self-Understanding transformer (DRUformer) to enhance the important object detection task.
arXiv Detail & Related papers (2023-11-11T07:26:47Z) - Fixating on Attention: Integrating Human Eye Tracking into Vision
Transformers [5.221681407166792]
This work demonstrates how human visual input, specifically fixations collected from an eye-tracking device, can be integrated into transformer models to improve accuracy across multiple driving situations and datasets.
We establish the significance of fixation regions in left-right driving decisions, as observed in both human subjects and a Vision Transformer (ViT).
We incorporate information from the driving scene with fixation data, employing a "joint space-fixation" (JSF) attention setup. Lastly, we propose a "fixation-attention intersection" (FAX) loss to train the ViT model to attend to the same regions that humans fixated on.
arXiv Detail & Related papers (2023-08-26T22:48:06Z) - Robustness Benchmark of Road User Trajectory Prediction Models for
Automated Driving [0.0]
We benchmark machine learning models against perturbations that simulate functional insufficiencies observed during model deployment in a vehicle.
Training the models with similar perturbations effectively reduces the degradation, limiting error increases to at most +87.5%.
We argue that, despite being an effective mitigation strategy, data augmentation with perturbations during training does not guarantee robustness to unforeseen perturbations.
arXiv Detail & Related papers (2023-04-04T15:47:42Z) - Learning Self-Regularized Adversarial Views for Self-Supervised Vision
Transformers [105.89564687747134]
We propose a self-regularized AutoAugment method to learn views for self-supervised vision transformers.
First, we reduce the search cost of AutoView to nearly zero by learning views and network parameters simultaneously.
We also present a curated augmentation policy search space for self-supervised learning.
arXiv Detail & Related papers (2022-10-16T06:20:44Z) - Vision Transformers and YoloV5 based Driver Drowsiness Detection
Framework [0.0]
This paper introduces a novel framework based on vision transformers and YoloV5 architectures for driver drowsiness recognition.
A custom YoloV5 pre-trained architecture is proposed for face extraction, with the aim of isolating the Region of Interest (ROI).
For further evaluation, the proposed framework was tested on a custom dataset of 39 participants under various lighting conditions, achieving 95.5% accuracy.
arXiv Detail & Related papers (2022-09-03T11:37:41Z) - One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario.
The data are selected from 144 driving hours, 20 times longer than the largest previously available 3D autonomous driving dataset.
We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z) - Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data
Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z) - In-the-wild Drowsiness Detection from Facial Expressions [6.569756709977793]
Driving in a state of drowsiness is a major cause of road accidents, resulting in tremendous damage to life and property.
We propose a data collection protocol that involves outfitting vehicles of overnight shift workers with camera kits that record their faces while driving.
We experiment with different convolutional and temporal neural network architectures to predict drowsiness states from pose, expression and emotion-based representation of the input video of the driver's face.
arXiv Detail & Related papers (2020-10-21T17:28:56Z) - Towards Evaluating Driver Fatigue with Robust Deep Learning Models [0.0]
Drowsy driving results in approximately 72,000 crashes and 44,000 injuries every year in the US.
We propose a framework to detect eye closedness in a captured camera frame as a gateway for detecting drowsiness.
arXiv Detail & Related papers (2020-07-16T16:44:49Z) - Learning Accurate and Human-Like Driving using Semantic Maps and
Attention [152.48143666881418]
This paper investigates how end-to-end driving models can be improved to drive more accurately and human-like.
We exploit semantic and visual maps from HERE Technologies and augment the existing Drive360 dataset with them.
Our models are trained and evaluated on the Drive360 + HERE dataset, which features 60 hours and 3000 km of real-world driving data.
arXiv Detail & Related papers (2020-07-10T22:25:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.