PoseViNet: Distracted Driver Action Recognition Framework Using
Multi-View Pose Estimation and Vision Transformer
- URL: http://arxiv.org/abs/2312.14577v1
- Date: Fri, 22 Dec 2023 10:13:10 GMT
- Authors: Neha Sengar, Indra Kumari, Jihui Lee, Dongsoo Har
- Abstract summary: This paper introduces a novel method for detecting driver distraction from multi-view driver action images.
The proposed method is a vision transformer-based framework with pose estimation and action inference, named PoseViNet.
PoseViNet achieves 97.55% validation accuracy and 90.92% testing accuracy on the challenging dataset.
- Score: 1.319058156672392
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Driver distraction is a principal cause of traffic accidents. A
study conducted by the National Highway Traffic Safety Administration found
that activities such as interacting with in-car menus, consuming food or
beverages, or holding telephone conversations while operating a vehicle are
significant sources of driver distraction. Motivated by this, this paper
introduces a novel method for detecting driver distraction from multi-view
driver action images. The proposed method is a vision transformer-based
framework with pose estimation and action inference, named PoseViNet. The
motivation for adding posture information is to enable the transformer to
focus more on key features, making the framework more adept at identifying
critical actions. The proposed framework is compared with various
state-of-the-art models on the SFD3 dataset, which represents 10 driver
behaviors, and the comparison shows that PoseViNet outperforms these models.
The framework is also evaluated on the SynDD1 dataset, which represents 16
driver behaviors; on this challenging dataset, PoseViNet achieves 97.55%
validation accuracy and 90.92% testing accuracy.
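The listing does not include the authors' code. Purely as an illustration of the general pattern the abstract describes (pose keypoints supplied to a transformer alongside visual tokens so that attention can concentrate on key features), here is a minimal PyTorch sketch; all module sizes, keypoint counts, and view counts are assumptions, not values from the paper.

```python
# Minimal sketch of a pose-conditioned transformer classifier.
# Shapes and hyperparameters are illustrative, not from the paper.
import torch
import torch.nn as nn

class PoseActionTransformer(nn.Module):
    def __init__(self, num_classes=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.pose_embed = nn.Linear(2, dim)     # one token per 2D keypoint
        self.patch_proj = nn.Linear(768, dim)   # stand-in for ViT patch features
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, patch_feats, keypoints):
        # patch_feats: (B, N, 768); keypoints: (B, views * K, 2)
        b = patch_feats.size(0)
        tokens = torch.cat([self.cls_token.expand(b, -1, -1),
                            self.patch_proj(patch_feats),
                            self.pose_embed(keypoints)], dim=1)
        return self.head(self.encoder(tokens)[:, 0])  # classify via CLS token

model = PoseActionTransformer()
# e.g. 196 patch tokens, 3 views x 17 keypoints (hypothetical numbers)
logits = model(torch.randn(2, 196, 768), torch.randn(2, 3 * 17, 2))
print(logits.shape)  # torch.Size([2, 16])
```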
Related papers
- Cross-Camera Distracted Driver Classification through Feature Disentanglement and Contrastive Learning [13.613407983544427]
We introduce a robust model designed to withstand changes in camera position within the vehicle.
Our Driver Behavior Monitoring Network (DBMNet) relies on a lightweight backbone and integrates a disentanglement module.
Experiments conducted on the daytime and nighttime subsets of the 100-Driver dataset validate the effectiveness of our approach.
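As a rough, generic illustration of the contrastive-learning ingredient (not DBMNet's actual objective, which this listing does not reproduce), an NT-Xent-style loss can pull together embeddings of the same driving clip seen from two camera positions while pushing apart all other pairs:

```python
# Generic NT-Xent contrastive loss: embeddings of the same clip from
# two camera views are positives; everything else is a negative.
import torch
import torch.nn.functional as F

def nt_xent(z_view_a, z_view_b, temperature=0.1):
    z = F.normalize(torch.cat([z_view_a, z_view_b]), dim=1)  # (2B, D)
    sim = z @ z.t() / temperature                            # cosine similarities
    sim.fill_diagonal_(float("-inf"))                        # exclude self-pairs
    batch = z_view_a.size(0)
    # The positive for sample i is the same clip from the other camera.
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(8, 128), torch.randn(8, 128))
```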
arXiv Detail & Related papers (2024-11-20T10:27:12Z)
- Towards Infusing Auxiliary Knowledge for Distracted Driver Detection [11.816566371802802]
Distracted driving is a leading cause of road accidents globally.
We propose KiD3, a novel method for distracted driver detection (DDD) by infusing auxiliary knowledge about semantic relations between entities in a scene and the structural configuration of the driver's pose.
Specifically, we construct a unified framework that integrates scene graphs and driver pose information with the visual cues in video frames to create a holistic representation of the driver's actions.
arXiv Detail & Related papers (2024-08-29T15:28:42Z)
- DRUformer: Enhancing the driving scene Important object detection with driving relationship self-understanding [50.81809690183755]
Traffic accidents frequently lead to fatal injuries, contributing to over 50 million deaths as of 2023.
Previous research primarily assessed the importance of individual participants, treating them as independent entities.
We introduce Driving scene Relationship self-Understanding transformer (DRUformer) to enhance the important object detection task.
arXiv Detail & Related papers (2023-11-11T07:26:47Z)
- Markov Switching Model for Driver Behavior Prediction: Use cases on Smartphones [4.576379639081977]
We present a driver behavior switching model validated by a low-cost data collection solution using smartphones.
The model is evaluated on a real dataset, predicting driver behavior over short time periods.
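As a toy illustration of the underlying idea (a hypothetical three-state example, not the paper's model), a discrete Markov switching predictor estimates a transition matrix between behavior states and uses it to predict the next-state distribution:

```python
# Toy discrete Markov switching predictor: fit a transition matrix from
# an observed behavior sequence, then predict the next-state distribution.
# States and data are made up for illustration.
import numpy as np

STATES = ["normal", "aggressive", "drowsy"]

def fit_transitions(seq, n_states=len(STATES), smoothing=1.0):
    counts = np.full((n_states, n_states), smoothing)  # Laplace smoothing
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

seq = [0, 0, 1, 1, 0, 2, 2, 2, 0, 0, 1]  # indices into STATES
P = fit_transitions(seq)
next_dist = P[seq[-1]]                   # P(next state | current state)
print(dict(zip(STATES, next_dist.round(3))))
```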
arXiv Detail & Related papers (2021-08-29T09:54:05Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
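The fusion mechanism can be pictured as cross-attention between the two sensor streams; the sketch below is a generic stand-in with made-up token counts and dimensions, not TransFuser's actual architecture:

```python
# Rough sketch of attention-based sensor fusion: image tokens attend to
# LiDAR tokens via multi-head cross-attention. Generic illustration only.
import torch
import torch.nn as nn

dim = 256
cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
img_tokens = torch.randn(1, 64, dim)     # e.g. flattened image features
lidar_tokens = torch.randn(1, 128, dim)  # e.g. BEV LiDAR features
fused, _ = cross_attn(query=img_tokens, key=lidar_tokens, value=lidar_tokens)
print(fused.shape)  # torch.Size([1, 64, 256])
```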
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- The Multimodal Driver Monitoring Database: A Naturalistic Corpus to Study Driver Attention [44.94118128276982]
A smart vehicle should be able to monitor the actions and behaviors of the human driver to provide critical warnings or intervene when necessary.
Recent advancements in deep learning and computer vision have shown great promise in monitoring human behaviors and activities.
A vast amount of in-domain data is required to train models that perform well on driving-related prediction tasks.
arXiv Detail & Related papers (2020-12-23T16:37:17Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Studying Person-Specific Pointing and Gaze Behavior for Multimodal Referencing of Outside Objects from a Moving Vehicle [58.720142291102135]
Hand pointing and eye gaze have been extensively investigated in automotive applications for object selection and referencing.
Existing outside-the-vehicle referencing methods focus on a static situation, whereas the situation in a moving vehicle is highly dynamic and subject to safety-critical constraints.
We investigate the specific characteristics of each modality and the interaction between them when used in the task of referencing outside objects.
arXiv Detail & Related papers (2020-09-23T14:56:19Z)
- VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification [116.1587709521173]
We propose to build a large-scale vehicle dataset (called VehicleNet) by harnessing four public vehicle datasets.
We design a simple yet effective two-stage progressive approach to learning more robust visual representation from VehicleNet.
We achieve state-of-the-art accuracy of 86.07% mAP on the private test set of the AICity Challenge.
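A two-stage progressive scheme of this kind typically trains a classifier on the pooled datasets first, then fine-tunes the same backbone on the target set at a lower learning rate. The sketch below is a generic rendition with dummy data and hypothetical class counts, not VehicleNet's implementation:

```python
# Generic two-stage progressive training sketch (dummy data, made-up
# class counts; illustrative only).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

def make_loader(num_classes, n=8):
    # Tiny random stand-in for a real vehicle re-ID dataset.
    return DataLoader(TensorDataset(torch.randn(n, 3, 224, 224),
                                    torch.randint(0, num_classes, (n,))),
                      batch_size=4)

def train(model, loader, epochs, lr):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()

num_pooled_ids, num_target_ids = 100, 30  # hypothetical identity counts
model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, num_pooled_ids)
train(model, make_loader(num_pooled_ids), epochs=1, lr=0.01)    # stage 1: pooled data
model.fc = nn.Linear(model.fc.in_features, num_target_ids)      # swap classifier head
train(model, make_loader(num_target_ids), epochs=1, lr=0.001)   # stage 2: fine-tune
```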
arXiv Detail & Related papers (2020-04-14T05:06:38Z)
- Parsing-based View-aware Embedding Network for Vehicle Re-Identification [138.11983486734576]
We propose a parsing-based view-aware embedding network (PVEN) to achieve the view-aware feature alignment and enhancement for vehicle ReID.
The experiments conducted on three datasets show that our model outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-04-10T13:06:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.