AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for
Assistive Driving Perception
- URL: http://arxiv.org/abs/2307.13933v2
- Date: Tue, 1 Aug 2023 09:29:51 GMT
- Title: AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for
Assistive Driving Perception
- Authors: Dingkang Yang, Shuai Huang, Zhi Xu, Zhenpeng Li, Shunli Wang,
Mingcheng Li, Yuzheng Wang, Yang Liu, Kun Yang, Zhaoyu Chen, Yan Wang, Jing
Liu, Peixuan Zhang, Peng Zhai, Lihua Zhang
- Abstract summary: We present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle.
AIDE facilitates holistic driver monitoring through three distinctive characteristics.
Two fusion strategies are introduced to give new insights into learning effective multi-stream/modal representations.
- Score: 26.84439405241999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Driver distraction has become a significant cause of severe traffic
accidents over the past decade. Despite the growing development of
vision-driven driver monitoring systems, the lack of comprehensive perception
datasets restricts research on road safety and traffic security. In this
paper, we present an AssIstive
Driving pErception dataset (AIDE) that considers context information both
inside and outside the vehicle in naturalistic scenarios. AIDE facilitates
holistic driver monitoring through three distinctive characteristics:
multi-view settings of the driver and scene; multi-modal annotations of face,
body, posture, and gesture; and four pragmatic task designs for driving
understanding. To thoroughly explore AIDE, we provide experimental benchmarks
on three kinds of baseline frameworks, covering a wide range of methods. Moreover, two
fusion strategies are introduced to give new insights into learning effective
multi-stream/modal representations. We also systematically investigate the
importance and rationality of the key components in AIDE and benchmarks. The
project link is https://github.com/ydk122024/AIDE.
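The two fusion strategies themselves are not described in this abstract, so as a rough, non-authoritative illustration of what a multi-stream, multi-task baseline for a dataset like AIDE could look like, the sketch below builds one encoder per input stream and fuses features by simple concatenation before per-task heads. All names, stream counts, feature sizes, and class counts are hypothetical assumptions, not the AIDE benchmark code; see the repository linked above for the authors' implementation.

```python
# Illustrative late-fusion sketch only. The stream count, feature sizes,
# and class counts below are placeholder assumptions, not AIDE's actual
# configuration.
import torch
import torch.nn as nn

class LateFusionBaseline(nn.Module):
    """Encode each camera view/modality stream separately, then fuse."""
    def __init__(self, num_streams=4, feat_dim=256,
                 num_classes_per_task=(4, 5, 6, 7)):
        super().__init__()
        # One lightweight CNN encoder per stream (e.g., driver-facing
        # views and in/out-of-vehicle scene views).
        self.encoders = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            )
            for _ in range(num_streams)
        ])
        # Late fusion: concatenate the per-stream feature vectors.
        fused_dim = feat_dim * num_streams
        # One classification head per task (AIDE defines four tasks).
        self.heads = nn.ModuleList(
            nn.Linear(fused_dim, c) for c in num_classes_per_task
        )

    def forward(self, streams):
        # streams: list of tensors, each of shape (B, 3, H, W)
        feats = [enc(x) for enc, x in zip(self.encoders, streams)]
        fused = torch.cat(feats, dim=1)
        return [head(fused) for head in self.heads]

# Usage: four RGB streams at 224x224 resolution, batch size 2.
model = LateFusionBaseline()
views = [torch.randn(2, 3, 224, 224) for _ in range(4)]
logits = model(views)  # one logits tensor per task
```

Replacing the concatenation with a cross-stream attention layer would give an intermediate-fusion variant, one common alternative strategy for learning multi-stream/modal representations.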
Related papers
- Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving [65.04643267731122]
General MLLMs combined with CLIP often struggle to represent driving-specific scenarios accurately.
We propose the Hints of Prompt (HoP) framework, which introduces three key enhancements.
These hints are fused through a Hint Fusion module, enriching visual representations and enhancing multimodal reasoning.
arXiv Detail & Related papers (2024-11-20T06:58:33Z)
- Towards Infusing Auxiliary Knowledge for Distracted Driver Detection [11.816566371802802]
Distracted driving is a leading cause of road accidents globally.
We propose KiD3, a novel method for distracted driver detection (DDD) by infusing auxiliary knowledge about semantic relations between entities in a scene and the structural configuration of the driver's pose.
Specifically, we construct a unified framework that integrates scene graphs and driver pose information with the visual cues in video frames to create a holistic representation of the driver's actions.
arXiv Detail & Related papers (2024-08-29T15:28:42Z)
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout the perception pipeline.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
- CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving [25.49856190295859]
World model (WM) based reinforcement learning (RL) has emerged as a promising approach by learning and predicting the complex dynamics of various environments.
However, no accessible platform exists for training and testing such algorithms in sophisticated driving environments.
We introduce CarDreamer, the first open-source learning platform designed specifically for developing WM based autonomous driving algorithms.
arXiv Detail & Related papers (2024-05-15T05:57:20Z)
- TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex Traffic Scenarios [23.831048188389026]
Multi-object tracking in traffic videos offers immense potential for enhancing traffic monitoring accuracy and promoting road safety measures.
Existing datasets for multi-object tracking in traffic videos often feature limited instances or focus on single classes.
We introduce TrafficMOT, an extensive dataset designed to encompass diverse traffic situations with complex scenarios.
arXiv Detail & Related papers (2023-11-30T18:59:56Z)
- M$^2$DAR: Multi-View Multi-Scale Driver Action Recognition with Vision Transformer [5.082919518353888]
We present a multi-view, multi-scale framework for naturalistic driving action recognition and localization in untrimmed videos.
Our system features a weight-sharing, multi-scale Transformer-based action recognition network that learns robust hierarchical representations.
arXiv Detail & Related papers (2023-05-13T02:38:15Z)
- OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping [84.65114565766596]
We present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure.
OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes.
We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
arXiv Detail & Related papers (2023-04-20T16:31:22Z)
- Federated Deep Learning Meets Autonomous Vehicle Perception: Design and Verification [168.67190934250868]
Federated learning-empowered connected autonomous vehicles (FLCAV) have been proposed.
FLCAV preserves privacy while reducing communication and annotation costs.
It is challenging to determine the network resources and road sensor poses for multi-stage training.
arXiv Detail & Related papers (2022-06-03T23:55:45Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention and Alertness Analysis [54.198237164152786]
Vision is the richest and most cost-effective technology for Driver Monitoring Systems (DMS).
The lack of sufficiently large and comprehensive datasets is currently a bottleneck for the progress of DMS development.
In this paper, we introduce the Driver Monitoring dataset (DMD), an extensive dataset which includes real and simulated driving scenarios.
arXiv Detail & Related papers (2020-08-27T12:33:54Z)