AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for
Assistive Driving Perception
- URL: http://arxiv.org/abs/2307.13933v2
- Date: Tue, 1 Aug 2023 09:29:51 GMT
- Title: AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for
Assistive Driving Perception
- Authors: Dingkang Yang, Shuai Huang, Zhi Xu, Zhenpeng Li, Shunli Wang,
Mingcheng Li, Yuzheng Wang, Yang Liu, Kun Yang, Zhaoyu Chen, Yan Wang, Jing
Liu, Peixuan Zhang, Peng Zhai, Lihua Zhang
- Abstract summary: We present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle.
AIDE facilitates holistic driver monitoring through three distinctive characteristics.
Two fusion strategies are introduced to give new insights into learning effective multi-stream/modal representations.
- Score: 26.84439405241999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Driver distraction has become a significant cause of severe traffic accidents
over the past decade. Despite the growing development of vision-driven driver
monitoring systems, the lack of comprehensive perception datasets restricts
road safety and traffic security. In this paper, we present an AssIstive
Driving pErception dataset (AIDE) that considers context information both
inside and outside the vehicle in naturalistic scenarios. AIDE facilitates
holistic driver monitoring through three distinctive characteristics, including
multi-view settings of driver and scene, multi-modal annotations of face, body,
posture, and gesture, and four pragmatic task designs for driving
understanding. To thoroughly explore AIDE, we provide experimental benchmarks
on three kinds of baseline frameworks via extensive methods. Moreover, two
fusion strategies are introduced to give new insights into learning effective
multi-stream/modal representations. We also systematically investigate the
importance and rationality of the key components in AIDE and benchmarks. The
project link is https://github.com/ydk122024/AIDE.
Related papers
- CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving [25.49856190295859]
World model (WM) based reinforcement learning (RL) has emerged as a promising approach by learning and predicting the complex dynamics of various environments.
There does not exist an accessible platform for training and testing such algorithms in sophisticated driving environments.
We introduce CarDreamer, the first open-source learning platform designed specifically for developing WM based autonomous driving algorithms.
arXiv Detail & Related papers (2024-05-15T05:57:20Z) - Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives [56.2139730920855]
We present a systematic analysis of MM-VUFMs specifically designed for road scenes.
Our objective is to provide a comprehensive overview of common practices, referring to task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques.
We provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models.
arXiv Detail & Related papers (2024-02-05T12:47:09Z) - TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex
Traffic Scenarios [23.831048188389026]
Multi-object tracking in traffic videos offers immense potential for enhancing traffic monitoring accuracy and promoting road safety measures.
Existing datasets for multi-object tracking in traffic videos often feature limited instances or focus on single classes.
We introduce TrafficMOT, an extensive dataset designed to encompass diverse traffic situations with complex scenarios.
arXiv Detail & Related papers (2023-11-30T18:59:56Z) - LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception
Network for Autonomous Driving [7.137567622606353]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to transfer semantic features for improved object detection selectively.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z) - M$^2$DAR: Multi-View Multi-Scale Driver Action Recognition with Vision
Transformer [5.082919518353888]
We present a multi-view, multi-scale framework for naturalistic driving action recognition and localization in untrimmed videos.
Our system features a weight-sharing, multi-scale Transformer-based action recognition network that learns robust hierarchical representations.
arXiv Detail & Related papers (2023-05-13T02:38:15Z) - OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping [84.65114565766596]
We present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure.
OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes.
We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
arXiv Detail & Related papers (2023-04-20T16:31:22Z) - Visual Exemplar Driven Task-Prompting for Unified Perception in
Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z) - Federated Deep Learning Meets Autonomous Vehicle Perception: Design and
Verification [168.67190934250868]
Federated learning empowered connected autonomous vehicle (FLCAV) has been proposed.
FLCAV preserves privacy while reducing communication and annotation costs.
It is challenging to determine the network resources and road sensor poses for multi-stage training.
arXiv Detail & Related papers (2022-06-03T23:55:45Z) - Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data
Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z) - DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention
and Alertness Analysis [54.198237164152786]
Vision is the richest and most cost-effective technology for Driver Monitoring Systems (DMS)
The lack of sufficiently large and comprehensive datasets is currently a bottleneck for the progress of DMS development.
In this paper, we introduce the Driver Monitoring dataset (DMD), an extensive dataset which includes real and simulated driving scenarios.
arXiv Detail & Related papers (2020-08-27T12:33:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.