Related papers: AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception

URL: http://arxiv.org/abs/2307.13933v2
Date: Tue, 1 Aug 2023 09:29:51 GMT
Title: AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception
Authors: Dingkang Yang, Shuai Huang, Zhi Xu, Zhenpeng Li, Shunli Wang, Mingcheng Li, Yuzheng Wang, Yang Liu, Kun Yang, Zhaoyu Chen, Yan Wang, Jing Liu, Peixuan Zhang, Peng Zhai, Lihua Zhang
Abstract summary: We present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle. AIDE facilitates holistic driver monitoring through three distinctive characteristics. Two fusion strategies are introduced to give new insights into learning effective multi-stream/modal representations.
Score: 26.84439405241999
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Driver distraction has become a significant cause of severe traffic accidents over the past decade. Despite the growing development of vision-driven driver monitoring systems, the lack of comprehensive perception datasets limits progress on road safety and traffic security. In this paper, we present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle in naturalistic scenarios. AIDE facilitates holistic driver monitoring through three distinctive characteristics, including multi-view settings of driver and scene, multi-modal annotations of face, body, posture, and gesture, and four pragmatic task designs for driving understanding. To thoroughly explore AIDE, we provide experimental benchmarks on three kinds of baseline frameworks covering a wide range of methods. Moreover, two fusion strategies are introduced to give new insights into learning effective multi-stream/modal representations. We also systematically investigate the importance and rationality of the key components in AIDE and benchmarks. The project link is https://github.com/ydk122024/AIDE.

Related papers

Visual Dominance and Emerging Multimodal Approaches in Distracted Driving Detection: A Review of Machine Learning Techniques [3.378738346115004]
Distracted driving continues to be a significant cause of road traffic injuries and fatalities worldwide. Recent developments in machine learning (ML) and deep learning (DL) have primarily focused on visual data to detect distraction. This systematic review assesses 74 studies that utilize ML/DL techniques for distracted driving detection across visual, sensor-based, multimodal, and emerging modalities.
arXiv (2025-05-04T02:51:00Z)
Towards Intelligent Transportation with Pedestrians and Vehicles In-the-Loop: A Surveillance Video-Assisted Federated Digital Twin Framework [62.47416496137193]
We propose a surveillance video-assisted federated digital twin (SV-FDT) framework to empower intelligent transportation systems (ITSs) with pedestrians and vehicles in-the-loop. The architecture consists of three layers: (i) the end layer, which collects traffic surveillance videos from multiple sources; (ii) the edge layer, responsible for semantic segmentation-based visual understanding, twin agent-based interaction modeling, and local digital twin system (LDTS) creation in local regions; and (iii) the cloud layer, which integrates LDTSs across different regions to construct a global DT model in real time.
arXiv (2025-03-06T07:36:06Z)
Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving [65.04643267731122]
General MLLMs combined with CLIP often struggle to represent driving-specific scenarios accurately. We propose the Hints of Prompt (HoP) framework, which introduces three key enhancements. These hints are fused through a Hint Fusion module, enriching visual representations and enhancing multimodal reasoning.
arXiv (2024-11-20T06:58:33Z)
Towards Infusing Auxiliary Knowledge for Distracted Driver Detection [11.816566371802802]
Distracted driving is a leading cause of road accidents globally. We propose KiD3, a novel method for distracted driver detection (DDD) that infuses auxiliary knowledge about semantic relations between entities in a scene and the structural configuration of the driver's pose. Specifically, we construct a unified framework that integrates scene graphs and driver pose information with the visual cues in video frames to create a holistic representation of the driver's actions.
arXiv (2024-08-29T15:28:42Z)
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout. DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv (2024-08-09T14:04:21Z)
CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving [25.49856190295859]
World model (WM)-based reinforcement learning (RL) has emerged as a promising approach by learning and predicting the complex dynamics of various environments. However, no accessible platform exists for training and testing such algorithms in sophisticated driving environments. We introduce CarDreamer, the first open-source learning platform designed specifically for developing WM-based autonomous driving algorithms.
arXiv (2024-05-15T05:57:20Z)
Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction [69.29802752614677]
RouteFormer is a novel ego-trajectory prediction network combining GPS data, environmental context, and the driver's field-of-view. To tackle data scarcity and enhance diversity, we introduce GEM, a dataset of urban driving scenarios enriched with synchronized driver field-of-view and gaze data.
arXiv (2023-12-13T23:06:30Z)
TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex Traffic Scenarios [23.831048188389026]
Multi-object tracking in traffic videos offers immense potential for enhancing traffic monitoring accuracy and promoting road safety measures. Existing datasets for multi-object tracking in traffic videos often feature limited instances or focus on single classes. We introduce TrafficMOT, an extensive dataset designed to encompass diverse traffic situations with complex scenarios.
arXiv (2023-11-30T18:59:56Z)
M$^2$DAR: Multi-View Multi-Scale Driver Action Recognition with Vision Transformer [5.082919518353888]
We present a multi-view, multi-scale framework for naturalistic driving action recognition and localization in untrimmed videos. Our system features a weight-sharing, multi-scale Transformer-based action recognition network that learns robust hierarchical representations.
arXiv (2023-05-13T02:38:15Z)
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping [84.65114565766596]
We present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure. OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes. We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
arXiv (2023-04-20T16:31:22Z)
Federated Deep Learning Meets Autonomous Vehicle Perception: Design and Verification [168.67190934250868]
Federated learning-empowered connected autonomous vehicles (FLCAV) have been proposed. FLCAV preserves privacy while reducing communication and annotation costs. However, it is challenging to determine the network resources and road sensor poses for multi-stage training.
arXiv (2022-06-03T23:55:45Z)
Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images. Our approach is fully automatic without any human interaction. We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv (2020-12-15T03:03:38Z)
DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention and Alertness Analysis [54.198237164152786]
Vision is the richest and most cost-effective technology for Driver Monitoring Systems (DMS). The lack of sufficiently large and comprehensive datasets is currently a bottleneck for the progress of DMS development. In this paper, we introduce the Driver Monitoring Dataset (DMD), an extensive dataset that includes real and simulated driving scenarios.
arXiv (2020-08-27T12:33:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.