Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark
- URL: http://arxiv.org/abs/2212.09381v2
- Date: Fri, 16 Jun 2023 13:29:45 GMT
- Title: Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark
- Authors: Jianwu Fang, Lei-Lei Li, Kuan Yang, Zhedong Zheng, Jianru Xue, and Tat-Seng Chua
- Abstract summary: We propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition of text description on the visual observation and the driver attention to facilitate model training.
CAP is formulated by an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and the driver attention guided accident prediction module.
We construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames.
- Score: 77.54411007883962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traffic accident prediction in driving videos aims to provide an early
warning of accident occurrence and to support the decision making of safe
driving systems. Previous works usually concentrate on the spatial-temporal
correlation of object-level context, but they do not fit the inherently
long-tailed data distribution well and are vulnerable to severe environmental
change. In this work, we propose a Cognitive Accident Prediction (CAP) method
that explicitly leverages human-inspired cognition, in the form of text
descriptions of the visual observations and of driver attention, to facilitate
model training. In particular, the text description provides dense semantic
guidance for the primary context of the traffic scene, while the driver
attention draws the model's focus to the critical region most closely
correlated with safe driving. CAP is formulated by an attentive text-to-vision
shift fusion module, an attentive scene context transfer module, and a driver
attention guided accident prediction module. We leverage the attention
mechanism in these modules to explore the core semantic cues for accident
prediction. To train CAP, we extend our previously collected DADA-2000 dataset
(with annotated driver attention for each frame) with factual text
descriptions of the visual observations before each accident. In addition, we
construct a new large-scale benchmark, CAP-DATA, consisting of 11,727
in-the-wild accident videos with over 2.19 million frames, together with
labeled fact-effect-reason-introspection descriptions and temporal accident
frame labels. Extensive experiments validate the superiority of CAP over
state-of-the-art approaches. The code, CAP-DATA, and all results will be
released at https://github.com/JWFanggit/LOTVS-CAP.
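
As a rough sketch of what an attentive text-to-vision fusion step can look like in code, the snippet below lets text-description tokens attend over visual tokens and biases the attention with a driver-attention map. The module name, dimensions, and single-head formulation are illustrative assumptions, not the authors' released implementation (see the repository above for the real code).

```python
# Minimal, illustrative PyTorch sketch of attentive text-to-vision fusion:
# text features act as queries over per-frame visual tokens, and the
# attention is biased by a driver-attention weight per token. All names
# and shapes are assumptions for illustration only.
import torch
import torch.nn as nn

class TextToVisionFusion(nn.Module):
    def __init__(self, text_dim=512, vis_dim=512, d_model=256):
        super().__init__()
        self.q = nn.Linear(text_dim, d_model)  # queries from text semantics
        self.k = nn.Linear(vis_dim, d_model)   # keys from visual tokens
        self.v = nn.Linear(vis_dim, d_model)   # values from visual tokens
        self.scale = d_model ** 0.5

    def forward(self, text_feat, vis_feat, driver_attn):
        # text_feat:   (B, T_txt, text_dim)  token-level description features
        # vis_feat:    (B, N, vis_dim)       flattened spatial visual tokens
        # driver_attn: (B, N)                driver-attention weight per token
        q, k, v = self.q(text_feat), self.k(vis_feat), self.v(vis_feat)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.scale, dim=-1)
        # Bias the cross-attention by the driver-attention map so tokens in
        # the critical region contribute more, then renormalize.
        attn = attn * driver_attn.unsqueeze(1)
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-6)
        return attn @ v  # fused representation, (B, T_txt, d_model)

# Usage on dummy tensors:
fusion = TextToVisionFusion()
fused = fusion(torch.randn(2, 8, 512), torch.randn(2, 49, 512),
               torch.rand(2, 49))
print(fused.shape)  # torch.Size([2, 8, 256])
```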
Related papers
- STDA: Spatio-Temporal Dual-Encoder Network Incorporating Driver Attention to Predict Driver Behaviors Under Safety-Critical Scenarios [11.303666834549896]
Driver attention is incorporated into a spatio-temporal dual-encoder network named STDA for safety-critical scenarios.
STDA contains four parts, including the driver attention prediction module, the fusion module that fuses features from driver attention and raw images, and the temporal encoder module that enhances the capability to interpret dynamic scenes.
The results show that STDA improves the G-mean from 0.659 to 0.719 when incorporating driver attention and adopting a temporal encoder module.
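
The summary does not define G-mean; in imbalanced-classification work it conventionally denotes the geometric mean of sensitivity and specificity, which the following small sketch assumes:

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity * specificity) ** 0.5

print(g_mean([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))  # ~0.667
```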
arXiv Detail & Related papers (2024-08-03T13:06:04Z)
- Abductive Ego-View Accident Video Understanding for Safe Driving Perception [75.60000661664556]
We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding.
MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions.
We present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD).
arXiv Detail & Related papers (2024-03-01T10:42:52Z)
- CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers [5.572431452586636]
We introduce a new framework called Cross-View Episodic Memory Transformer (CEMFormer).
CEMFormer employs unified memory representations to improve driver intention prediction.
We propose a novel context-consistency loss that incorporates driving context as an auxiliary supervision signal to improve prediction performance.
arXiv Detail & Related papers (2023-05-13T05:27:36Z)
- DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving [76.29141888408265]
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z)
- CoCAtt: A Cognitive-Conditioned Driver Attention Dataset (Supplementary Material) [31.888206001447625]
Driver attention prediction can play an instrumental role in mitigating and preventing high-risk events.
We present a new driver attention dataset, CoCAtt.
CoCAtt is the largest and the most diverse driver attention dataset in terms of autonomy levels, eye tracker resolutions, and driving scenarios.
arXiv Detail & Related papers (2022-07-08T17:35:17Z)
- Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving [104.32241082170044]
We study a new task, safety-aware motion prediction with unseen vehicles for autonomous driving.
Unlike the existing trajectory prediction task for seen vehicles, we aim to predict an occupancy map.
Our approach is the first one that can predict the existence of unseen vehicles in most cases.
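
As a rough illustration of what an occupancy-map target is, a set of predicted positions can be rasterized into a binary grid; the grid extent, resolution, and rasterization below are assumptions for illustration, not the paper's formulation:

```python
import numpy as np

def positions_to_occupancy(positions, extent=50.0, cell=0.5):
    """Rasterize (x, y) positions in the ego frame into a binary grid.

    positions: (N, 2) array of metres relative to the ego vehicle.
    extent:    half-width of the square grid in metres.
    cell:      grid resolution in metres per cell.
    """
    size = int(2 * extent / cell)
    grid = np.zeros((size, size), dtype=np.uint8)
    idx = ((positions + extent) / cell).astype(int)
    valid = (idx >= 0).all(axis=1) & (idx < size).all(axis=1)
    grid[idx[valid, 1], idx[valid, 0]] = 1  # row = y, col = x
    return grid

grid = positions_to_occupancy(np.array([[0.0, 10.0], [-3.2, 7.5]]))
print(grid.shape, grid.sum())  # (200, 200) 2
```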
arXiv Detail & Related papers (2021-09-03T13:33:33Z)
- DRIVE: Deep Reinforced Accident Anticipation with Visual Explanation [36.350348194248014]
Traffic accident anticipation aims to accurately and promptly predict the occurrence of a future accident from dashcam videos.
Existing approaches typically focus on capturing the cues of spatial and temporal context before a future accident occurs.
We propose Deep ReInforced accident anticipation with Visual Explanation, named DRIVE.
arXiv Detail & Related papers (2021-07-21T16:33:21Z)
- A model for traffic incident prediction using emergency braking data [77.34726150561087]
We address the fundamental problem of data scarcity in road traffic accident prediction by training our model on emergency braking events instead of accidents.
We present a prototype implementing a traffic incident prediction model for Germany based on emergency braking data from Mercedes-Benz vehicles.
arXiv Detail & Related papers (2021-02-12T18:17:12Z)
- Uncertainty-based Traffic Accident Anticipation with Spatio-Temporal Relational Learning [30.59728753059457]
Traffic accident anticipation aims to predict accidents from dashcam videos as early as possible.
Current deterministic deep neural networks can be overconfident in false predictions.
We propose an uncertainty-based accident anticipation model with spatio-temporal relational learning.
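
The overconfidence issue above is often illustrated with Monte Carlo dropout, a generic way to expose predictive uncertainty by keeping dropout active at test time. The sketch below shows that generic technique on a toy per-frame accident scorer; it is not the paper's relational-temporal model, and all names and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class AccidentScorer(nn.Module):
    # Toy per-frame accident-probability head; stands in for any backbone.
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Dropout(p=0.3),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=30):
    model.train()  # keep dropout active at inference time
    scores = torch.stack([model(x) for _ in range(n_samples)])
    return scores.mean(0), scores.std(0)  # prediction and its spread

model = AccidentScorer()
mean, std = mc_dropout_predict(model, torch.randn(4, 128))
print(mean.shape, std.shape)  # torch.Size([4]) torch.Size([4])
```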
arXiv Detail & Related papers (2020-08-01T20:21:48Z)
- Driver Intention Anticipation Based on In-Cabin and Driving Scene Monitoring [52.557003792696484]
We present a framework for detecting driver intention based on both in-cabin and traffic-scene videos.
Our framework achieves 83.98% accuracy and an F1-score of 84.3%.
arXiv Detail & Related papers (2020-06-20T11:56:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.