Cognitive Accident Prediction in Driving Scenes: A Multimodality
Benchmark
- URL: http://arxiv.org/abs/2212.09381v2
- Date: Fri, 16 Jun 2023 13:29:45 GMT
- Title: Cognitive Accident Prediction in Driving Scenes: A Multimodality
Benchmark
- Authors: Jianwu Fang, Lei-Lei Li, Kuan Yang, Zhedong Zheng, Jianru Xue, and
Tat-Seng Chua
- Abstract summary: We propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition, in the form of text descriptions of the visual observations and driver attention, to facilitate model training.
CAP is formulated by an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and a driver-attention-guided accident prediction module.
We construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traffic accident prediction in driving videos aims to provide an early
warning of an accident occurrence and to support the decision making of safe
driving systems. Previous works usually concentrate on the spatial-temporal
correlation of object-level context, but they do not fit the inherently
long-tailed data distribution well and are vulnerable to severe environmental
changes. In this work, we propose a Cognitive Accident Prediction (CAP) method
that explicitly leverages human-inspired cognition, in the form of text
descriptions of the visual observations and driver attention, to facilitate
model training. In particular, the text description provides dense semantic
guidance for the primary context of the traffic scene, while the driver
attention provides traction to focus on the critical region closely correlated
with safe driving. CAP is formulated by an attentive text-to-vision shift
fusion module, an attentive scene context transfer module, and a
driver-attention-guided accident prediction module. We leverage the attention
mechanism in these modules to explore the core semantic cues for accident
prediction. To train CAP, we extend the existing self-collected DADA-2000
dataset (with annotated driver attention for each frame) with factual text
descriptions of the visual observations before the accidents. In addition, we
construct a new large-scale benchmark, CAP-DATA, consisting of 11,727
in-the-wild accident videos with over 2.19 million frames, together with
labeled fact-effect-reason-introspection descriptions and temporal accident
frame labels. Extensive experiments validate the superiority of CAP over
state-of-the-art approaches. The code, CAP-DATA, and all results will be
released at \url{https://github.com/JWFanggit/LOTVS-CAP}.
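The authors' implementation is to be released at the repository above; purely as a rough illustration of how the three attention-based components described in the abstract could be wired together, the following is a minimal PyTorch sketch. The class names, feature dimensions, use of standard cross- and self-attention, the saliency-weighted pooling, and the toy inputs are assumptions made here for readability, not the paper's actual architecture.

# Minimal, illustrative sketch (not the released CAP code) of a text-to-vision
# fusion module, a scene context transfer module, and a driver-attention-guided
# prediction head. All names, dimensions, and inputs below are assumptions.
import torch
import torch.nn as nn


class TextToVisionFusion(nn.Module):
    """Attentive text-to-vision fusion: visual tokens attend to text tokens."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis_tokens, text_tokens):
        fused, _ = self.cross_attn(vis_tokens, text_tokens, text_tokens)
        return self.norm(vis_tokens + fused)


class SceneContextTransfer(nn.Module):
    """Attentive scene context transfer: self-attention over the fused tokens."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):
        ctx, _ = self.self_attn(tokens, tokens, tokens)
        return self.norm(tokens + ctx)


class AttentionGuidedPredictor(nn.Module):
    """Pool tokens with a driver-attention weight and score accident risk."""

    def __init__(self, dim=256):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, tokens, attn_weights):
        # attn_weights: (B, N) normalized driver-attention weight per spatial token
        pooled = (tokens * attn_weights.unsqueeze(-1)).sum(dim=1)
        return torch.sigmoid(self.head(pooled)).squeeze(-1)  # per-frame accident score


if __name__ == "__main__":
    B, N_vis, N_txt, D = 2, 49, 16, 256        # toy batch: 7x7 visual grid, 16 text tokens
    vis = torch.randn(B, N_vis, D)             # frame features from a visual backbone
    txt = torch.randn(B, N_txt, D)             # embedded factual text description
    sal = torch.softmax(torch.randn(B, N_vis), dim=-1)  # flattened driver-attention map

    fused = TextToVisionFusion()(vis, txt)
    ctx = SceneContextTransfer()(fused)
    score = AttentionGuidedPredictor()(ctx, sal)
    print(score.shape)                         # torch.Size([2])

In this sketch the driver-attention map acts only as a pooling weight at prediction time; in the paper it also guides training, which a full implementation would add via an appropriate supervision loss.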