Abductive Ego-View Accident Video Understanding for Safe Driving
Perception
- URL: http://arxiv.org/abs/2403.00436v1
- Date: Fri, 1 Mar 2024 10:42:52 GMT
- Title: Abductive Ego-View Accident Video Understanding for Safe Driving
Perception
- Authors: Jianwu Fang, Lei-lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen
Lv, Jianru Xue, and Tat-Seng Chua
- Abstract summary: We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding.
MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions.
We present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD).
- Score: 75.60000661664556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present MM-AU, a novel dataset for Multi-Modal Accident video
Understanding. MM-AU contains 11,727 in-the-wild ego-view accident videos, each
with temporally aligned text descriptions. We annotate over 2.23 million object
boxes and 58,650 pairs of video-based accident reasons, covering 58 accident
categories. MM-AU supports various accident understanding tasks, particularly
multimodal video diffusion to understand accident cause-effect chains for safe
driving. With MM-AU, we present an Abductive accident Video understanding
framework for Safe Driving perception (AdVersa-SD). AdVersa-SD performs video
diffusion via an Object-Centric Video Diffusion (OAVD) method which is driven
by an abductive CLIP model. This model involves a contrastive interaction loss
to learn the pair co-occurrence of normal, near-accident, and accident frames with
the corresponding text descriptions, such as accident reasons, prevention
advice, and accident categories. OAVD enforces causal region learning while
fixing the content of the original frame background in video generation, to
find the dominant cause-effect chain for certain accidents. Extensive
experiments verify the abductive ability of AdVersa-SD and the superiority of
OAVD against the state-of-the-art diffusion models. Additionally, we provide
careful benchmark evaluations for object detection and accident reason
answering since AdVersa-SD relies on precise object and accident reason
information.
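To make the two core ideas in the abstract concrete, the sketch below illustrates (i) a CLIP-style contrastive interaction loss that ties normal / near-accident / accident frame embeddings to their aligned text descriptions, and (ii) a masked composition step in which a diffusion output may rewrite only candidate causal regions while the original frame background stays fixed. This is a minimal illustration under assumed tensor shapes; the names `contrastive_interaction_loss` and `compose_causal_regions` are hypothetical and do not reproduce the paper's actual implementation.

```python
import torch
import torch.nn.functional as F


def contrastive_interaction_loss(frame_emb: torch.Tensor,
                                 text_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Symmetric CLIP-style contrastive loss over matched (frame, text) pairs.

    frame_emb: (B, D) embeddings of normal / near-accident / accident frames.
    text_emb:  (B, D) embeddings of the aligned descriptions (accident reason,
               prevention advice, accident category).
    Pairs sharing a batch index are positives; all other pairs are negatives.
    """
    frame_emb = F.normalize(frame_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = frame_emb @ text_emb.t() / temperature        # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_f2t = F.cross_entropy(logits, targets)            # frames -> texts
    loss_t2f = F.cross_entropy(logits.t(), targets)        # texts -> frames
    return 0.5 * (loss_f2t + loss_t2f)


def compose_causal_regions(generated: torch.Tensor,
                           original: torch.Tensor,
                           causal_mask: torch.Tensor) -> torch.Tensor:
    """Keep the original background fixed and let the generated content
    replace only the candidate causal (object-centric) regions.

    generated, original: (B, C, H, W) frames or latents.
    causal_mask:         (B, 1, H, W) with 1 inside candidate causal regions.
    """
    return causal_mask * generated + (1.0 - causal_mask) * original
```

In a training loop, such a contrastive term would typically be minimized alongside a diffusion objective evaluated only inside the masked regions, so that gradients concentrate on the cause-effect area rather than on the static background.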
Related papers
- Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding [45.7444555195196]
This work introduces a multi-stage, multimodal pipeline to pre-process videos of traffic accidents, encode them as scene graphs, and align this representation with vision and language modalities for accident classification.
When trained on 4 classes, our method achieves a balanced accuracy score of 57.77% on an (unbalanced) subset of the popular Detection of Traffic Anomaly benchmark.
arXiv Detail & Related papers (2024-07-08T13:15:11Z)
- Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
- ACAV: A Framework for Automatic Causality Analysis in Autonomous Vehicle Accident Recordings [5.578446693797519]
Recent fatalities have emphasized the importance of safety validation through large-scale testing.
We propose ACAV, an automated framework designed to conduct causality analysis for AV accident recordings.
We evaluate ACAV on the Apollo ADS, finding that it can identify five distinct types of causal events in 93.64% of 110 accident recordings.
arXiv Detail & Related papers (2024-01-13T12:41:05Z)
- A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised Traffic Accident Detection in Driving Videos [22.553356096143734]
We propose a novel memory-augmented multi-task collaborative framework (MAMTCF) for unsupervised traffic accident detection in driving videos.
Our method can more accurately detect both ego-involved and non-ego accidents by simultaneously modeling appearance changes and object motions in video frames.
arXiv Detail & Related papers (2023-07-27T01:45:13Z)
- DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving [76.29141888408265]
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z)
- Augmenting Ego-Vehicle for Traffic Near-Miss and Accident Classification Dataset using Manipulating Conditional Style Translation [0.3441021278275805]
Before an accident happens, there is no observable difference between an accident and a near-miss.
Our contribution is to redefine the accident definition and re-annotate the inconsistent accident labels in the DADA-2000 dataset, together with near-misses.
The proposed method integrates two different components: conditional style translation (CST) and a separable 3-dimensional convolutional neural network (S3D).
arXiv Detail & Related papers (2023-01-06T22:04:47Z)
- Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark [77.54411007883962]
We propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition, in the form of text descriptions of the visual observation and driver attention, to facilitate model training.
CAP is formulated by an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and a driver-attention-guided accident prediction module.
We construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames.
arXiv Detail & Related papers (2022-12-19T11:43:02Z)
- An Attention-guided Multistream Feature Fusion Network for Localization of Risky Objects in Driving Videos [10.674638266121574]
This paper proposes an attention-guided multistream feature fusion network (AM-Net) to localize dangerous traffic agents from dashcam videos.
Two Gated Recurrent Unit (GRU) networks use object bounding box and optical flow features extracted from consecutive video frames to capture temporal cues for distinguishing dangerous traffic agents.
Fusing the two streams of features, AM-Net predicts the riskiness scores of traffic agents in the video.
arXiv Detail & Related papers (2022-09-16T13:36:28Z)
- Driver Intention Anticipation Based on In-Cabin and Driving Scene Monitoring [52.557003792696484]
We present a framework for the detection of the drivers' intention based on both in-cabin and traffic scene videos.
Our framework achieves a prediction accuracy of 83.98% and an F1-score of 84.3%.
arXiv Detail & Related papers (2020-06-20T11:56:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.