Abductive Ego-View Accident Video Understanding for Safe Driving
Perception
- URL: http://arxiv.org/abs/2403.00436v1
- Date: Fri, 1 Mar 2024 10:42:52 GMT
- Title: Abductive Ego-View Accident Video Understanding for Safe Driving
Perception
- Authors: Jianwu Fang, Lei-lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen
Lv, Jianru Xue, and Tat-Seng Chua
- Abstract summary: We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding.
MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions.
We present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD).
- Score: 75.60000661664556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present MM-AU, a novel dataset for Multi-Modal Accident video
Understanding. MM-AU contains 11,727 in-the-wild ego-view accident videos, each
with temporally aligned text descriptions. We annotate over 2.23 million object
boxes and 58,650 pairs of video-based accident reasons, covering 58 accident
categories. MM-AU supports various accident understanding tasks, particularly
multimodal video diffusion to understand accident cause-effect chains for safe
driving. With MM-AU, we present an Abductive accident Video understanding
framework for Safe Driving perception (AdVersa-SD). AdVersa-SD performs video
diffusion via an Object-Centric Video Diffusion (OAVD) method driven by an
abductive CLIP model. The CLIP model uses a contrastive interaction loss to
learn the pairwise co-occurrence of normal, near-accident, and accident frames
with the corresponding text descriptions, such as accident reasons, prevention
advice, and accident categories. During video generation, OAVD enforces causal
region learning while keeping the content of the original frame background
fixed, so as to find the dominant cause-effect chain for a given accident.
Extensive experiments verify the abductive ability of AdVersa-SD and the
superiority of OAVD over state-of-the-art diffusion models. Additionally, we
provide careful benchmark evaluations for object detection and accident reason
answering, since AdVersa-SD relies on precise object and accident reason
information.
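To make the two training signals named in the abstract concrete, the following
is a minimal PyTorch sketch, assuming a standard CLIP-style formulation: a
symmetric contrastive loss over matched frame/text embedding pairs, and a
region-masked denoising objective that emphasizes a hypothesized causal region
while down-weighting the fixed background. Function names, the mask convention,
and the background weight are illustrative assumptions, not the authors'
released code.
```python
# Illustrative sketch only: the general form of a CLIP-style contrastive
# interaction loss and a region-masked denoising objective; NOT the
# authors' implementation of AdVersa-SD/OAVD.
import torch
import torch.nn.functional as F

def contrastive_interaction_loss(frame_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over matched (frame, text) pairs.

    frame_emb: (B, D) pooled embeddings of normal/near-accident/accident clips
    text_emb:  (B, D) embeddings of the aligned descriptions (reasons,
               prevention advice, accident categories)
    """
    frame_emb = F.normalize(frame_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = frame_emb @ text_emb.t() / temperature      # (B, B) similarities
    targets = torch.arange(frame_emb.size(0), device=frame_emb.device)
    loss_f2t = F.cross_entropy(logits, targets)          # frame -> its text
    loss_t2f = F.cross_entropy(logits.t(), targets)      # text -> its frame
    return 0.5 * (loss_f2t + loss_t2f)

def masked_denoising_loss(eps_pred, eps_true, causal_mask, bg_weight=0.1):
    """Weights the diffusion denoising error inside a {0,1} causal-region
    mask, down-weighting the background that should stay fixed.
    `bg_weight` is an arbitrary illustrative choice."""
    sq_err = (eps_pred - eps_true) ** 2
    fg = (causal_mask * sq_err).mean()
    bg = ((1.0 - causal_mask) * sq_err).mean()
    return fg + bg_weight * bg
```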
Related papers
- AVD2: Accident Video Diffusion for Accident Video Description [11.221276595088215]
We introduce AVD2 (Accident Video Diffusion for Accident Video Description), a novel framework that enhances accident scene understanding.
The framework generates accident videos that align with detailed natural language descriptions and reasoning.
The integration of the EMM-AU dataset establishes state-of-the-art performance across both automated metrics and human evaluations.
arXiv Detail & Related papers (2025-02-20T18:22:44Z)
- DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments [60.69159598130235]
We present a new dataset, DAVE, designed for evaluating perception methods with a high representation of Vulnerable Road Users (VRUs).
DAVE is a manually annotated dataset encompassing 16 diverse actor categories (spanning animals, humans, vehicles, etc.) and 16 action types (complex and rare cases like cut-ins, zigzag movement, U-turns, etc.).
Our experiments show that existing methods suffer degradation in performance when evaluated on DAVE, highlighting its benefit for future video recognition research.
arXiv Detail & Related papers (2024-12-28T06:13:44Z)
- Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies accident types and predicts injury outcomes.
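As a rough illustration of this "crashes as language" framing, here is a toy
sketch that turns a structured crash record into a prompt/completion pair for
LLM fine-tuning; the field names and template are assumptions, not the
CrashEvent schema.
```python
# Toy sketch: casting a structured crash record as a text-reasoning example
# for LLM fine-tuning. Field names and the prompt template are hypothetical.
def crash_report_to_example(report: dict) -> dict:
    prompt = (
        "Crash report:\n"
        f"Weather: {report['weather']}\n"
        f"Road type: {report['road_type']}\n"
        f"Narrative: {report['narrative']}\n"
        "Question: what are the expected severity, accident type, "
        "and injury outcome?"
    )
    completion = (
        f"Severity: {report['severity']}. "
        f"Type: {report['accident_type']}. "
        f"Injury outcome: {report['injury_outcome']}."
    )
    return {"prompt": prompt, "completion": completion}
```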
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
- ACAV: A Framework for Automatic Causality Analysis in Autonomous Vehicle Accident Recordings [5.578446693797519]
Recent fatalities have emphasized the importance of safety validation through large-scale testing.
We propose ACAV, an automated framework designed to conduct causality analysis for AV accident recordings.
We evaluate ACAV on the Apollo ADS, finding that it can identify five distinct types of causal events in 93.64% of 110 accident recordings.
arXiv Detail & Related papers (2024-01-13T12:41:05Z)
- DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving [76.29141888408265]
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z)
- Augmenting Ego-Vehicle for Traffic Near-Miss and Accident Classification Dataset using Manipulating Conditional Style Translation [0.3441021278275805]
In the moments before an accident occurs, accident and near-miss cases are visually indistinguishable.
Our contribution is to redefine the accident definition and re-annotate the inconsistent accident labels in the DADA-2000 dataset, together with near-miss cases.
The proposed method integrates two components: conditional style translation (CST) and a separable 3-dimensional convolutional neural network (S3D).
arXiv Detail & Related papers (2023-01-06T22:04:47Z)
- Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark [77.54411007883962]
We propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition, combining text descriptions of the visual observations with driver attention, to facilitate model training.
CAP comprises an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and a driver-attention-guided accident prediction module.
We construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames.
arXiv Detail & Related papers (2022-12-19T11:43:02Z)
- An Attention-guided Multistream Feature Fusion Network for Localization of Risky Objects in Driving Videos [10.674638266121574]
This paper proposes an attention-guided multistream feature fusion network (AM-Net) to localize dangerous traffic agents from dashcam videos.
Two Gated Recurrent Unit (GRU) networks use object bounding box and optical flow features extracted from consecutive video frames to capture temporal cues for distinguishing dangerous traffic agents.
Fusing the two streams of features, AM-Net predicts the riskiness scores of traffic agents in the video.
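A minimal sketch of such a two-stream design, assuming per-agent bounding-box
and optical-flow sequences; the dimensions and fusion head are illustrative,
not AM-Net's actual architecture:
```python
# Minimal two-stream sketch in the spirit of the summary above (hypothetical
# dimensions and fusion head; not the AM-Net reference implementation).
import torch
import torch.nn as nn

class TwoStreamRiskNet(nn.Module):
    def __init__(self, box_dim=4, flow_dim=128, hidden=64):
        super().__init__()
        self.box_gru = nn.GRU(box_dim, hidden, batch_first=True)    # bbox stream
        self.flow_gru = nn.GRU(flow_dim, hidden, batch_first=True)  # flow stream
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, box_seq, flow_seq):
        # box_seq: (B, T, box_dim); flow_seq: (B, T, flow_dim), per agent
        _, h_box = self.box_gru(box_seq)    # final hidden state per stream
        _, h_flow = self.flow_gru(flow_seq)
        fused = torch.cat([h_box[-1], h_flow[-1]], dim=-1)
        return torch.sigmoid(self.head(fused)).squeeze(-1)  # risk in [0, 1]
```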
arXiv Detail & Related papers (2022-09-16T13:36:28Z)
- Driver Intention Anticipation Based on In-Cabin and Driving Scene Monitoring [52.557003792696484]
We present a framework for detecting driver intention based on both in-cabin and traffic scene videos.
Our framework achieves a prediction accuracy of 83.98% and an F1-score of 84.3%.
arXiv Detail & Related papers (2020-06-20T11:56:32Z)