Abductive Ego-View Accident Video Understanding for Safe Driving
Perception
- URL: http://arxiv.org/abs/2403.00436v1
- Date: Fri, 1 Mar 2024 10:42:52 GMT
- Title: Abductive Ego-View Accident Video Understanding for Safe Driving
Perception
- Authors: Jianwu Fang, Lei-lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen
Lv, Jianru Xue, and Tat-Seng Chua
- Abstract summary: We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding.
MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions.
We present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD).
- Score: 75.60000661664556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present MM-AU, a novel dataset for Multi-Modal Accident video
Understanding. MM-AU contains 11,727 in-the-wild ego-view accident videos, each
with temporally aligned text descriptions. We annotate over 2.23 million object
boxes and 58,650 pairs of video-based accident reasons, covering 58 accident
categories. MM-AU supports various accident understanding tasks, particularly
multimodal video diffusion to understand accident cause-effect chains for safe
driving. With MM-AU, we present an Abductive accident Video understanding
framework for Safe Driving perception (AdVersa-SD). AdVersa-SD performs video
diffusion via an Object-Centric Video Diffusion (OAVD) method driven by an
abductive CLIP model. The CLIP model uses a contrastive interaction loss to
learn the pairwise co-occurrence of normal, near-accident, and accident frames
with the corresponding text descriptions, such as accident reasons, prevention
advice, and accident categories. During video generation, OAVD enforces causal
region learning while keeping the content of the original frame background
fixed, so as to find the dominant cause-effect chain for a given accident.
Extensive experiments verify the abductive ability of AdVersa-SD and the
superiority of OAVD over state-of-the-art diffusion models. Additionally, we
provide careful benchmark evaluations for object detection and accident reason
answering, since AdVersa-SD relies on precise object and accident reason
information.
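To make the two training signals named in the abstract concrete, the following
is a minimal PyTorch sketch, assuming a standard CLIP-style formulation: a
symmetric contrastive loss over matched frame/text embedding pairs, and a
region-masked denoising objective that emphasizes a hypothesized causal region
while down-weighting the fixed background. Function names, the mask convention,
and the background weight are illustrative assumptions, not the authors'
released code.
```python
# Illustrative sketch only: the general form of a CLIP-style contrastive
# interaction loss and a region-masked denoising objective; NOT the
# authors' implementation of AdVersa-SD/OAVD.
import torch
import torch.nn.functional as F

def contrastive_interaction_loss(frame_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over matched (frame, text) pairs.

    frame_emb: (B, D) pooled embeddings of normal/near-accident/accident clips
    text_emb:  (B, D) embeddings of the aligned descriptions (reasons,
               prevention advice, accident categories)
    """
    frame_emb = F.normalize(frame_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = frame_emb @ text_emb.t() / temperature      # (B, B) similarities
    targets = torch.arange(frame_emb.size(0), device=frame_emb.device)
    loss_f2t = F.cross_entropy(logits, targets)          # frame -> its text
    loss_t2f = F.cross_entropy(logits.t(), targets)      # text -> its frame
    return 0.5 * (loss_f2t + loss_t2f)

def masked_denoising_loss(eps_pred, eps_true, causal_mask, bg_weight=0.1):
    """Weights the diffusion denoising error inside a {0,1} causal-region
    mask, down-weighting the background that should stay fixed.
    `bg_weight` is an arbitrary illustrative choice."""
    sq_err = (eps_pred - eps_true) ** 2
    fg = (causal_mask * sq_err).mean()
    bg = ((1.0 - causal_mask) * sq_err).mean()
    return fg + bg_weight * bg
```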
Related papers
- AVD2: Accident Video Diffusion for Accident Video Description [11.221276595088215]
We introduce AVD2 (Accident Video Diffusion for Accident Video Description), a novel framework that enhances accident scene understanding.
The framework generates accident videos that align with detailed natural language descriptions and reasoning.
The integration of the EMM-AU dataset establishes state-of-the-art performance across both automated metrics and human evaluations.
arXiv Detail & Related papers (2025-02-20T18:22:44Z)
- DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments [60.69159598130235]
We present a new dataset, DAVE, designed for evaluating perception methods with a high representation of Vulnerable Road Users (VRUs).
DAVE is a manually annotated dataset encompassing 16 diverse actor categories (spanning animals, humans, vehicles, etc.) and 16 action types (complex and rare cases like cut-ins, zigzag movement, U-turns, etc.).
Our experiments show that existing methods suffer degradation in performance when evaluated on DAVE, highlighting its benefit for future video recognition research.
arXiv Detail & Related papers (2024-12-28T06:13:44Z)
- Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies accident types and predicts injury outcomes.
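As a rough illustration of this "crashes as language" framing, here is a toy
sketch that turns a structured crash record into a prompt/completion pair for
LLM fine-tuning; the field names and template are assumptions, not the
CrashEvent schema.
```python
# Toy sketch: casting a structured crash record as a text-reasoning example
# for LLM fine-tuning. Field names and the prompt template are hypothetical.
def crash_report_to_example(report: dict) -> dict:
    prompt = (
        "Crash report:\n"
        f"Weather: {report['weather']}\n"
        f"Road type: {report['road_type']}\n"
        f"Narrative: {report['narrative']}\n"
        "Question: what are the expected severity, accident type, "
        "and injury outcome?"
    )
    completion = (
        f"Severity: {report['severity']}. "
        f"Type: {report['accident_type']}. "
        f"Injury outcome: {report['injury_outcome']}."
    )
    return {"prompt": prompt, "completion": completion}
```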
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
- ACAV: A Framework for Automatic Causality Analysis in Autonomous Vehicle Accident Recordings [5.578446693797519]
Recent fatalities have emphasized the importance of safety validation through large-scale testing.
We propose ACAV, an automated framework designed to conduct causality analysis for AV accident recordings.
We evaluate ACAV on the Apollo ADS, finding that it can identify five distinct types of causal events in 93.64% of 110 accident recordings.
arXiv Detail & Related papers (2024-01-13T12:41:05Z)
- DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving [76.29141888408265]
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z)
- Augmenting Ego-Vehicle for Traffic Near-Miss and Accident Classification Dataset using Manipulating Conditional Style Translation [0.3441021278275805]
In the moments before an accident occurs, accident and near-miss cases are visually indistinguishable.
Our contribution is to redefine the accident definition and re-annotate the inconsistent accident labels in the DADA-2000 dataset, together with near-miss cases.
The proposed method integrates two components: conditional style translation (CST) and a separable 3-dimensional convolutional neural network (S3D).
arXiv Detail & Related papers (2023-01-06T22:04:47Z)
- Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark [77.54411007883962]
We propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition, combining text descriptions of the visual observations with driver attention, to facilitate model training.
CAP comprises an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and a driver-attention-guided accident prediction module.
We construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames.
arXiv Detail & Related papers (2022-12-19T11:43:02Z)
- An Attention-guided Multistream Feature Fusion Network for Localization of Risky Objects in Driving Videos [10.674638266121574]
This paper proposes an attention-guided multistream feature fusion network (AM-Net) to localize dangerous traffic agents from dashcam videos.
Two Gated Recurrent Unit (GRU) networks use object bounding box and optical flow features extracted from consecutive video frames to capture temporal cues for distinguishing dangerous traffic agents.
Fusing the two streams of features, AM-Net predicts the riskiness scores of traffic agents in the video.
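A minimal sketch of such a two-stream design, assuming per-agent bounding-box
and optical-flow sequences; the dimensions and fusion head are illustrative,
not AM-Net's actual architecture:
```python
# Minimal two-stream sketch in the spirit of the summary above (hypothetical
# dimensions and fusion head; not the AM-Net reference implementation).
import torch
import torch.nn as nn

class TwoStreamRiskNet(nn.Module):
    def __init__(self, box_dim=4, flow_dim=128, hidden=64):
        super().__init__()
        self.box_gru = nn.GRU(box_dim, hidden, batch_first=True)    # bbox stream
        self.flow_gru = nn.GRU(flow_dim, hidden, batch_first=True)  # flow stream
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, box_seq, flow_seq):
        # box_seq: (B, T, box_dim); flow_seq: (B, T, flow_dim), per agent
        _, h_box = self.box_gru(box_seq)    # final hidden state per stream
        _, h_flow = self.flow_gru(flow_seq)
        fused = torch.cat([h_box[-1], h_flow[-1]], dim=-1)
        return torch.sigmoid(self.head(fused)).squeeze(-1)  # risk in [0, 1]
```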
arXiv Detail & Related papers (2022-09-16T13:36:28Z)
- Driver Intention Anticipation Based on In-Cabin and Driving Scene Monitoring [52.557003792696484]
We present a framework for detecting driver intention based on both in-cabin and traffic scene videos.
Our framework achieves a prediction accuracy of 83.98% and an F1-score of 84.3%.
arXiv Detail & Related papers (2020-06-20T11:56:32Z)