Abductive Ego-View Accident Video Understanding for Safe Driving
  Perception
- URL: http://arxiv.org/abs/2403.00436v1
- Date: Fri, 1 Mar 2024 10:42:52 GMT
- Title: Abductive Ego-View Accident Video Understanding for Safe Driving
  Perception
- Authors: Jianwu Fang, Lei-lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen
  Lv, Jianru Xue, and Tat-Seng Chua
- Abstract summary: We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding.
 MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions.
We present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD).
- Score: 75.60000661664556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present MM-AU, a novel dataset for Multi-Modal Accident video
Understanding. MM-AU contains 11,727 in-the-wild ego-view accident videos, each
with temporally aligned text descriptions. We annotate over 2.23 million object
boxes and 58,650 pairs of video-based accident reasons, covering 58 accident
categories. MM-AU supports various accident understanding tasks, particularly
multimodal video diffusion to understand accident cause-effect chains for safe
driving. With MM-AU, we present an Abductive accident Video understanding
framework for Safe Driving perception (AdVersa-SD). AdVersa-SD performs video
diffusion via an Object-Centric Video Diffusion (OAVD) method which is driven
by an abductive CLIP model. This model involves a contrastive interaction loss
to learn the pair co-occurrence of normal, near-accident, accident frames with
the corresponding text descriptions, such as accident reasons, prevention
advice, and accident categories. OAVD enforces the causal region learning while
fixing the content of the original frame background in video generation, to
find the dominant cause-effect chain for certain accidents. Extensive
experiments verify the abductive ability of AdVersa-SD and the superiority of
OAVD against the state-of-the-art diffusion models. Additionally, we provide
careful benchmark evaluations for object detection and accident reason
answering since AdVersa-SD relies on precise object and accident reason
information.
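
The abstract says the abductive CLIP model uses a contrastive interaction loss to learn which frames co-occur with which text descriptions, but it does not give the formula. As a rough illustration only, the sketch below shows the generic symmetric contrastive (InfoNCE) loss used in CLIP-style training, not the paper's own loss. The function name, the temperature value, and the use of NumPy are assumptions made for this example.

```python
import numpy as np


def clip_style_contrastive_loss(frame_emb, text_emb, temperature=0.07):
    """Generic symmetric InfoNCE loss over matched (frame, text) pairs.

    Hypothetical helper for illustration; row i of frame_emb is assumed
    to be paired with row i of text_emb. This is not the paper's loss.
    """
    # L2-normalise so that dot products are cosine similarities.
    f = frame_emb / np.linalg.norm(frame_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (B, B) similarity matrix; the diagonal holds the matched pairs.
    logits = f @ t.T / temperature

    def diagonal_cross_entropy(lg):
        # Numerically stable log-softmax over each row.
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        # The correct class for row i is column i.
        return -np.mean(np.diag(log_probs))

    # Average the frame-to-text and text-to-frame directions.
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(logits.T))
```

Calling the function with embeddings whose rows are correctly paired should give a loss near zero, while mismatched rows should give a clearly larger value. Any extension to normal, near-accident, and accident frames with reasons, prevention advice, and categories would need the pairing scheme described in the paper itself.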
 
      
Related papers
- Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis [78.14763828578904]
 Egocentrically comprehending the causes and effects of car accidents is crucial for the safety of self-driving cars.
This work argues that precisely identifying the accident participants and capturing their related behaviors are of critical importance.
We propose a novel diffusion model, Causal-VidSyn, for synthesizing egocentric traffic accident videos.
 arXiv  Detail & Related papers  (2025-06-29T14:37:48Z)
- EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis [79.25588905883191]
 Traffic Accident Anticipation (TAA) in traffic scenes is a challenging problem for achieving zero fatalities in the future.
We propose an Attentive Video Diffusion (AVD) model that synthesizes additional accident video clips.
 arXiv  Detail & Related papers  (2025-03-16T01:56:38Z)
- AVD2: Accident Video Diffusion for Accident Video Description [11.221276595088215]
 We introduce AVD2 (Accident Video Diffusion for Accident Video Description), a novel framework that enhances accident scene understanding.
The framework generates accident videos that align with detailed natural language descriptions and reasoning, resulting in the EMM-AU dataset.
 Empirical results reveal that the integration of the EMM-AU dataset establishes state-of-the-art performance across both automated metrics and human evaluations.
 arXiv  Detail & Related papers  (2025-02-20T18:22:44Z)
- DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments [60.69159598130235]
 We present a new dataset, DAVE, designed for evaluating perception methods with high representation of Vulnerable Road Users (VRUs).
DAVE is a manually annotated dataset encompassing 16 diverse actor categories (spanning animals, humans, vehicles, etc.) and 16 action types (complex and rare cases like cut-ins, zigzag movement, U-turn, etc.).
Our experiments show that existing methods suffer degradation in performance when evaluated on DAVE, highlighting its benefit for future video recognition research.
 arXiv  Detail & Related papers  (2024-12-28T06:13:44Z)
- Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding [45.7444555195196]
 This work introduces a multi-stage, multimodal pipeline to pre-process videos of traffic accidents, encode them as scene graphs, and align this representation with vision and language modalities for accident classification.
When trained on 4 classes, our method achieves a balanced accuracy score of 57.77% on an (unbalanced) subset of the popular Detection of Traffic Anomaly benchmark.
 arXiv  Detail & Related papers  (2024-07-08T13:15:11Z)
- Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
 We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We further formulate the crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
 arXiv  Detail & Related papers  (2024-06-16T03:10:16Z)
- ACAV: A Framework for Automatic Causality Analysis in Autonomous Vehicle
  Accident Recordings [5.578446693797519]
 Recent fatalities have emphasized the importance of safety validation through large-scale testing.
We propose ACAV, an automated framework designed to conduct causality analysis for AV accident recordings.
We evaluate ACAV on the Apollo ADS, finding that it can identify five distinct types of causal events in 93.64% of 110 accident recordings.
 arXiv  Detail & Related papers  (2024-01-13T12:41:05Z)
- A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised
  Traffic Accident Detection in Driving Videos [22.553356096143734]
 We propose a novel memory-augmented multi-task collaborative framework (MAMTCF) for unsupervised traffic accident detection in driving videos.
Our method can more accurately detect both ego-involved and non-ego accidents by simultaneously modeling appearance changes and object motions in video frames.
 arXiv  Detail & Related papers  (2023-07-27T01:45:13Z)
- DeepAccident: A Motion and Accident Prediction Benchmark for V2X
  Autonomous Driving [76.29141888408265]
 We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
 arXiv  Detail & Related papers  (2023-04-03T17:37:00Z)
- Augmenting Ego-Vehicle for Traffic Near-Miss and Accident Classification
  Dataset using Manipulating Conditional Style Translation [0.3441021278275805]
 There is no difference between an accident and a near-miss in the time before the accident happens.
Our contribution is to redefine the accident definition and re-annotate the accident inconsistency on the DADA-2000 dataset together with near-misses.
The proposed method integrates two different components: conditional style translation (CST) and a separable 3-dimensional convolutional neural network (S3D).
 arXiv  Detail & Related papers  (2023-01-06T22:04:47Z)
- Cognitive Accident Prediction in Driving Scenes: A Multimodality
  Benchmark [77.54411007883962]
 We propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition of text description on the visual observation and the driver attention to facilitate model training.
CAP is formulated by an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and the driver attention guided accident prediction module.
We construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames.
 arXiv  Detail & Related papers  (2022-12-19T11:43:02Z)
- An Attention-guided Multistream Feature Fusion Network for Localization
  of Risky Objects in Driving Videos [10.674638266121574]
 This paper proposes an attention-guided multistream feature fusion network (AM-Net) to localize dangerous traffic agents from dashcam videos.
Two Gated Recurrent Unit (GRU) networks use object bounding box and optical flow features extracted from consecutive video frames to capture temporal cues for distinguishing dangerous traffic agents.
Fusing the two streams of features, AM-Net predicts the riskiness scores of traffic agents in the video.
 arXiv  Detail & Related papers  (2022-09-16T13:36:28Z)
- Driver Intention Anticipation Based on In-Cabin and Driving Scene
  Monitoring [52.557003792696484]
 We present a framework for the detection of the drivers' intention based on both in-cabin and traffic scene videos.
Our framework achieves a prediction accuracy of 83.98% and an F1-score of 84.3%.
 arXiv  Detail & Related papers  (2020-06-20T11:56:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     