Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis
- URL: http://arxiv.org/abs/2506.23263v1
- Date: Sun, 29 Jun 2025 14:37:48 GMT
- Title: Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis
- Authors: Lei-lei Li, Jianwu Fang, Junbin Xiao, Shanmin Pang, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua
- Abstract summary: Egocentrically comprehending the causes and effects of car accidents is crucial for the safety of self-driving cars. This work argues that precisely identifying the accident participants and capturing their related behaviors are of critical importance. We propose a novel diffusion model, Causal-VidSyn, for synthesizing egocentric traffic accident videos.
- Score: 78.14763828578904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Egocentrically comprehending the causes and effects of car accidents is crucial for the safety of self-driving cars, and synthesizing causal-entity reflected accident videos can facilitate testing the ability to respond to accidents that are too costly or dangerous to stage in reality. However, incorporating causal relations as seen in real-world videos into synthetic videos remains challenging. This work argues that precisely identifying the accident participants and capturing their related behaviors are of critical importance. In this regard, we propose a novel diffusion model, Causal-VidSyn, for synthesizing egocentric traffic accident videos. To enable causal entity grounding in video diffusion, Causal-VidSyn leverages the cause descriptions and driver fixations to identify the accident participants and behaviors, facilitated by accident reason answering and gaze-conditioned selection modules. To support Causal-VidSyn, we further construct Drive-Gaze, the largest driver gaze dataset (with 1.54M frames of fixations) in driving accident scenarios. Extensive experiments show that Causal-VidSyn surpasses state-of-the-art video diffusion models in terms of frame quality and causal sensitivity in various tasks, including accident video editing, normal-to-accident video diffusion, and text-to-video generation.
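The abstract describes the gaze-conditioned selection module only at a high level. As a rough illustration, here is a minimal PyTorch-style sketch of how driver fixation maps could gate spatial features inside a video diffusion denoiser; the module name, tensor shapes, and residual fusion scheme are assumptions for illustration and do not reflect the authors' actual implementation.

```python
# Hypothetical sketch of a gaze-conditioned selection block (not the authors' code).
# It gates per-pixel video features with a driver-fixation heatmap so that regions
# the driver attends to (likely accident participants) are emphasized.
import torch
import torch.nn as nn


class GazeConditionedSelection(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Project the single-channel fixation heatmap to a per-channel spatial gate.
        self.gaze_proj = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        # Residual refinement of the gated features.
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor, gaze: torch.Tensor) -> torch.Tensor:
        # feats: [B, C, H, W] denoiser features; gaze: [B, 1, H, W] fixation map in [0, 1].
        gate = self.gaze_proj(gaze)            # per-channel spatial gate
        selected = feats * gate                # emphasize gazed regions
        return feats + self.refine(selected)   # residual fusion keeps background context


if __name__ == "__main__":
    block = GazeConditionedSelection(channels=64)
    feats = torch.randn(2, 64, 32, 32)
    gaze = torch.rand(2, 1, 32, 32)
    print(block(feats, gaze).shape)  # torch.Size([2, 64, 32, 32])
```

The residual connection in this sketch lets the gate emphasize fixated regions without discarding background context, which seems consistent with editing an existing scene rather than regenerating it.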
Related papers
- Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes [26.71659319735027]
Ctrl-Crash is a controllable car crash video generation model that conditions on signals such as bounding boxes, crash types, and an initial image frame. Our approach enables counterfactual scenario generation where minor variations in input can lead to dramatically different crash outcomes. (An illustrative conditioning sketch follows this list.)
arXiv Detail & Related papers (2025-05-30T21:04:38Z)
- EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis [79.25588905883191]
Traffic Accident Anticipation (TAA) in traffic scenes is a challenging problem for achieving zero fatalities in the future. We propose an Attentive Video Diffusion (AVD) model that synthesizes additional accident video clips.
arXiv Detail & Related papers (2025-03-16T01:56:38Z)
- AVD2: Accident Video Diffusion for Accident Video Description [11.221276595088215]
We introduce AVD2 (Accident Video Diffusion for Accident Video Description), a novel framework that enhances accident scene understanding. The framework generates accident videos that align with detailed natural language descriptions and reasoning, resulting in the EMM-AU dataset. Empirical results reveal that integrating the EMM-AU dataset establishes state-of-the-art performance across both automated metrics and human evaluations.
arXiv Detail & Related papers (2025-02-20T18:22:44Z)
- Finding the Trigger: Causal Abductive Reasoning on Video Events [59.188208873301015]
Causal Abductive Reasoning on Video Events (CARVE) involves identifying causal relationships between events in a video. We present a Causal Event Relation Network (CERN) that examines the relationships between video events in temporal and semantic spaces.
arXiv Detail & Related papers (2025-01-16T05:39:28Z)
- Abductive Ego-View Accident Video Understanding for Safe Driving Perception [75.60000661664556]
We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding.
MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions.
We present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD).
arXiv Detail & Related papers (2024-03-01T10:42:52Z)
- Causalainer: Causal Explainer for Automatic Video Summarization [77.36225634727221]
In many application scenarios, improper video summarization can have a large impact, so modeling explainability is a key concern.
A Causal Explainer, dubbed Causalainer, is proposed to address this issue.
arXiv Detail & Related papers (2023-04-30T11:42:06Z)
- Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark [77.54411007883962]
We propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition, combining text descriptions of the visual observation with driver attention, to facilitate model training.
CAP is formulated with an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and a driver-attention-guided accident prediction module.
We construct a new large-scale benchmark consisting of 11,727 in-the-wild accident videos with over 2.19 million frames.
arXiv Detail & Related papers (2022-12-19T11:43:02Z)
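As a rough illustration of the conditioning scheme described for Ctrl-Crash above (bounding boxes, crash type, and an initial frame), the following is a minimal, hypothetical sketch of how such heterogeneous signals could be packed into a token sequence for a diffusion denoiser, with random condition dropout to support counterfactual, classifier-free-guidance-style generation. Every class name, shape, and hyperparameter here is an assumption for illustration, not the paper's API.

```python
# Hypothetical conditioning assembly for a crash-video diffusion model
# (illustrative only; names, shapes, and dropout scheme are assumptions).
import torch
import torch.nn as nn


class CrashConditioner(nn.Module):
    def __init__(self, num_crash_types: int, embed_dim: int = 256):
        super().__init__()
        # Index num_crash_types is reserved as a learned "null" (unconditional) token.
        self.null_index = num_crash_types
        self.type_embed = nn.Embedding(num_crash_types + 1, embed_dim)
        self.box_embed = nn.Linear(4, embed_dim)            # one (x1, y1, x2, y2) box -> one token
        self.frame_embed = nn.Sequential(                    # tiny encoder for the initial frame
            nn.Conv2d(3, 32, kernel_size=4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, embed_dim),
        )

    def forward(self, crash_type, boxes, first_frame, drop_prob: float = 0.1):
        # crash_type: [B] long labels; boxes: [B, N, 4] normalized coords; first_frame: [B, 3, H, W].
        if self.training and drop_prob > 0:
            # Randomly replace the crash type with the null token so the denoiser
            # also learns an unconditional mode (classifier-free-guidance style).
            drop = torch.rand_like(crash_type, dtype=torch.float) < drop_prob
            crash_type = torch.where(
                drop, torch.full_like(crash_type, self.null_index), crash_type
            )
        type_tok = self.type_embed(crash_type).unsqueeze(1)     # [B, 1, D]
        box_tok = self.box_embed(boxes)                          # [B, N, D]
        frame_tok = self.frame_embed(first_frame).unsqueeze(1)   # [B, 1, D]
        # Concatenate into a token sequence for cross-attention in the denoiser.
        return torch.cat([type_tok, box_tok, frame_tok], dim=1)


if __name__ == "__main__":
    cond = CrashConditioner(num_crash_types=6)
    tokens = cond(torch.randint(0, 6, (2,)), torch.rand(2, 8, 4), torch.rand(2, 3, 64, 64))
    print(tokens.shape)  # torch.Size([2, 10, 256])
```

In this kind of setup, varying one condition (e.g., a box trajectory or the crash type) while keeping the rest fixed is what would enable the counterfactual scenario generation the Ctrl-Crash summary mentions.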