Related papers: Surveillance Video-Based Traffic Accident Detection Using Transformer Architecture

URL: http://arxiv.org/abs/2512.11350v1
Date: Fri, 12 Dec 2025 07:57:36 GMT
Title: Surveillance Video-Based Traffic Accident Detection Using Transformer Architecture
Authors: Tanu Singh, Pranamesh Chakraborty, Long T. Truong
Abstract summary: Traffic accidents represent a leading cause of mortality globally, with incidence rates rising due to increasing population, urbanization, and motorization. Traditional computer vision methods for accident detection struggle with limited spatiotemporal understanding and poor cross-domain generalization. We propose an accident detection model based on a transformer architecture using pre-extracted spatial video features.
Score: 2.621034368312571
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Road traffic accidents represent a leading cause of mortality globally, with incidence rates rising due to increasing population, urbanization, and motorization. Rising accident rates raise concerns about traffic surveillance effectiveness. Traditional computer vision methods for accident detection struggle with limited spatiotemporal understanding and poor cross-domain generalization. Recent advances in transformer architectures excel at modeling global spatial-temporal dependencies and parallel computation. However, applying these models to automated traffic accident detection is limited by small, non-diverse datasets, hindering the development of robust, generalizable systems. To address this gap, we curated a comprehensive and balanced dataset that captures a wide spectrum of traffic environments, accident types, and contextual variations. Utilizing the curated dataset, we propose an accident detection model based on a transformer architecture using pre-extracted spatial video features. The architecture employs convolutional layers to extract local correlations across diverse patterns within a frame, while leveraging transformers to capture sequential-temporal dependencies among the retrieved features. Moreover, most existing studies neglect the integration of motion cues, which are essential for understanding dynamic scenes, especially during accidents. These approaches typically rely on static features or coarse temporal information. In this study, multiple methods for incorporating motion cues were evaluated to identify the most effective strategy. Among the tested input approaches, concatenating RGB features with optical flow achieved the highest accuracy at 88.3%. The results were further compared with vision language models (VLM) such as GPT, Gemini, and LLaVA-NeXT-Video to assess the effectiveness of the proposed method.
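The input strategy the abstract reports as most accurate, concatenating RGB features with optical flow and then modeling temporal dependencies with attention, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, toy feature values, identity query/key/value projections, mean pooling, and classifier weights are all assumptions made for illustration, and the convolutional stage described in the abstract is omitted.

```python
import math


def concat_features(rgb_feats, flow_feats):
    """Concatenate per-frame RGB and optical-flow feature vectors."""
    if len(rgb_feats) != len(flow_feats):
        raise ValueError("RGB and flow sequences must have the same number of frames")
    return [rgb + flow for rgb, flow in zip(rgb_feats, flow_feats)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Single-head scaled dot-product self-attention across frames.

    Assumption: queries, keys, and values are the inputs themselves
    (identity projections); a trained model would learn these projections.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    attended = []
    for query in frames:
        weights = softmax(
            [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        )
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, frames)) for i in range(dim)]
        )
    return attended


def clip_score(frames, weights, bias=0.0):
    """Mean-pool over time, then apply a logistic unit (hypothetical classifier head)."""
    dim = len(frames[0])
    pooled = [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy data: 3 frames, 2-dimensional RGB features and 1-dimensional flow features.
rgb = [[0.1, 0.2], [0.1, 0.3], [0.9, 0.8]]
flow = [[0.0], [0.1], [1.5]]

fused = concat_features(rgb, flow)      # 3 frames x 3 features
attended = self_attention(fused)        # same shape, temporally mixed
prob = clip_score(attended, weights=[0.5, 0.5, 1.0])  # weights are made up
```

Concatenation keeps appearance and motion in a single per-frame vector, so the attention step can relate a frame with large flow magnitude to its neighbors without a separate motion branch.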

Related papers

Optimization-Guided Diffusion for Interactive Scene Generation [52.23368750264419]
We present OMEGA, an optimization-guided, training-free framework that enforces structural consistency and interaction awareness during diffusion-based sampling. We show that OMEGA improves generation realism, consistency, and controllability, increasing the ratio of physically and behaviorally valid scenes. Our approach can also generate $5\times$ more near-collision frames with a time-to-collision under three seconds.
arXiv Detail & Related papers (2025-12-08T15:56:18Z)
Graph Enhanced Trajectory Anomaly Detection [23.8160784400789]
Trajectory anomaly detection is essential for identifying unusual and unexpected movement patterns in applications ranging from intelligent transportation systems to urban safety and fraud prevention. Existing methods only consider limited aspects of the trajectory nature and its movement space by treating trajectories as sequences of sampled locations. The proposed Graph Enhanced Trajectory Anomaly Detection framework tightly integrates road network topology, segment semantics, and historical travel patterns to model trajectory data.
arXiv Detail & Related papers (2025-09-22T20:15:15Z)
Urban Traffic Accident Risk Prediction Revisited: Regionality, Proximity, Similarity and Sparsity [18.566139471849844]
Traffic accidents pose a significant risk to human health and property safety. To prevent traffic accidents, predicting their risks has garnered growing interest. We argue that a desired prediction solution should demonstrate resilience to the complexity of traffic accidents.
arXiv Detail & Related papers (2024-07-29T03:10:15Z)
CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions [13.981748780317329]
Accurately and promptly predicting accidents among surrounding traffic agents from camera footage is crucial for the safety of autonomous vehicles (AVs). This study introduces a novel accident anticipation framework for AVs, termed CRASH. It seamlessly integrates five components: object detector, feature extractor, object-aware module, context-aware module, and multi-layer fusion. Our model surpasses existing top baselines in critical evaluation metrics like Average Precision (AP) and mean Time-To-Accident (mTTA).
arXiv Detail & Related papers (2024-07-25T04:12:49Z)
Layout Sequence Prediction From Noisy Mobile Modality [53.49649231056857]
Trajectory prediction plays a vital role in understanding pedestrian movement for applications such as autonomous driving and robotics. Current trajectory prediction models depend on long, complete, and accurately observed sequences from visual modalities. We propose LTrajDiff, a novel approach that treats objects obstructed or out of sight as equally important as those with fully visible trajectories.
arXiv Detail & Related papers (2023-10-09T20:32:49Z)
Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments. Our approach enhances LiDAR-based detection models using spatial quantized historical features. Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
Correlating sparse sensing for large-scale traffic speed estimation: A Laplacian-enhanced low-rank tensor kriging approach [76.45949280328838]
We propose a Laplacian enhanced low-rank tensor completion (LETC) framework featuring both low-rankness and multi-temporal correlations for large-scale traffic speed kriging. We then design an efficient solution algorithm via several effective numeric techniques to scale up the proposed model to network-wide kriging.
arXiv Detail & Related papers (2022-10-21T07:25:57Z)
Real-Time Accident Detection in Traffic Surveillance Using Deep Learning [0.8808993671472349]
This paper presents a new efficient framework for accident detection at intersections for traffic surveillance applications. The proposed framework consists of three hierarchical steps, including efficient and accurate object detection based on the state-of-the-art YOLOv4 method. The robustness of the proposed framework is evaluated using video sequences collected from YouTube with diverse illumination conditions.
arXiv Detail & Related papers (2022-08-12T19:07:20Z)
Spatial-Temporal Conv-sequence Learning with Accident Encoding for Traffic Flow Prediction [17.94199362114272]
In intelligent transportation systems, the key problem of traffic forecasting is how to extract the periodic temporal dependencies and complex spatial correlation. We propose the Spatial-Temporal Conv-sequence Learning (STCL), in which a focused temporal block uses unidirectional convolution to effectively capture short-term periodic temporal dependence. We conduct extensive experiments on large-scale real-world tasks and verify the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-05-21T17:43:07Z)
Multi-intersection Traffic Optimisation: A Benchmark Dataset and a Strong Baseline [85.9210953301628]
Control of traffic signals is fundamental and critical to alleviate traffic congestion in urban areas. Because of the high complexity of modelling the problem, experimental settings of current works are often inconsistent. We propose a novel and strong baseline model based on deep reinforcement learning with the encoder-decoder structure.
arXiv Detail & Related papers (2021-01-24T03:55:39Z)
Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties. Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates. The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.