Understanding Real-World Traffic Safety through RoadSafe365 Benchmark
- URL: http://arxiv.org/abs/2602.07212v1
- Date: Fri, 06 Feb 2026 21:48:25 GMT
- Title: Understanding Real-World Traffic Safety through RoadSafe365 Benchmark
- Authors: Xinyu Liu, Darryl C. Jacob, Yuxin Liu, Xinsong Du, Muchao Ye, Bolei Zhou, Pan He,
- Abstract summary: RoadSafe365 is a large-scale vision-language benchmark for traffic safety analysis. It supports fine-grained analysis of traffic safety from extensive and diverse real-world video data collections. RoadSafe365 is independently curated and systematically organized using a hierarchical taxonomy.
- Score: 34.52836638662823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although recent traffic benchmarks have advanced multimodal data analysis, they generally lack systematic evaluation aligned with official safety standards. To fill this gap, we introduce RoadSafe365, a large-scale vision-language benchmark that supports fine-grained analysis of traffic safety from extensive and diverse real-world video data collections. Unlike prior works that focus primarily on coarse accident identification, RoadSafe365 is independently curated and systematically organized using a hierarchical taxonomy that refines and extends foundational definitions of crash, incident, and violation to bridge official traffic safety standards with data-driven traffic understanding systems. RoadSafe365 provides rich attribute annotations across diverse traffic event types, environmental contexts, and interaction scenarios, yielding 36,196 annotated clips from both dashcam and surveillance cameras. Each clip is paired with multiple-choice question-answer sets, comprising 864K candidate options, 8.4K unique answers, and 36K detailed scene descriptions collectively designed for vision-language understanding and reasoning. We establish strong baselines and observe consistent gains when fine-tuning on RoadSafe365. Cross-domain experiments on both real and synthetic datasets further validate its effectiveness. Designed for large-scale training and standardized evaluation, RoadSafe365 provides a comprehensive benchmark to advance reproducible research in real-world traffic safety analysis.
Related papers
- SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation [27.135615596331263]
Vision-language models (VLMs) can be used to enhance the safety of autonomous driving systems. Existing research has largely overlooked the evaluation of these models in safety-critical traffic scenarios. We propose a new VLM-based baseline that uses knowledge graph-based retrieval-augmented generation for visual question answering.
(arXiv, 2025-07-29T08:40:17Z)
- Towards Reliable and Interpretable Traffic Crash Pattern Prediction and Safety Interventions Using Customized Large Language Models [14.53510262691888]
TrafficSafe is a framework that adapts LLMs to reframe crash prediction and feature attribution as text-level reasoning. Alcohol-impaired driving is identified as the leading factor in severe crashes. TrafficSafe highlights pivotal features during model training, guiding strategic improvements to crash data collection.
(arXiv, 2025-05-18T21:02:30Z)
- SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
We propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge. To explicitly integrate safety knowledge, we develop a reasoning component that translates traffic rules into first-order logic. Our Multimodal Retrieval-Augmented Generation model leverages video, control signals, and environmental attributes to learn from past driving experiences.
(arXiv, 2025-02-28T21:53:47Z)
- When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis [6.213279061986497]
SeeUnsafe is a framework that transforms video-based traffic accident analysis into a more interactive, conversational approach. Our framework employs a multimodal-based aggregation strategy to handle videos of various lengths and generate structured responses for review and evaluation. We conduct extensive experiments on the Toyota Woven Traffic Safety dataset, demonstrating that SeeUnsafe effectively performs accident-aware video classification and visual grounding.
(arXiv, 2025-01-17T23:35:34Z)
- Traffic and Safety Rule Compliance of Humans in Diverse Driving Situations [48.924085579865334]
Analyzing human data is crucial for developing autonomous systems that replicate safe driving practices.
This paper presents a comparative evaluation of human compliance with traffic and safety rules across multiple trajectory prediction datasets.
(arXiv, 2024-11-04T09:21:00Z)
- Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
(arXiv, 2024-06-16T03:10:16Z)
- Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction [69.29802752614677]
RouteFormer is a novel ego-trajectory prediction network combining GPS data, environmental context, and the driver's field-of-view. To tackle data scarcity and enhance diversity, we introduce GEM, a dataset of urban driving scenarios enriched with synchronized driver field-of-view and gaze data.
(arXiv, 2023-12-13T23:06:30Z)
- Network-level Safety Metrics for Overall Traffic Safety Assessment: A Case Study [7.8191100993403495]
This paper defines a new set of network-level safety metrics for the overall safety assessment of traffic flow by processing imagery taken by roadside infrastructure sensors.
An integrative analysis of the safety metrics and crash data reveals insightful temporal and spatial correlations between the representative network-level safety metrics and crash frequency.
(arXiv, 2022-01-27T19:07:08Z)
- An Experimental Urban Case Study with Various Data Sources and a Model for Traffic Estimation [65.28133251370055]
We organize an experimental campaign with video measurements in an area within the urban network of Zurich, Switzerland.
We focus on capturing the traffic state in terms of traffic flow and travel times, drawing on measurements from established thermal cameras.
We propose a simple yet efficient Multiple Linear Regression (MLR) model that estimates travel times by fusing various data sources.
(arXiv, 2021-08-02T08:13:57Z)
- ISSAFE: Improving Semantic Segmentation in Accidents by Fusing Event-based Data [34.36975697486129]
We present a rarely addressed task, semantic segmentation in accident scenarios, along with an accident dataset, DADA-seg.
We propose a novel event-based multimodal segmentation architecture, ISSAFE.
Our approach achieves a +8.2% mIoU gain on the proposed evaluation set, outperforming more than 10 state-of-the-art segmentation methods.
(arXiv, 2020-08-20T14:03:34Z)