Related papers: Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses

URL: http://arxiv.org/abs/2406.10789v1
Date: Sun, 16 Jun 2024 03:10:16 GMT
Title: Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses
Authors: Zhiwen Fan, Pu Wang, Yang Zhao, Yibo Zhao, Boris Ivanovic, Zhangyang Wang, Marco Pavone, Hao Frank Yang
Abstract summary: We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports. We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes. Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
Score: 76.59021017301127
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The increasing rate of road accidents worldwide not only results in significant loss of life but also imposes billions in financial burdens on societies. Current research in traffic crash frequency modeling and analysis has predominantly approached the problem as a classification task, focusing mainly on learning-based classification or ensemble learning methods. These approaches often overlook the intricate relationships among the complex infrastructure, environmental, human, and contextual factors related to traffic crashes and risky situations. In contrast, we first propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports and incorporating infrastructure data as well as environmental and traffic textual and visual information from Washington State. Leveraging this rich dataset, we formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes, such as crash type, severity, and number of injuries, based on contextual and environmental factors. The proposed model, CrashLLM, distinguishes itself from existing solutions by leveraging the inherent text reasoning capabilities of LLMs to parse and learn from complex, unstructured data, thereby enabling a more nuanced analysis of contributing factors. Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes, with the average F1 score boosted from 34.9% to 53.8%. Furthermore, CrashLLM can provide valuable insights for numerous open-world what-if situational-awareness traffic safety analyses with learned reasoning features, which existing models cannot offer. We make our benchmark, datasets, and model publicly available for further exploration.
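
Illustration (not from the paper): the abstract describes casting crash records as a text reasoning problem. A minimal sketch of that idea, assuming a structured record is flattened into a prompt and a target string for fine-tuning, might look like the following. The field names, the prompt wording, and the helper `record_to_example` are hypothetical and are not the CrashEvent schema or the CrashLLM pipeline.

```python
from typing import Dict, Tuple

# Hypothetical field names chosen for illustration; not the CrashEvent schema.
CONTEXT_FIELDS = ("weather", "road_surface", "lighting", "speed_limit_mph")
OUTCOME_FIELDS = ("crash_type", "severity", "num_injuries")


def record_to_example(record: Dict[str, object]) -> Tuple[str, str]:
    """Turn one structured crash record into a (prompt, target) text pair."""
    # Describe the contextual and environmental factors in plain text.
    context = "; ".join(
        f"{name.replace('_', ' ')}: {record[name]}" for name in CONTEXT_FIELDS
    )
    prompt = (
        "Crash context: " + context + ". "
        "Predict the crash type, severity, and number of injuries."
    )
    # The target lists the outcomes the model should learn to generate.
    target = ", ".join(
        f"{name.replace('_', ' ')}: {record[name]}" for name in OUTCOME_FIELDS
    )
    return prompt, target


# Made-up record used only to show the conversion.
example = {
    "weather": "rain",
    "road_surface": "wet",
    "lighting": "dark",
    "speed_limit_mph": 35,
    "crash_type": "rear-end",
    "severity": "minor injury",
    "num_injuries": 1,
}
prompt, target = record_to_example(example)
print(prompt)
print(target)
```

Pairs of this kind could then be fed to any standard supervised fine-tuning setup; how the authors actually serialize reports is not stated in the abstract.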

Related papers

E-bike agents: Large Language Model-Driven E-Bike Accident Analysis and Severity Prediction [1.370096215615823]
This study introduces E-bike agents, a framework that uses large language model (LLM)-powered agents to classify and extract safety variables from unstructured incident reports. Our framework consists of four LLM agents, handling data classification, information extraction, injury cause determination, and component linkage, to extract the key factors that could lead to E-bike accidents. Our research shows that equipment issues are slightly more common than human-related ones, but human-related incidents are more often fatal.
arXiv Detail & Related papers (2025-06-05T05:49:41Z)
Towards Reliable and Interpretable Traffic Crash Pattern Prediction and Safety Interventions Using Customized Large Language Models [14.53510262691888]
TrafficSafe is a framework that reframes crash prediction and feature attribution as text-level reasoning. Alcohol-impaired driving is the leading factor in severe crashes. TrafficSafe highlights pivotal features during model training, guiding strategic crash data collection improvements.
arXiv Detail & Related papers (2025-05-18T21:02:30Z)
Advanced Crash Causation Analysis for Freeway Safety: A Large Language Model Approach to Identifying Key Contributing Factors [0.0]
This research leverages a large language model (LLM) to analyze freeway crash data and provide crash causation analysis accordingly. The fine-tuned Llama3 8B model was then used to identify crash causation without pre-labeled data through zero-shot classification. Results demonstrate that LLMs effectively identify primary crash causes such as alcohol-impaired driving, speeding, aggressive driving, and driver inattention.
arXiv Detail & Related papers (2025-05-15T04:07:55Z)
CrashSage: A Large Language Model-Centered Framework for Contextual and Interpretable Traffic Crash Analysis [0.46040036610482665]
Road crashes claim over 1.3 million lives annually worldwide and incur global economic losses exceeding $1.8 trillion. This study presents CrashSage, a novel Large Language Model (LLM)-centered framework designed to advance crash analysis and modeling through four key innovations.
arXiv Detail & Related papers (2025-05-08T00:23:18Z)
Enhancing Crash Frequency Modeling Based on Augmented Multi-Type Data by Hybrid VAE-Diffusion-Based Generative Neural Networks [13.402051372401822]
A key challenge in crash frequency modelling is the prevalence of excessive zero observations. We propose a hybrid VAE-Diffusion neural network, designed to reduce zero observations. We assess the synthetic data quality generated by this model through metrics like similarity, accuracy, diversity, and structural consistency.
arXiv Detail & Related papers (2025-01-17T07:53:27Z)
Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis [0.40964539027092917]
This study introduces a novel approach to predicting collision types by utilizing a comprehensive dataset fused from multiple sources. Central to our approach is the development of a Feature Group Tabular Transformer (FGTT) model, which organizes disparate data into meaningful feature groups. The FGTT model is benchmarked against widely used tree ensemble models, including Random Forest, XGBoost, and CatBoost, demonstrating superior predictive performance.
arXiv Detail & Related papers (2024-12-06T20:47:13Z)
An Explainable Machine Learning Approach to Traffic Accident Fatality Prediction [0.02730969268472861]
Road traffic accidents pose a significant public health threat worldwide. This study presents a machine learning-based approach for classifying fatal and non-fatal road accident outcomes.
arXiv Detail & Related papers (2024-09-18T12:41:56Z)
Exploring Traffic Crash Narratives in Jordan Using Text Mining Analytics [4.465427147188149]
This study collected crash data from five major freeways in Jordan that cover narratives of 7,587 records from 2018-2022. An unsupervised learning method was adopted to learn the pattern from crash data. Results show that text mining analytics is a promising method and underscore the multifactorial nature of traffic crashes.
arXiv Detail & Related papers (2024-06-11T20:07:39Z)
Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization. We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data. We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z)
AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis [3.8763079966791523]
AccidentGPT is a foundation model of traffic accident analysis. It incorporates multi-modal input data to automatically reconstruct the accident process video with dynamics details.
arXiv Detail & Related papers (2024-01-05T19:33:21Z)
Exploring Factors Affecting Pedestrian Crash Severity Using TabNet: A Deep Learning Approach [0.0]
This study presents the first investigation of pedestrian crash severity using the TabNet model. Through the application of TabNet to a comprehensive dataset from Utah covering the years 2010 to 2022, we uncover intricate factors contributing to pedestrian crash severity.
arXiv Detail & Related papers (2023-11-29T19:44:52Z)
A Study of Situational Reasoning for Traffic Understanding [63.45021731775964]
We devise three novel text-based tasks for situational reasoning in the traffic domain. We adopt four knowledge-enhanced methods that have shown generalization capability across language reasoning tasks in prior work. We provide in-depth analyses of model performance on data partitions and examine model predictions categorically.
arXiv Detail & Related papers (2023-06-05T01:01:12Z)
Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced Dataset and Benchmark [62.997667081978825]
The paper introduces a new dataset to assess the performance of machine learning algorithms in the prediction of the seriousness of injury in a traffic accident. The dataset is created by aggregating publicly available datasets from the UK Department for Transport.
arXiv Detail & Related papers (2022-05-20T21:15:26Z)
DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data. We propose a general framework to solve the above two challenges simultaneously. We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
Crash Report Data Analysis for Creating Scenario-Wise, Spatio-Temporal Attention Guidance to Support Computer Vision-based Perception of Fatal Crash Risks [8.34084323253809]
This paper develops a data analytics model, named scenario-wise, spatio-temporal attention guidance, from fatal crash report data. It estimates the relevance of detected objects to fatal crashes from their environment and context information. The paper shows how the developed attention guidance supports the design and implementation of a preliminary CV model.
arXiv Detail & Related papers (2021-09-06T19:43:37Z)
A model for traffic incident prediction using emergency braking data [77.34726150561087]
We address the fundamental problem of data scarcity in road traffic accident prediction by training our model on emergency braking events instead of accidents. We present a prototype implementing a traffic incident prediction model for Germany based on emergency braking data from Mercedes-Benz vehicles.
arXiv Detail & Related papers (2021-02-12T18:17:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
