Predicting and Explaining Traffic Crash Severity Through Crash Feature Selection
- URL: http://arxiv.org/abs/2508.11504v1
- Date: Fri, 15 Aug 2025 14:31:26 GMT
- Title: Predicting and Explaining Traffic Crash Severity Through Crash Feature Selection
- Authors: Andrea Castellani, Zacharias Papadovasilakis, Giorgos Papoutsoglou, Mary Cole, Brian Bautsch, Tobias Rodemann, Ioannis Tsamardinos, Angela Harden
- Abstract summary: This study introduces a curated dataset of more than 3 million people involved in accidents in Ohio over six years (2017-2022). The primary contribution is a transparent and reproducible methodology that combines Automated Machine Learning (AutoML) and explainable artificial intelligence (AI) to identify and interpret key risk factors associated with severe crashes. Key features spanned demographic, environmental, vehicle, human, and operational categories, including location type, posted speed, minimum occupant age, and pre-crash action.
- Score: 1.0941365324532635
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motor vehicle crashes remain a leading cause of injury and death worldwide, necessitating data-driven approaches to understand and mitigate crash severity. This study introduces a curated dataset of more than 3 million people involved in accidents in Ohio over six years (2017-2022), aggregated to more than 2.3 million vehicle-level records for predictive analysis. The primary contribution is a transparent and reproducible methodology that combines Automated Machine Learning (AutoML) and explainable artificial intelligence (AI) to identify and interpret key risk factors associated with severe crashes. Using the JADBio AutoML platform, predictive models were constructed to distinguish between severe and non-severe crash outcomes. The models underwent rigorous feature selection across stratified training subsets, and their outputs were interpreted using SHapley Additive exPlanations (SHAP) to quantify the contribution of individual features. A final Ridge Logistic Regression model achieved an AUC-ROC of 85.6% on the training set and 84.9% on a hold-out test set, with 17 features consistently identified as the most influential predictors. Key features spanned demographic, environmental, vehicle, human, and operational categories, including location type, posted speed, minimum occupant age, and pre-crash action. Notably, certain traditionally emphasized factors, such as alcohol or drug impairment, were less influential in the final model compared to environmental and contextual variables. Emphasizing methodological rigor and interpretability over mere predictive performance, this study offers a scalable framework to support Vision Zero with aligned interventions and advanced data-informed traffic safety policy.
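The abstract combines three pieces: a ridge (L2-penalised) logistic regression, an AUC-ROC score, and SHAP attributions. The sketch below illustrates them on hypothetical toy data using only the Python standard library. It is not JADBio's implementation; the function names, penalty strength `lam`, learning rate, and data are all assumptions. For a linear model with (assumed) independent features, the SHAP value of feature j on the log-odds scale reduces to w_j * (x_j - mean_j), which is what `linear_shap` computes.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy data: two features, binary label (1 = severe, 0 = non-severe).
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
y = [0, 0, 0, 1, 1, 1]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear predictor b + w.x, i.e. the model output on the log-odds scale."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, epochs=2000) -> Tuple[List[float], float]:
    """Batch gradient descent on (sum of log-loss + lam/2 * ||w||^2) / n.
    The intercept b is left unpenalised, as is conventional for ridge."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0  # start from the penalty gradient
        for xi, yi in zip(X, y):
            err = sigmoid(log_odds(w, b, xi)) - yi  # d(log-loss)/d(log-odds)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc_roc(scores, labels) -> float:
    """AUC = probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def linear_shap(w, x, background) -> List[float]:
    """Exact SHAP values of a linear model on the log-odds scale,
    assuming independent features: phi_j = w_j * (x_j - mean_j)."""
    means = [sum(col) / len(background) for col in zip(*background)]
    return [wj * (xj - mj) for wj, xj, mj in zip(w, x, means)]


w, b = fit_ridge_logistic(X, y)
scores = [log_odds(w, b, xi) for xi in X]
print("weights:", w, "AUC (train):", auc_roc(scores, y))
phi = linear_shap(w, X[3], X)
print("SHAP values for row 3:", phi)
```

In a real study the AUC would be computed on a hold-out split rather than on the training rows; the toy set here is too small for that. The SHAP values always sum to the row's log-odds minus the mean log-odds over the background data, which is the additivity property SHAP guarantees.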
Related papers
- Real-time Secondary Crash Likelihood Prediction Excluding Post Primary Crash Features [6.477496237661746]
We propose a hybrid crash likelihood prediction framework that does not depend on post-crash features. A dynamic post-temporal window is designed to extract real-time traffic flow and environmental features from primary crash locations and their upstream segments. Experiments on Florida freeways demonstrate that the proposed hybrid framework correctly identifies 91% of secondary crashes with a low false alarm rate of 0.20.
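Results like these are typically a detection rate (recall on the secondary-crash class) paired with a false alarm rate. A minimal sketch, assuming the common definitions DR = TP / (TP + FN) and FAR = FP / (FP + TN); the paper may define its false alarm rate differently, and the `Confusion` class and its counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical confusion-matrix counts for a secondary-crash detector."""
    tp: int  # secondary crashes flagged
    fn: int  # secondary crashes missed
    fp: int  # non-secondary events flagged (false alarms)
    tn: int  # non-secondary events correctly not flagged


def detection_rate(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def false_alarm_rate(c: Confusion) -> float:
    return c.fp / (c.fp + c.tn)


c = Confusion(tp=9, fn=1, fp=2, tn=8)
print(detection_rate(c), false_alarm_rate(c))  # 0.9 0.2
```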
Date: 2026-02-17T22:49:33Z
- From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers' Hazardous Actions in Crashes Using Large Language Model [3.3457493284891338]
Two-vehicle crashes account for approximately 70% of roadway crashes. Driver Hazardous Action (DHA) data is limited by inconsistent and labor-intensive manual coding practices. Here, we present an innovative framework that leverages a fine-tuned large language model to automatically infer DHAs from textual crash narratives.
Date: 2025-10-14T21:35:47Z
- Overtake Detection in Trucks Using CAN Bus Signals: A Comparative Study of Machine Learning Methods [51.28632782308621]
We focus on overtake detection using Controller Area Network (CAN) bus data collected from five in-service trucks provided by the Volvo Group. We evaluate three common classifiers for vehicle manoeuvre detection: Artificial Neural Networks (ANN), Random Forest (RF), and Support Vector Machines (SVM). Our per-truck analysis also reveals that classification accuracy, especially for overtakes, depends on the amount of training data per vehicle.
Date: 2025-07-01T09:20:41Z
- Advanced Crash Causation Analysis for Freeway Safety: A Large Language Model Approach to Identifying Key Contributing Factors [0.0]
This research leverages a large language model (LLM) to analyze freeway crash data and provide a corresponding crash causation analysis. The fine-tuned Llama3 8B model was then used to identify crash causation without pre-labeled data through zero-shot classification. Results demonstrate that LLMs effectively identify primary crash causes such as alcohol-impaired driving, speeding, aggressive driving, and driver inattention.
Date: 2025-05-15T04:07:55Z
- NsBM-GAT: A Non-stationary Block Maximum and Graph Attention Framework for General Traffic Crash Risk Prediction [11.444259609536164]
Existing crash risk prediction models rely on hypothetical scenarios deemed dangerous by researchers. Dashcam videos capture the pre-crash behavior of individual vehicles, but they often lack critical information about the movements of surrounding vehicles. We propose a novel non-stationary extreme value theory (EVT) model to capture the interactive behavior between a vehicle and its surrounding vehicles.
Date: 2025-03-06T02:12:40Z
- Traffic and Safety Rule Compliance of Humans in Diverse Driving Situations [48.924085579865334]
Analyzing human data is crucial for developing autonomous systems that replicate safe driving practices.
This paper presents a comparative evaluation of human compliance with traffic and safety rules across multiple trajectory prediction datasets.
Date: 2024-11-04T09:21:00Z
- Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, carried out as part of the iDPP@CLEF 2024 challenge, focuses on using sensor-derived data obtained through an app.
Date: 2024-07-10T19:17:23Z
- Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We further formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
Date: 2024-06-16T03:10:16Z
- Exploring the Determinants of Pedestrian Crash Severity Using an AutoML Approach [0.0]
The research employs AutoML to assess the effects of various explanatory variables on crash outcomes.
The study incorporates SHAP (SHapley Additive exPlanations) to interpret the contributions of individual features in the predictive model.
Date: 2024-06-07T22:02:36Z
- DRUformer: Enhancing the driving scene Important object detection with driving relationship self-understanding [50.81809690183755]
Traffic accidents frequently lead to fatal injuries, contributing to over 50 million deaths as of 2023.
Previous research primarily assessed the importance of individual participants, treating them as independent entities.
We introduce Driving scene Relationship self-Understanding transformer (DRUformer) to enhance the important object detection task.
Date: 2023-11-11T07:26:47Z
- Enhancing Prediction and Analysis of UK Road Traffic Accident Severity Using AI: Integration of Machine Learning, Econometric Techniques, and Time Series Forecasting in Public Health Research [0.0]
This research investigates road traffic accident severity in the UK, using a combination of machine learning, econometric, and statistical methods.
Our approach outperforms naive forecasting, with a mean absolute scaled error (MASE) of 0.800 and a mean error (ME) of -73.80.
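The mean absolute scaled error (MASE) divides a forecast's mean absolute error by the in-sample mean absolute error of a naive forecast, so a value below 1 (such as 0.800) means the forecast errors are smaller than those of that naive baseline. A minimal sketch, assuming the common definitions with errors taken as actual minus forecast (so a negative mean error indicates over-forecasting on average); the series are hypothetical, and the paper may use a seasonal naive scale (`m > 1`) instead of the one-step one.

```python
from typing import Sequence


def mase(train: Sequence[float], actual: Sequence[float],
         forecast: Sequence[float], m: int = 1) -> float:
    """Mean absolute scaled error. m = 1 scales by the one-step naive forecast's
    in-sample MAE; m > 1 uses the seasonal naive forecast with period m."""
    scale = sum(abs(train[t] - train[t - m]) for t in range(m, len(train))) / (len(train) - m)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


def mean_error(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean error with e_t = actual_t - forecast_t; negative means over-forecasting."""
    return sum(a - f for a, f in zip(actual, forecast)) / len(actual)


train = [10, 12, 14, 16]               # hypothetical history: naive MAE = 2
actual, forecast = [18, 20], [19, 20]  # hypothetical hold-out values
print(mase(train, actual, forecast))   # 0.25
print(mean_error(actual, forecast))    # -0.5
```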
Date: 2023-09-23T21:46:43Z
- Causal Analysis and Classification of Traffic Crash Injury Severity Using Machine Learning Algorithms [0.0]
The data used in this study were obtained for traffic crashes on all interstates across the state of Texas over a six-year period, from 2014 to 2019.
The output of the proposed severity classification approach includes three classes for fatal and severe injury (KA) crashes, non-severe and possible injury (BC) crashes, and property damage only (PDO) crashes.
The results of Granger causality analysis identified the speed limit, surface and weather conditions, traffic volume, presence of work zones, workers in work zones, and high-occupancy vehicle (HOV) lanes as the most important factors affecting crash severity.
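The KA, BC, and PDO classes are groupings of the KABCO injury scale used on U.S. police crash reports (K fatal, A serious injury, B minor injury, C possible injury, O no injury, i.e. property damage only). A minimal sketch of that collapse, assuming each crash is already coded with its most severe KABCO level; the `KABCO_TO_CLASS` table, `severity_class` function, and sample records are hypothetical.

```python
from collections import Counter

# KABCO code -> three-class label (KA / BC / PDO) as described in the summary.
KABCO_TO_CLASS = {
    "K": "KA",   # fatal
    "A": "KA",   # serious (incapacitating) injury
    "B": "BC",   # minor (non-incapacitating) injury
    "C": "BC",   # possible injury
    "O": "PDO",  # no injury: property damage only
}


def severity_class(kabco: str) -> str:
    """Map one KABCO code to its class; raises KeyError for unknown codes."""
    return KABCO_TO_CLASS[kabco.strip().upper()]


crashes = ["O", "C", "K", "b", "O", "A"]  # hypothetical crash records
counts = Counter(severity_class(k) for k in crashes)
print(counts)
```

Normalising case and whitespace before the lookup, and failing loudly on unknown codes, keeps miscoded records from silently landing in the wrong class.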
Date: 2021-11-30T20:32:31Z
- A model for traffic incident prediction using emergency braking data [77.34726150561087]
We address the fundamental problem of data scarcity in road traffic accident prediction by training our model on emergency braking events instead of accidents.
We present a prototype implementing a traffic incident prediction model for Germany based on emergency braking data from Mercedes-Benz vehicles.
Date: 2021-02-12T18:17:12Z