Crash Severity Risk Modeling Strategies under Data Imbalance
- URL: http://arxiv.org/abs/2412.02094v1
- Date: Tue, 03 Dec 2024 02:28:35 GMT
- Title: Crash Severity Risk Modeling Strategies under Data Imbalance
- Authors: Abdullah Al Mamun, Abyad Enan, Debbie A. Indah, Judith Mwakalonge, Gurcan Comert, Mashrur Chowdhury,
- Abstract summary: This study investigates crash severity risk modeling strategies for work zones involving large vehicles when there are crash data imbalance between low-severity (LS) and high-severity (HS) crashes.
We utilized crash data, involving large vehicles in South Carolina work zones for the period between 2014 and 2018, which included 4 times more LS crashes compared to HS crashes.
The findings of this study highlight a disparity between LS and HS predictions, with less-accurate prediction of HS crashes compared to LS crashes due to class imbalance and feature overlaps between LS and HS crashes.
- Score: 7.9613232032536745
- License:
- Abstract: This study investigates crash severity risk modeling strategies for work zones involving large vehicles (i.e., trucks, buses, and vans) when there are crash data imbalance between low-severity (LS) and high-severity (HS) crashes. We utilized crash data, involving large vehicles in South Carolina work zones for the period between 2014 and 2018, which included 4 times more LS crashes compared to HS crashes. The objective of this study is to explore crash severity prediction performance of various models under different feature selection and data balancing techniques. The findings of this study highlight a disparity between LS and HS predictions, with less-accurate prediction of HS crashes compared to LS crashes due to class imbalance and feature overlaps between LS and HS crashes. Combining features from multiple feature selection techniques: statistical correlation, feature importance, recursive elimination, statistical tests, and mutual information, slightly improves HS crash prediction performance. Data balancing techniques such as NearMiss-1 and RandomUnderSampler, maximize HS recall when paired with certain prediction models, such as Bayesian Mixed Logit (BML), NeuralNet, and RandomForest, making them suitable for HS crash prediction. Conversely, RandomOverSampler, HS Class Weighting, and Kernel-based Synthetic Minority Oversampling (K-SMOTE), used with certain prediction models such as BML, CatBoost, and LightGBM, achieve a balanced performance, defined as achieving an equitable trade-off between LS and HS prediction performance metrics. These insights provide safety analysts with guidance to select models, feature selection techniques, and data balancing techniques that align with their specific safety objectives, offering a robust foundation for enhancing work-zone crash severity prediction.
Related papers
- Spatiotemporal Prediction of Secondary Crashes by Rebalancing Dynamic and Static Data with Generative Adversarial Networks [6.571659350175123]
Secondary crashes significantly exacerbate traffic congestion and increase the severity of incidents.
Existing methods fail to fully address the complexity of traffic crash data, particularly the coexistence of dynamic and static features.
This study proposes a hybrid model named VarFusiGAN-Transformer, aimed at improving the fidelity of secondary crash data generation.
arXiv Detail & Related papers (2025-01-17T08:56:49Z) - Enhancing Crash Frequency Modeling Based on Augmented Multi-Type Data by Hybrid VAE-Diffusion-Based Generative Neural Networks [13.402051372401822]
A key challenge in crash frequency modelling is the prevalence of excessive zero observations.
We propose a hybrid VAE-Diffusion neural network, designed to reduce zero observations.
We assess the synthetic data quality generated by this model through metrics like similarity, accuracy, diversity, and structural consistency.
arXiv Detail & Related papers (2025-01-17T07:53:27Z) - Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We further formulate the crash event feature learning as a novel text reasoning problem and further fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experiments results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z) - Advancing the Robustness of Large Language Models through Self-Denoised Smoothing [50.54276872204319]
Large language models (LLMs) have achieved significant success, but their vulnerability to adversarial perturbations has raised considerable concerns.
We propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions.
Unlike previous denoised smoothing techniques in computer vision, which require training a separate model to enhance the robustness of LLMs, our method offers significantly better efficiency and flexibility.
arXiv Detail & Related papers (2024-04-18T15:47:00Z) - A Generative Deep Learning Approach for Crash Severity Modeling with Imbalanced Data [6.169163527464771]
This study proposes a crash data generation method based on Conditional Tabular GAN.
A crash severity model is employed to estimate the performance of classification and interpretation.
The results indicate that using synthetic data generated by CTGAN-RU for crash severity modeling outperforms original data or synthetic data generated by other resampling methods.
arXiv Detail & Related papers (2024-04-02T16:07:27Z) - Multi-class real-time crash risk forecasting using convolutional neural
network: Istanbul case study [0.0]
The performance of an artificial neural network (ANN) in forecasting crash risk is shown in this paper.
The proposed CNN model is capable of learning from recorded, processed, and categorized input characteristics.
The findings of this research suggest applying the CNN model as a multi-class prediction model for real-time crash risk prediction.
arXiv Detail & Related papers (2024-02-09T10:51:09Z) - The Risk of Federated Learning to Skew Fine-Tuning Features and
Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z) - Measuring Causal Effects of Data Statistics on Language Model's
`Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models.
We provide a language for describing how training data influences predictions, through a causal framework.
Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z) - A model for traffic incident prediction using emergency braking data [77.34726150561087]
We address the fundamental problem of data scarcity in road traffic accident prediction by training our model on emergency braking events instead of accidents.
We present a prototype implementing a traffic incident prediction model for Germany based on emergency braking data from Mercedes-Benz vehicles.
arXiv Detail & Related papers (2021-02-12T18:17:12Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.