Enhancing Crash Frequency Modeling Based on Augmented Multi-Type Data by Hybrid VAE-Diffusion-Based Generative Neural Networks
- URL: http://arxiv.org/abs/2501.10017v1
- Date: Fri, 17 Jan 2025 07:53:27 GMT
- Title: Enhancing Crash Frequency Modeling Based on Augmented Multi-Type Data by Hybrid VAE-Diffusion-Based Generative Neural Networks
- Authors: Junlan Chen, Qijie He, Pei Liu, Wei Ma, Ziyuan Pu
- Abstract summary: A key challenge in crash frequency modelling is the prevalence of excessive zero observations.
We propose a hybrid VAE-Diffusion neural network, designed to reduce zero observations.
We assess the synthetic data quality generated by this model through metrics like similarity, accuracy, diversity, and structural consistency.
- Score: 13.402051372401822
- Abstract: Crash frequency modelling analyzes the impact of factors like traffic volume, road geometry, and environmental conditions on crash occurrences. Inaccurate predictions can distort our understanding of these factors, leading to misguided policies and wasted resources, which jeopardize traffic safety. A key challenge in crash frequency modelling is the prevalence of excessive zero observations, caused by underreporting, the low probability of crashes, and high data collection costs. These zero observations often reduce model accuracy and introduce bias, complicating safety decision-making. While existing approaches, such as statistical methods, data aggregation, and resampling, attempt to address this issue, they either rely on restrictive assumptions or result in significant information loss, distorting crash data. To overcome these limitations, we propose a hybrid VAE-Diffusion neural network, designed to reduce zero observations and handle the complexities of multi-type tabular crash data (count, ordinal, nominal, and real-valued variables). We assess the synthetic data quality generated by this model through metrics like similarity, accuracy, diversity, and structural consistency, and compare its predictive performance against traditional statistical models. Our findings demonstrate that the hybrid VAE-Diffusion model outperforms baseline models across all metrics, offering a more effective approach to augmenting crash data and improving the accuracy of crash frequency predictions. This study highlights the potential of synthetic data to enhance traffic safety by improving crash frequency modelling and informing better policy decisions.
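To make the "excessive zero observations" problem concrete, the sketch below simulates zero-inflated crash counts and compares the observed share of zeros with the share a plain Poisson model of the same mean would predict. It is only an illustration of the data issue the abstract describes, not the paper's VAE-Diffusion method; the function names and the parameter values (`rate`, `zero_prob`, `seed`) are hypothetical choices made for this example.

```python
import math
import random


def simulate_zero_inflated_counts(n, rate, zero_prob, seed=0):
    """Draw n counts: a structural zero with probability zero_prob,
    otherwise a Poisson(rate) draw (hypothetical illustration data)."""
    rng = random.Random(seed)
    limit = math.exp(-rate)
    counts = []
    for _ in range(n):
        if rng.random() < zero_prob:
            counts.append(0)  # structural zero, e.g. an unreported crash
            continue
        # Knuth's multiplication method for sampling a Poisson variate
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        counts.append(k)
    return counts


def zero_shares(counts):
    """Return (observed zero share, zero share implied by a Poisson
    distribution with the same sample mean)."""
    mean = sum(counts) / len(counts)
    observed = counts.count(0) / len(counts)
    expected = math.exp(-mean)  # P(Y = 0) for Poisson(mean)
    return observed, expected


counts = simulate_zero_inflated_counts(n=10_000, rate=2.0, zero_prob=0.6)
observed, expected = zero_shares(counts)
print(f"observed zero share: {observed:.3f}")
print(f"Poisson-implied zero share: {expected:.3f}")
```

When the observed share clearly exceeds the Poisson-implied share, a standard count model will tend to fit the data poorly, which is the situation the abstract says statistical corrections, aggregation, resampling, or synthetic data augmentation try to address.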
Related papers
- Spatiotemporal Prediction of Secondary Crashes by Rebalancing Dynamic and Static Data with Generative Adversarial Networks [6.571659350175123]
Secondary crashes significantly exacerbate traffic congestion and increase the severity of incidents.
Existing methods fail to fully address the complexity of traffic crash data, particularly the coexistence of dynamic and static features.
This study proposes a hybrid model named VarFusiGAN-Transformer, aimed at improving the fidelity of secondary crash data generation.
arXiv Detail & Related papers (2025-01-17T08:56:49Z)
- Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis [0.40964539027092917]
This study introduces a novel approach to predicting collision types by utilizing a comprehensive dataset fused from multiple sources.
Central to our approach is the development of a Feature Group Tabular Transformer (FGTT) model, which organizes disparate data into meaningful feature groups.
The FGTT model is benchmarked against widely used tree ensemble models, including Random Forest, XGBoost, and CatBoost, demonstrating superior predictive performance.
arXiv Detail & Related papers (2024-12-06T20:47:13Z)
- Crash Severity Risk Modeling Strategies under Data Imbalance [7.9613232032536745]
This study investigates crash severity risk modeling strategies for work zones involving large vehicles when there is a crash data imbalance between low-severity (LS) and high-severity (HS) crashes.
We utilized crash data involving large vehicles in South Carolina work zones for the period between 2014 and 2018, which included 4 times more LS crashes than HS crashes.
The findings of this study highlight a disparity between LS and HS predictions, with less-accurate prediction of HS crashes compared to LS crashes due to class imbalance and feature overlaps between LS and HS crashes.
arXiv Detail & Related papers (2024-12-03T02:28:35Z)
- Using Generative Models to Produce Realistic Populations of the United Kingdom Windstorms [0.0]
This dissertation explores the application of generative models to produce realistic synthetic wind field data.
Three models, including standard GANs, WGAN-GP, and U-net diffusion models, were employed to generate wind maps of the UK.
The results reveal that while all models are effective in capturing the general spatial characteristics, each model exhibits distinct strengths and weaknesses.
arXiv Detail & Related papers (2024-09-16T19:53:33Z)
- Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
- A Generative Deep Learning Approach for Crash Severity Modeling with Imbalanced Data [6.169163527464771]
This study proposes a crash data generation method based on Conditional Tabular GAN.
A crash severity model is employed to estimate the performance of classification and interpretation.
The results indicate that using synthetic data generated by CTGAN-RU for crash severity modeling outperforms original data or synthetic data generated by other resampling methods.
arXiv Detail & Related papers (2024-04-02T16:07:27Z)
- The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
- A model for traffic incident prediction using emergency braking data [77.34726150561087]
We address the fundamental problem of data scarcity in road traffic accident prediction by training our model on emergency braking events instead of accidents.
We present a prototype implementing a traffic incident prediction model for Germany based on emergency braking data from Mercedes-Benz vehicles.
arXiv Detail & Related papers (2021-02-12T18:17:12Z)
- Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way to achieve this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)