Non-Linear Determinants of Pedestrian Injury Severity: Evidence from Administrative Data in Great Britain
- URL: http://arxiv.org/abs/2512.04022v1
- Date: Wed, 03 Dec 2025 17:59:46 GMT
- Title: Non-Linear Determinants of Pedestrian Injury Severity: Evidence from Administrative Data in Great Britain
- Authors: Yifei Tong
- Abstract summary: This study investigates the non-linear determinants of pedestrian injury severity using administrative data from Great Britain's 2023 STATS19 dataset. We employ a rigorous preprocessing pipeline utilizing mode imputation and the Synthetic Minority Over-sampling Technique (SMOTE). Our analysis reveals that vehicle count, speed limits, lighting, and road surface conditions are the primary predictors of severity.
- Score: 2.28438857884398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study investigates the non-linear determinants of pedestrian injury severity using administrative data from Great Britain's 2023 STATS19 dataset. To address inherent data-quality challenges, including missing information and substantial class imbalance, we employ a rigorous preprocessing pipeline utilizing mode imputation and the Synthetic Minority Over-sampling Technique (SMOTE). We utilize non-parametric ensemble methods (Random Forest and XGBoost) to capture complex interactions and heterogeneity often missed by linear models, while SHapley Additive exPlanations (SHAP) are employed to ensure interpretability and isolate marginal feature effects. Our analysis reveals that vehicle count, speed limits, lighting, and road surface conditions are the primary predictors of severity, with police attendance and junction characteristics further distinguishing severe collisions. Spatially, while pedestrian risk is concentrated in dense urban Local Authority Districts (LADs), we identify that certain rural LADs experience disproportionately severe outcomes conditional on a collision occurring. These findings underscore the value of combining spatial analysis with interpretable machine learning to guide geographically targeted speed management, infrastructure investment, and enforcement strategies.
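As a rough illustration of the two preprocessing steps the abstract names, the sketch below applies mode imputation and a simplified SMOTE-style interpolation to toy records. It is not the authors' pipeline: the feature names (`speed_limit_mph`, `number_of_vehicles`), the toy values, and the helper functions `impute_mode` and `smote` are all hypothetical, and only the Python standard library is used. Plain SMOTE interpolates between numeric values, so categorical STATS19 fields would need encoding or a categorical-aware variant in practice.

```python
import math
import random
from collections import Counter


def impute_mode(rows):
    """Replace None in each column with that column's most frequent observed value."""
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Neighbours are searched within the minority class, excluding base itself.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy records: [speed_limit_mph, number_of_vehicles]; None = missing.
features = [[30, 1], [30, None], [20, 1], [30, 2], [60, 1], [None, 1], [70, 2], [60, 1]]
labels = [0, 0, 0, 0, 1, 0, 1, 1]  # 1 = severe (minority class), 0 = slight

features = impute_mode(features)

minority = [x for x, y in zip(features, labels) if y == 1]
n_majority = sum(1 for y in labels if y == 0)
synthetic = smote(minority, n_new=n_majority - len(minority), k=2)

# Append the synthetic minority rows so both classes have equal counts.
balanced_features = features + synthetic
balanced_labels = labels + [1] * len(synthetic)
print(Counter(balanced_labels))
```

In a real workflow the oversampling would be applied to the training split only, so that synthetic points never leak into the evaluation data.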
Related papers
- Measuring Nonlinear Relationships and Spatial Heterogeneity of Influencing Factors on Traffic Crash Density Using GeoXAI [9.885953349638173]
This study applies a Geospatial Explainable AI (GeoXAI) framework to analyze the spatially heterogeneous and nonlinear determinants of traffic crash density in Florida. Results show that variables such as road density, intersection density, neighborhood compactness, and educational attainment exhibit complex nonlinear relationships with crashes.
arXiv Detail & Related papers (2025-12-17T00:42:52Z)
- Isolation-based Spherical Ensemble Representations for Anomaly Detection [60.989157958972356]
Anomaly detection is a critical task in data mining and management with applications spanning fraud detection, network security, and log monitoring. Existing unsupervised anomaly detection methods face fundamental challenges including conflicting distributional assumptions, computational inefficiency, and difficulty handling different anomaly types. We propose ISER (Isolation-based Spherical Ensemble Representations) that extends existing isolation-based methods by using hypersphere radii as proxies for local density characteristics while maintaining linear time and constant space complexity.
arXiv Detail & Related papers (2025-10-15T09:00:05Z)
- Spatial Association Between Near-Misses and Accident Blackspots in Sydney, Australia: A Getis-Ord $G_i^*$ Analysis [3.0928226965455154]
The proliferation of vehicle telematics presents an opportunity for a paradigm shift towards proactive safety. This paper presents a spatial-statistical framework to analyze the concordance and discordance between official crash records and near-miss events. The results provide a data-driven methodology for transport authorities to transition from a reactive to a proactive safety management strategy.
arXiv Detail & Related papers (2025-06-03T19:58:56Z)
- Inverse Reinforcement Learning for Minimum-Exposure Paths in Spatiotemporally Varying Scalar Fields [49.1574468325115]
We consider a problem of synthesizing datasets of minimum exposure paths that resemble a training dataset of such paths. The main contribution of this paper is an inverse reinforcement learning (IRL) model to solve this problem. We find that the proposed IRL model provides excellent performance in synthesizing paths from initial conditions not seen in the training dataset.
arXiv Detail & Related papers (2025-03-09T13:30:11Z)
- Traffic and Safety Rule Compliance of Humans in Diverse Driving Situations [48.924085579865334]
Analyzing human data is crucial for developing autonomous systems that replicate safe driving practices.
This paper presents a comparative evaluation of human compliance with traffic and safety rules across multiple trajectory prediction datasets.
arXiv Detail & Related papers (2024-11-04T09:21:00Z)
- Urban Traffic Accident Risk Prediction Revisited: Regionality, Proximity, Similarity and Sparsity [18.566139471849844]
Traffic accidents pose a significant risk to human health and property safety.
To prevent traffic accidents, predicting their risks has garnered growing interest.
We argue that a desired prediction solution should demonstrate resilience to the complexity of traffic accidents.
arXiv Detail & Related papers (2024-07-29T03:10:15Z)
- Heterogeneous Graph Neural Networks with Post-hoc Explanations for Multi-modal and Explainable Land Use Inference [11.753345219488745]
We introduce an explainable framework for inferring land use that synergises heterogeneous graph neural networks (HGNs) with Explainable AI techniques.
Experiments demonstrate that the proposed HGNs significantly outperform baseline graph neural networks for all six land-use indicators.
These analyses demonstrate that the proposed HGNs can suitably support urban stakeholders in their urban planning and policy-making.
arXiv Detail & Related papers (2024-06-19T17:39:10Z)
- Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
- Measuring Adversarial Datasets [28.221635644616523]
Researchers have curated various adversarial datasets for capturing model deficiencies that cannot be revealed in standard benchmark datasets.
There is still no methodology to measure the intended and unintended consequences of those adversarial transformations.
We conducted a systematic survey of existing quantifiable metrics that describe text instances in NLP tasks.
arXiv Detail & Related papers (2023-11-06T22:08:16Z)
- Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation [57.351098530477124]
We consider one critical type of robustness against spurious correlation, where different portions of the state do not have correlations induced by unobserved confounders.
A model that learns such useless or even harmful correlation could catastrophically fail when the confounder in the test case deviates from the training one.
Existing robust algorithms that assume simple and unstructured uncertainty sets are therefore inadequate to address this challenge.
arXiv Detail & Related papers (2023-07-15T23:53:37Z)
- Novel features for the detection of bearing faults in railway vehicles [88.89591720652352]
We introduce Mel-Frequency Cepstral Coefficients (MFCCs) and features extracted from the Amplitude Modulation Spectrogram (AMS) as features for the detection of bearing faults.
arXiv Detail & Related papers (2023-04-14T10:09:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.