Data-Driven Analysis of Crash Patterns in SAE Level 2 and Level 4 Automated Vehicles Using K-means Clustering and Association Rule Mining
- URL: http://arxiv.org/abs/2512.22589v2
- Date: Fri, 02 Jan 2026 16:28:22 GMT
- Title: Data-Driven Analysis of Crash Patterns in SAE Level 2 and Level 4 Automated Vehicles Using K-means Clustering and Association Rule Mining
- Authors: Jewel Rana Palit, Vijayalakshmi K Kumarasamy, Osama A. Osman
- Abstract summary: Automated Vehicles (AV) hold potential to reduce or eliminate human driving errors, enhance traffic safety, and support sustainable mobility. Recently, crash data has increasingly revealed that AV behavior can deviate from expected safety outcomes, raising concerns about the technology's safety and operational reliability in mixed traffic environments. This study analyzes over 2,500 AV crash records from the United States National Highway Traffic Safety Administration (NHTSA), covering SAE Levels 2 and 4, to uncover underlying crash dynamics.
- Score: 0.17205106391379021
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated Vehicles (AV) hold potential to reduce or eliminate human driving errors, enhance traffic safety, and support sustainable mobility. Recently, crash data has increasingly revealed that AV behavior can deviate from expected safety outcomes, raising concerns about the technology's safety and operational reliability in mixed traffic environments. While past research has investigated AV crashes, most studies rely on small, California-centered datasets, with limited attention to crash trends across SAE Levels of automation. This study analyzes over 2,500 AV crash records from the United States National Highway Traffic Safety Administration (NHTSA), covering SAE Levels 2 and 4, to uncover underlying crash dynamics. A two-stage data mining framework is developed. K-means clustering is first applied to segment crash records into 4 distinct behavioral clusters based on temporal, spatial, and environmental factors. Then, Association Rule Mining (ARM) is used to extract interpretable multivariate relationships between crash patterns and crash contributors, including lighting conditions, surface condition, vehicle dynamics, and environmental conditions, within each cluster. These insights provide actionable guidance for AV developers, safety regulators, and policymakers in formulating AV deployment strategies and minimizing crash risks.
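The two-stage framework described in the abstract (cluster first, then mine rules inside each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy records, the field names (`hour`, `intersection`, `lighting`, `surface`), the thresholds, and the choice of 2 clusters instead of the paper's 4 are all assumptions made for illustration. It uses a plain Lloyd's-algorithm K-means and a pairwise support/confidence/lift rule miner written in standard-library Python, and it treats hour of day as a linear feature, which ignores its cyclic nature.

```python
import random
from collections import Counter
from itertools import combinations


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def mine_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Single-antecedent rules A -> B as (A, B, support, confidence, lift)."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for size in (1, 2):
            counts.update(combinations(items, size))
    rules = []
    for itemset, cnt in counts.items():
        if len(itemset) != 2 or cnt / n < min_support:
            continue
        for a, b in (itemset, itemset[::-1]):
            confidence = cnt / counts[(a,)]
            lift = confidence / (counts[(b,)] / n)
            if confidence >= min_confidence:
                rules.append((a, b, cnt / n, confidence, lift))
    return sorted(rules, key=lambda r: -r[4])


# Hypothetical toy crash records (not NHTSA data).
records = [
    {"hour": 8, "intersection": 1, "lighting": "daylight", "surface": "dry"},
    {"hour": 9, "intersection": 1, "lighting": "daylight", "surface": "dry"},
    {"hour": 10, "intersection": 1, "lighting": "daylight", "surface": "wet"},
    {"hour": 11, "intersection": 1, "lighting": "daylight", "surface": "dry"},
    {"hour": 21, "intersection": 0, "lighting": "dark", "surface": "wet"},
    {"hour": 22, "intersection": 0, "lighting": "dark", "surface": "wet"},
    {"hour": 23, "intersection": 0, "lighting": "dark", "surface": "dry"},
    {"hour": 22, "intersection": 0, "lighting": "dark", "surface": "wet"},
]

# Stage 1: cluster on scaled temporal and spatial features.
features = [[r["hour"] / 23, float(r["intersection"])] for r in records]
labels, _ = kmeans(features, k=2)

# Stage 2: mine rules over categorical contributors within each cluster.
for cluster in sorted(set(labels)):
    transactions = [
        {f"lighting={r['lighting']}", f"surface={r['surface']}"}
        for r, lab in zip(records, labels)
        if lab == cluster
    ]
    for a, b, support, confidence, lift in mine_rules(transactions):
        print(f"cluster {cluster}: {a} -> {b} "
              f"(support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f})")
```

Mining rules per cluster rather than over the whole dataset means support and confidence are computed relative to each cluster's size, so a pattern that is rare overall can still surface if it dominates one cluster.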
Related papers
- SAVeD: A First-Person Social Media Video Dataset for ADAS-equipped vehicle Near-Miss and Crash Event Analyses [0.7874708385247353]
This paper introduces SAVeD, a large-scale video dataset curated from publicly available social media content. SAVeD features 2,119 first-person videos, capturing ADAS vehicle operations in diverse locations, lighting conditions, and weather scenarios. The dataset includes video frame-level annotations for collisions, evasive maneuvers, and disengagements, enabling analysis of both perception and decision-making failures.
arXiv Detail & Related papers (2025-12-19T15:58:52Z) - From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers' Hazardous Actions in Crashes Using Large Language Model [3.3457493284891338]
Two-vehicle crashes account for approximately 70% of roadway crashes. Driver Hazardous Action (DHA) data is limited by inconsistent and labor-intensive manual coding practices. Here, we present an innovative framework that leverages a fine-tuned large language model to automatically infer DHAs from textual crash narratives.
arXiv Detail & Related papers (2025-10-14T21:35:47Z) - Overtake Detection in Trucks Using CAN Bus Signals: A Comparative Study of Machine Learning Methods [51.28632782308621]
We focus on overtake detection using Controller Area Network (CAN) bus data collected from five in-service trucks provided by the Volvo Group. We evaluate three common classifiers for vehicle manoeuvre detection: Artificial Neural Networks (ANN), Random Forest (RF), and Support Vector Machines (SVM). Our per-truck analysis also reveals that classification accuracy, especially for overtakes, depends on the amount of training data per vehicle.
arXiv Detail & Related papers (2025-07-01T09:20:41Z) - Learning collision risk proactively from naturalistic driving data at scale [3.1457219084519004]
This study introduces the Generalised Surrogate Safety Measure (GSSM). GSSM learns collision risk from naturalistic driving without the need for crash or risk labels. Diverse data from naturalistic driving, including motion kinematics, weather, lighting, etc., are used to train multiple GSSMs. A basic GSSM using only instantaneous motion kinematics achieves an area under the precision-recall curve of 0.9 and secures a median time advance of 2.6 seconds to prevent potential collisions.
arXiv Detail & Related papers (2025-05-19T07:22:32Z) - Towards Reliable and Interpretable Traffic Crash Pattern Prediction and Safety Interventions Using Customized Large Language Models [14.53510262691888]
TrafficSafe is a framework that adapts to reframe crash prediction and feature attribution as text-level reasoning. Alcohol-impaired driving is the leading factor in severe crashes. TrafficSafe highlights pivotal features during model training, guiding strategic crash data collection improvements.
arXiv Detail & Related papers (2025-05-18T21:02:30Z) - Traffic and Safety Rule Compliance of Humans in Diverse Driving Situations [48.924085579865334]
Analyzing human data is crucial for developing autonomous systems that replicate safe driving practices.
This paper presents a comparative evaluation of human compliance with traffic and safety rules across multiple trajectory prediction datasets.
arXiv Detail & Related papers (2024-11-04T09:21:00Z) - Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We further formulate the crash event feature learning as a novel text reasoning problem and further fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z) - Vehicle-group-based Crash Risk Prediction and Interpretation on Highways [8.703173025279431]
This study investigates a new vehicle-group (VG) based risk analysis method and explores risk evolution mechanisms considering VG features. An impact-based vehicle grouping method is proposed to cluster vehicles into VGs by evaluating their responses to the erratic behaviors of nearby vehicles. A Logistic Regression and a Graph Neural Network (GNN) are then employed to predict VG risks using aggregated and disaggregated VG information.
arXiv Detail & Related papers (2024-02-19T07:47:23Z) - A Counterfactual Safety Margin Perspective on the Scoring of Autonomous Vehicles' Riskiness [52.27309191283943]
This paper presents a data-driven framework for assessing the risk of different AVs' behaviors.
We propose the notion of counterfactual safety margin, which represents the minimum deviation from nominal behavior that could cause a collision.
arXiv Detail & Related papers (2023-08-02T09:48:08Z) - DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving [76.29141888408265]
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z) - Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings [129.80279257258098]
Reinforcement learning (RL) in real-world safety-critical target settings like urban driving is hazardous.
We propose a "safety-critical adaptation" task setting: an agent first trains in non-safety-critical "source" environments.
We propose a solution approach, CARL, that builds on the intuition that prior experience in diverse environments equips an agent to estimate risk.
arXiv Detail & Related papers (2020-08-15T01:40:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.