From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers' Hazardous Actions in Crashes Using Large Language Model
- URL: http://arxiv.org/abs/2510.13002v1
- Date: Tue, 14 Oct 2025 21:35:47 GMT
- Title: From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers' Hazardous Actions in Crashes Using Large Language Model
- Authors: Boyou Chen, Gerui Xu, Zifei Wang, Huizhong Guo, Ananna Ahmed, Zhaonan Sun, Zhen Hu, Kaihan Zhang, Shan Bao
- Abstract summary: Two-vehicle crashes account for approximately 70% of roadway crashes. Driver Hazardous Action (DHA) data is limited by inconsistent and labor-intensive manual coding practices. Here, we present an innovative framework that leverages a fine-tuned large language model to automatically infer DHAs from textual crash narratives.
- Score: 3.3457493284891338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vehicle crashes involve complex interactions between road users, split-second decisions, and challenging environmental conditions. Among these, two-vehicle crashes are the most prevalent, accounting for approximately 70% of roadway crashes and posing a significant challenge to traffic safety. Identifying Driver Hazardous Action (DHA) is essential for understanding crash causation, yet the reliability of DHA data in large-scale databases is limited by inconsistent and labor-intensive manual coding practices. Here, we present an innovative framework that leverages a fine-tuned large language model to automatically infer DHAs from textual crash narratives, thereby improving the validity and interpretability of DHA classifications. Using five years of two-vehicle crash data from MTCF, we fine-tuned the Llama 3.2 1B model on detailed crash narratives and benchmarked its performance against conventional machine learning classifiers, including Random Forest, XGBoost, CatBoost, and a neural network. The fine-tuned LLM achieved an overall accuracy of 80%, surpassing all baseline models and demonstrating pronounced improvements in scenarios with imbalanced data. To increase interpretability, we developed a probabilistic reasoning approach, analyzing model output shifts across original test sets and three targeted counterfactual scenarios: variations in driver distraction and age. Our analysis revealed that introducing distraction for one driver substantially increased the likelihood of "General Unsafe Driving"; distraction for both drivers maximized the probability of "Both Drivers Took Hazardous Actions"; and assigning a teen driver markedly elevated the probability of "Speed and Stopping Violations." Our framework and analytical methods provide a robust and interpretable solution for large-scale automated DHA detection, offering new opportunities for traffic safety analysis and intervention.
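The probabilistic reasoning step described in the abstract compares the model's class probabilities on the original test narratives with those on counterfactual versions of the same narratives. The following is a minimal sketch of that comparison, not the authors' implementation. It assumes the classifier exposes one raw score per DHA class for each narrative; the function names, the three-label set, and the toy scores are all hypothetical and chosen only for illustration (the label strings are the three named in the abstract).

```python
from math import exp
from typing import Dict, List, Sequence

# Illustrative label set: these three names are quoted in the abstract,
# but the full DHA taxonomy used in the paper is not given here.
LABELS = [
    "General Unsafe Driving",
    "Both Drivers Took Hazardous Actions",
    "Speed and Stopping Violations",
]


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_probabilities(score_rows: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class probabilities over a set of narratives."""
    probs = [softmax(row) for row in score_rows]
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(LABELS))]


def probability_shift(
    original_scores: Sequence[Sequence[float]],
    counterfactual_scores: Sequence[Sequence[float]],
) -> Dict[str, float]:
    """Mean change in each class probability after the counterfactual edit."""
    base = mean_probabilities(original_scores)
    edited = mean_probabilities(counterfactual_scores)
    return {label: edited[k] - base[k] for k, label in enumerate(LABELS)}


# Toy scores (made up): two narratives before and after a hypothetical
# "one driver distracted" edit that raises the first class score.
original = [[1.0, 0.5, 0.2], [0.3, 0.2, 1.1]]
distracted = [[2.0, 0.5, 0.2], [1.3, 0.2, 1.1]]

shift = probability_shift(original, distracted)
for label, delta in shift.items():
    print(f"{label}: {delta:+.3f}")
```

Because each probability vector sums to 1, the shifts across classes sum to zero, so a gain for one class is always offset by losses elsewhere. Averaging over the whole test set, rather than inspecting single narratives, is one simple way to summarize the effect of an edit.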
Related papers
- ROAR: Robust Accident Recognition and Anticipation for Autonomous Driving [17.936492070548]
Existing methods often assume ideal conditions, overlooking challenges such as sensor failures, environmental disturbances, and data imperfections.
This study introduces ROAR, a novel approach for accident detection and prediction.
ROAR combines Discrete Wavelet Transform (DWT), a self-adaptive object-aware module, and dynamic focal loss to tackle these challenges.
arXiv Detail & Related papers (2025-11-09T04:55:37Z)
- CoReVLA: A Dual-Stage End-to-End Autonomous Driving Framework for Long-Tail Scenarios via Collect-and-Refine [73.74077186298523]
CoReVLA is a continual learning framework for autonomous driving.
It improves the performance in long-tail scenarios through a dual-stage process of data Collection and behavior Refinement.
CoReVLA achieves a Driving Score (DS) of 72.18 and a Success Rate (SR) of 50%, outperforming state-of-the-art methods by 7.96 DS and 15% SR under long-tail, safety-critical scenarios.
arXiv Detail & Related papers (2025-09-19T13:25:56Z)
- Predicting and Explaining Traffic Crash Severity Through Crash Feature Selection [1.0941365324532635]
This study introduces a curated dataset of more than 3 million people involved in accidents in Ohio over six years ending in 2022.
The primary contribution is a transparent and reproducible methodology that combines Automated Machine Learning (AutoML) and explainable artificial intelligence (AI) to identify and interpret key risk factors associated with severe crashes.
Key features spanned demographic, environmental, vehicle, human, and operational categories, including location type, posted speed, minimum occupant age, and pre-crash action.
arXiv Detail & Related papers (2025-08-15T14:31:26Z)
- Overtake Detection in Trucks Using CAN Bus Signals: A Comparative Study of Machine Learning Methods [51.28632782308621]
We focus on overtake detection using Controller Area Network (CAN) bus data collected from five in-service trucks provided by the Volvo Group.
We evaluate three common classifiers for vehicle manoeuvre detection: Artificial Neural Networks (ANN), Random Forest (RF), and Support Vector Machines (SVM).
Our per-truck analysis also reveals that classification accuracy, especially for overtakes, depends on the amount of training data per vehicle.
arXiv Detail & Related papers (2025-07-01T09:20:41Z)
- Towards Reliable and Interpretable Traffic Crash Pattern Prediction and Safety Interventions Using Customized Large Language Models [14.53510262691888]
TrafficSafe is a framework that adapts LLMs to reframe crash prediction and feature attribution as text-level reasoning.
Alcohol-impaired driving is the leading factor in severe crashes.
TrafficSafe highlights pivotal features during model training, guiding strategic crash data collection improvements.
arXiv Detail & Related papers (2025-05-18T21:02:30Z)
- Advanced Crash Causation Analysis for Freeway Safety: A Large Language Model Approach to Identifying Key Contributing Factors [0.0]
This research leverages a large language model (LLM) to analyze freeway crash data and provide crash causation analysis accordingly.
The fine-tuned Llama3 8B model was then used to identify crash causation without pre-labeled data through zero-shot classification.
Results demonstrate that LLMs effectively identify primary crash causes such as alcohol-impaired driving, speeding, aggressive driving, and driver inattention.
arXiv Detail & Related papers (2025-05-15T04:07:55Z)
- NsBM-GAT: A Non-stationary Block Maximum and Graph Attention Framework for General Traffic Crash Risk Prediction [11.444259609536164]
Existing crash risk prediction models rely on hypothetical scenarios deemed dangerous by researchers.
Dashcam videos capture the pre-crash behavior of individual vehicles, but they often lack critical information about the movements of surrounding vehicles.
We propose a novel non-stationary extreme value theory (EVT) to capture the interactive behavior between a vehicle and its surrounding vehicles.
arXiv Detail & Related papers (2025-03-06T02:12:40Z)
- SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
We propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge.
To explicitly integrate safety knowledge, we develop a reasoning component that translates traffic rules into first-order logic.
Our Multimodal Retrieval-Augmented Generation model leverages video, control signals, and environmental attributes to learn from past driving experiences.
arXiv Detail & Related papers (2025-02-28T21:53:47Z)
- A Multi-Loss Strategy for Vehicle Trajectory Prediction: Combining Off-Road, Diversity, and Directional Consistency Losses [68.68514648185828]
Trajectory prediction is essential for the safety and efficiency of planning in autonomous vehicles.
Current models often fail to fully capture complex traffic rules and the complete range of potential vehicle movements.
This study introduces three novel loss functions: Offroad Loss, Direction Consistency Error, and Diversity Loss.
arXiv Detail & Related papers (2024-11-29T14:47:08Z)
- Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We further formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
- A model for traffic incident prediction using emergency braking data [77.34726150561087]
We address the fundamental problem of data scarcity in road traffic accident prediction by training our model on emergency braking events instead of accidents.
We present a prototype implementing a traffic incident prediction model for Germany based on emergency braking data from Mercedes-Benz vehicles.
arXiv Detail & Related papers (2021-02-12T18:17:12Z)