Related papers: From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers' Hazardous Actions in Crashes Using Large Language Model

From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers' Hazardous Actions in Crashes Using Large Language Model

URL: http://arxiv.org/abs/2510.13002v1
Date: Tue, 14 Oct 2025 21:35:47 GMT
Title: From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers' Hazardous Actions in Crashes Using Large Language Model
Authors: Boyou Chen, Gerui Xu, Zifei Wang, Huizhong Guo, Ananna Ahmed, Zhaonan Sun, Zhen Hu, Kaihan Zhang, Shan Bao,
Abstract summary: Two-vehicle crashes account for approximately 70% of roadway crashes.<n>Driver Hazardous Action (DHA) data is limited by inconsistent and labor-intensive manual coding practices.<n>Here, we present an innovative framework that leverages a fine-tuned large language model to automatically infer DHAs from textual crash narratives.
Score: 3.3457493284891338
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vehicle crashes involve complex interactions between road users, split-second decisions, and challenging environmental conditions. Among these, two-vehicle crashes are the most prevalent, accounting for approximately 70% of roadway crashes and posing a significant challenge to traffic safety. Identifying Driver Hazardous Action (DHA) is essential for understanding crash causation, yet the reliability of DHA data in large-scale databases is limited by inconsistent and labor-intensive manual coding practices. Here, we present an innovative framework that leverages a fine-tuned large language model to automatically infer DHAs from textual crash narratives, thereby improving the validity and interpretability of DHA classifications. Using five years of two-vehicle crash data from MTCF, we fine-tuned the Llama 3.2 1B model on detailed crash narratives and benchmarked its performance against conventional machine learning classifiers, including Random Forest, XGBoost, CatBoost, and a neural network. The fine-tuned LLM achieved an overall accuracy of 80%, surpassing all baseline models and demonstrating pronounced improvements in scenarios with imbalanced data. To increase interpretability, we developed a probabilistic reasoning approach, analyzing model output shifts across original test sets and three targeted counterfactual scenarios: variations in driver distraction and age. Our analysis revealed that introducing distraction for one driver substantially increased the likelihood of "General Unsafe Driving"; distraction for both drivers maximized the probability of "Both Drivers Took Hazardous Actions"; and assigning a teen driver markedly elevated the probability of "Speed and Stopping Violations." Our framework and analytical methods provide a robust and interpretable solution for large-scale automated DHA detection, offering new opportunities for traffic safety analysis and intervention.

Related papers

ROAR: Robust Accident Recognition and Anticipation for Autonomous Driving [17.936492070548]
Existing methods often assume ideal conditions, overlooking challenges such as sensor failures, environmental disturbances, and data imperfections.<n>This study introduces ROAR, a novel approach for accident detection and prediction.<n> ROAR combines Discrete Wavelet Transform (DWT), a self adaptive object aware module, and dynamic focal loss to tackle these challenges.
arXiv Detail & Related papers (2025-11-09T04:55:37Z)
CoReVLA: A Dual-Stage End-to-End Autonomous Driving Framework for Long-Tail Scenarios via Collect-and-Refine [73.74077186298523]
CoReVLA is a continual learning framework for autonomous driving.<n>It improves the performance in long-tail scenarios through a dual-stage process of data Collection and behavior Refinement.<n>CoReVLA achieves a Driving Score (DS) of 72.18 and a Success Rate (SR) of 50%, outperforming state-of-the-art methods by 7.96 DS and 15% SR under long-tail, safety-critical scenarios.
arXiv Detail & Related papers (2025-09-19T13:25:56Z)
Predicting and Explaining Traffic Crash Severity Through Crash Feature Selection [1.0941365324532635]
This study introduces a curated dataset of more than 3 million people involved in accidents in Ohio over six years-2022.<n>The primary contribution is a transparent and reproducible methodology that combines Automated Machine Learning (AutoML) and explainable artificial intelligence (AI) to identify and interpret key risk factors associated with severe crashes.<n>Key features spanned demographic, environmental, vehicle, human and operational categories, including location type posted speed, minimum occupant age, and pre-crash action.
arXiv Detail & Related papers (2025-08-15T14:31:26Z)
Overtake Detection in Trucks Using CAN Bus Signals: A Comparative Study of Machine Learning Methods [51.28632782308621]
We focus on overtake detection using Controller Area Network (CAN) bus data collected from five in-service trucks provided by the Volvo Group.<n>We evaluate three common classifiers for vehicle manoeuvre detection, Artificial Neural Networks (ANN), Random Forest (RF), and Support Vector Machines (SVM)<n>Our pertruck analysis also reveals that classification accuracy, especially for overtakes, depends on the amount of training data per vehicle.
arXiv Detail & Related papers (2025-07-01T09:20:41Z)
Towards Reliable and Interpretable Traffic Crash Pattern Prediction and Safety Interventions Using Customized Large Language Models [14.53510262691888]
TrafficSafe is a framework that adapts to reframe crash prediction and feature attribution as text-level reasoning.<n>Alcohol-impaired driving is the leading factor in severe crashes.<n>TrafficSafe highlights pivotal features during model training guiding strategic crash data collection improvements.
arXiv Detail & Related papers (2025-05-18T21:02:30Z)
Advanced Crash Causation Analysis for Freeway Safety: A Large Language Model Approach to Identifying Key Contributing Factors [0.0]
This research leverages large language model (LLM) to analyze freeway crash data and provide crash causation analysis accordingly.<n>The fine-tuned Llama3 8B model was then used to identify crash causation without pre-labeled data through zero-shot classification.<n>Results demonstrate that LLMs effectively identify primary crash causes such as alcohol-impaired driving, speeding, aggressive driving, and driver inattention.
arXiv Detail & Related papers (2025-05-15T04:07:55Z)
NsBM-GAT: A Non-stationary Block Maximum and Graph Attention Framework for General Traffic Crash Risk Prediction [11.444259609536164]
Existing crash risk prediction models rely on hypothetical scenarios deemed dangerous by researchers.<n>Dashcam videos capture the pre-crash behavior of individual vehicles, but they often lack critical information about the movements of surrounding vehicles.<n>We propose a novel non-stationary extreme value theory (EVT) to capture the interactive behavior between a vehicle and its surrounding vehicles.
arXiv Detail & Related papers (2025-03-06T02:12:40Z)
SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
We propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge.<n>To explicitly integrate safety knowledge, we develop a reasoning component that translates traffic rules into first-order logic.<n>Our Multimodal Retrieval-Augmented Generation model leverages video, control signals, and environmental attributes to learn from past driving experiences.
arXiv Detail & Related papers (2025-02-28T21:53:47Z)
A Multi-Loss Strategy for Vehicle Trajectory Prediction: Combining Off-Road, Diversity, and Directional Consistency Losses [68.68514648185828]
Trajectory prediction is essential for the safety and efficiency of planning in autonomous vehicles.<n>Current models often fail to fully capture complex traffic rules and the complete range of potential vehicle movements.<n>This study introduces three novel loss functions: Offroad Loss, Direction Consistency Error, and Diversity Loss.
arXiv Detail & Related papers (2024-11-29T14:47:08Z)
Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports. We further formulate the crash event feature learning as a novel text reasoning problem and further fine-tune various large language models (LLMs) to predict detailed accident outcomes. Our experiments results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
A model for traffic incident prediction using emergency braking data [77.34726150561087]
We address the fundamental problem of data scarcity in road traffic accident prediction by training our model on emergency braking events instead of accidents. We present a prototype implementing a traffic incident prediction model for Germany based on emergency braking data from Mercedes-Benz vehicles.
arXiv Detail & Related papers (2021-02-12T18:17:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.