Large Language Models for Pedestrian Safety: An Application to Predicting Driver Yielding Behavior at Unsignalized Intersections
- URL: http://arxiv.org/abs/2509.19657v1
- Date: Wed, 24 Sep 2025 00:25:19 GMT
- Title: Large Language Models for Pedestrian Safety: An Application to Predicting Driver Yielding Behavior at Unsignalized Intersections
- Authors: Yicheng Yang, Zixian Li, Jean Paul Bizimana, Niaz Zafri, Yongfeng Dong, Tianyi Li
- Abstract summary: Large language models (LLMs) are suited for extracting patterns from heterogeneous traffic data, enabling accurate modeling of driver-pedestrian interactions. This paper benchmarks state-of-the-art LLMs against traditional classifiers, finding that GPT-4o consistently achieves the highest accuracy and recall, while Deepseek-V3 excels in precision.
- Score: 5.913801021011149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pedestrian safety is a critical component of urban mobility and is strongly influenced by the interactions between pedestrian decision-making and driver yielding behavior at crosswalks. Modeling driver-pedestrian interactions at intersections requires accurately capturing the complexity of these behaviors. Traditional machine learning models often struggle to capture the nuanced and context-dependent reasoning required for these multifactorial interactions, due to their reliance on fixed feature representations and limited interpretability. In contrast, large language models (LLMs) are suited for extracting patterns from heterogeneous traffic data, enabling accurate modeling of driver-pedestrian interactions. Therefore, this paper leverages multimodal LLMs through a novel prompt design that incorporates domain-specific knowledge, structured reasoning, and few-shot prompting, enabling interpretable and context-aware inference of driver yielding behavior, as an example application of modeling pedestrian-driver interaction. We benchmarked state-of-the-art LLMs against traditional classifiers, finding that GPT-4o consistently achieves the highest accuracy and recall, while Deepseek-V3 excels in precision. These findings highlight the critical trade-offs between model performance and computational efficiency, offering practical guidance for deploying LLMs in real-world pedestrian safety systems.
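The abstract reports that one model leads on accuracy and recall while another leads on precision. As a minimal sketch of how those three metrics can diverge for a binary yield / no-yield classifier, the following computes them from scratch. The labels and predictions are invented for illustration only and are not results from the paper; the names `binary_metrics`, `cautious`, and `eager` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> BinaryMetrics:
    """Compute accuracy, precision, and recall for the given positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = len(pairs)
    # Guard against empty denominators by reporting 0.0.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return BinaryMetrics(accuracy, precision, recall)


# Illustrative (made-up) ground truth: 1 = driver yielded, 0 = driver did not yield.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]

# A hypothetical model that predicts "yield" only when very sure:
# few false positives (high precision) but many missed yields (low recall).
cautious = binary_metrics(y_true, [1, 0, 0, 0, 0, 1, 0, 0])

# A hypothetical model that predicts "yield" liberally:
# it catches every yield (high recall) at the cost of false positives.
eager = binary_metrics(y_true, [1, 1, 1, 1, 0, 1, 1, 1])

print("cautious:", cautious)
print("eager:   ", eager)
```

Which metric matters more depends on the deployment: in a pedestrian safety setting, wrongly predicting that a driver will yield (a false positive) is arguably the costlier error, which is one reason precision may be weighed separately from accuracy and recall.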
Related papers
- Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion.
Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient.
Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
arXiv Detail & Related papers (2025-09-24T13:35:15Z) - Game-Theoretic Modeling of Vehicle Unprotected Left Turns Considering Drivers' Bounded Rationality [17.5324678856791]
We propose a novel decision-making model for vehicle unprotected left-turn scenarios.
Our model integrates game theory with considerations for drivers' bounded rationality.
Our findings contribute valuable insights into the vehicle decision-making behaviors with bounded rationality.
arXiv Detail & Related papers (2025-07-02T02:22:11Z) - Markov Regime-Switching Intelligent Driver Model for Interpretable Car-Following Behavior [19.229274803939983]
We introduce a regime-switching framework that allows driving behavior to be governed by different IDM parameter sets.
We instantiate the framework using a Factorial Hidden Markov Model with IDM dynamics.
arXiv Detail & Related papers (2025-06-17T17:55:42Z) - SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
We propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge.
To explicitly integrate safety knowledge, we develop a reasoning component that translates traffic rules into first-order logic.
Our Multimodal Retrieval-Augmented Generation model leverages video, control signals, and environmental attributes to learn from past driving experiences.
arXiv Detail & Related papers (2025-02-28T21:53:47Z) - FollowGen: A Scaled Noise Conditional Diffusion Model for Car-Following Trajectory Prediction [9.2729178775419]
This study introduces a scaled noise conditional diffusion model for car-following trajectory prediction.
It integrates detailed inter-vehicular interactions and car-following dynamics into a generative framework, improving the accuracy and plausibility of predicted trajectories.
Experimental results on diverse real-world driving scenarios demonstrate the state-of-the-art performance and robustness of the proposed method.
arXiv Detail & Related papers (2024-11-23T23:13:45Z) - Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving [65.04643267731122]
General MLLMs combined with CLIP often struggle to represent driving-specific scenarios accurately.
We propose the Hints of Prompt (HoP) framework, which introduces three key enhancements.
These hints are fused through a Hint Fusion module, enriching visual representations and enhancing multimodal reasoning.
arXiv Detail & Related papers (2024-11-20T06:58:33Z) - Hypergraph-based Motion Generation with Multi-modal Interaction Relational Reasoning [13.294396870431399]
Real-world driving environments are characterized by dynamic and diverse interactions among vehicles.
This research introduces an integrated framework for autonomous vehicles (AVs) motion prediction.
The framework integrates a multi-scale hypergraph neural network to model group-wise interactions among vehicles.
arXiv Detail & Related papers (2024-09-18T03:30:38Z) - GenFollower: Enhancing Car-Following Prediction with Large Language Models [11.847589952558566]
We propose GenFollower, a novel zero-shot prompting approach that leverages large language models (LLMs) to address these challenges.
We reframe car-following behavior as a language modeling problem and integrate heterogeneous inputs into structured prompts for LLMs.
Experiments on open datasets demonstrate GenFollower's superior performance and ability to provide interpretable insights.
arXiv Detail & Related papers (2024-07-08T04:54:42Z) - InferAligner: Inference-Time Alignment for Harmlessness through
Cross-Model Guidance [56.184255657175335]
We develop InferAligner, a novel inference-time alignment method that utilizes cross-model guidance for harmlessness alignment.
Experimental results show that our method can be very effectively applied to domain-specific models in finance, medicine, and mathematics.
It significantly diminishes the Attack Success Rate (ASR) of both harmful instructions and jailbreak attacks, while maintaining almost unchanged performance in downstream tasks.
arXiv Detail & Related papers (2024-01-20T10:41:03Z) - Interactive Autonomous Navigation with Internal State Inference and
Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z) - MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and
Guided Intention Querying [110.83590008788745]
Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions.
In this paper, we propose Motion TRansformer (MTR) frameworks to address these challenges.
The initial MTR framework utilizes a transformer encoder-decoder structure with learnable intention queries.
We introduce an advanced MTR++ framework, extending the capability of MTR to simultaneously predict multimodal motion for multiple agents.
arXiv Detail & Related papers (2023-06-30T16:23:04Z) - Multi-intersection Traffic Optimisation: A Benchmark Dataset and a
Strong Baseline [85.9210953301628]
Control of traffic signals is fundamental and critical to alleviate traffic congestion in urban areas.
Because of the high complexity of modelling the problem, experimental settings of current works are often inconsistent.
We propose a novel and strong baseline model based on deep reinforcement learning with the encoder-decoder structure.
arXiv Detail & Related papers (2021-01-24T03:55:39Z) - IDE-Net: Interactive Driving Event and Pattern Extraction from Human
Data [35.473428772961235]
We propose the Interactive Driving event and pattern Extraction Network (IDE-Net) to automatically extract interaction events and patterns.
IDE-Net is a deep learning framework to automatically extract events and patterns directly from vehicle trajectories.
We find three interpretable patterns of interactions, bringing insights for driver behavior representation, modeling and comprehension.
arXiv Detail & Related papers (2020-11-04T16:56:12Z)