Related papers: Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference

Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference

URL: http://arxiv.org/abs/2408.04652v1
Date: Sun, 4 Aug 2024 17:14:10 GMT
Title: Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference
Authors: Hao Zhen, Yucheng Shi, Yongcan Huang, Jidong J. Yang, Ninghao Liu,
Abstract summary: This study explores the use of three state-of-the-art Large Language Models (LLMs) for crash severity inference. We generate textual narratives from original traffic crash data using a pre-built template infused with domain knowledge. We incorporated Chain-of-Thought (CoT) reasoning to guide the LLMs in analyzing the crash causes and then inferring the severity.
Score: 24.565253576049024
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Harnessing the power of Large Language Models (LLMs), this study explores the use of three state-of-the-art LLMs, specifically GPT-3.5-turbo, LLaMA3-8B, and LLaMA3-70B, for crash severity inference, framing it as a classification task. We generate textual narratives from original traffic crash tabular data using a pre-built template infused with domain knowledge. Additionally, we incorporated Chain-of-Thought (CoT) reasoning to guide the LLMs in analyzing the crash causes and then inferring the severity. This study also examine the impact of prompt engineering specifically designed for crash severity inference. The LLMs were tasked with crash severity inference to: (1) evaluate the models' capabilities in crash severity analysis, (2) assess the effectiveness of CoT and domain-informed prompt engineering, and (3) examine the reasoning abilities with the CoT framework. Our results showed that LLaMA3-70B consistently outperformed the other models, particularly in zero-shot settings. The CoT and Prompt Engineering techniques significantly enhanced performance, improving logical reasoning and addressing alignment issues. Notably, the CoT offers valuable insights into LLMs' reasoning processes, unleashing their capacity to consider diverse factors such as environmental conditions, driver behavior, and vehicle characteristics in severity analysis and inference.

Related papers

Zero-Shot Event Causality Identification via Multi-source Evidence Fuzzy Aggregation with Large Language Models [11.541829239773643]
Event Causality Identification (ECI) aims to detect causal relationships between events in textual contexts.<n>Existing ECI models predominantly rely on supervised methodologies, suffering from dependence on large-scale annotated data.<n>We propose MEFA, a novel zero-shot framework based on Multi-source Evidence Fuzzy Aggregation.
arXiv Detail & Related papers (2025-06-06T01:56:05Z)
Advanced Crash Causation Analysis for Freeway Safety: A Large Language Model Approach to Identifying Key Contributing Factors [0.0]
This research leverages large language model (LLM) to analyze freeway crash data and provide crash causation analysis accordingly.<n>The fine-tuned Llama3 8B model was then used to identify crash causation without pre-labeled data through zero-shot classification.<n>Results demonstrate that LLMs effectively identify primary crash causes such as alcohol-impaired driving, speeding, aggressive driving, and driver inattention.
arXiv Detail & Related papers (2025-05-15T04:07:55Z)
CrashSage: A Large Language Model-Centered Framework for Contextual and Interpretable Traffic Crash Analysis [0.46040036610482665]
Road crashes claim over 1.3 million lives annually worldwide and incur global economic losses exceeding $1.8 trillion.<n>This study presents CrashSage, a novel Large Language Model (LLM)-centered framework designed to advance crash analysis and modeling through four key innovations.
arXiv Detail & Related papers (2025-05-08T00:23:18Z)
CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations [36.60702578561009]
We present CodeCrash, a unified benchmark that evaluates robustness under code structural and textual distraction perturbations. We evaluate seventeen Large Language Models (LLMs) using direct and Chain-of-Thought inference. Our findings reveal the fragility of LLMs under structural noise and the inherent reliance on natural language cues.
arXiv Detail & Related papers (2025-04-19T00:40:28Z)
CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models [14.784841713647682]
CoT-RAG is a novel reasoning framework with three key designs. It features Knowledge Graph-driven CoT Generation, Learnable Knowledge Case-aware RAG, and Pseudo-Program Prompting Execution. Compared with the-state-of-the-art methods, CoT-RAG exhibits a significant accuracy improvement, ranging from 4.0% to 23.0%.
arXiv Detail & Related papers (2025-04-18T07:55:09Z)
Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning [20.562109430526007]
Chain-of-Thought (CoT) reasoning has proven effective in natural language tasks but remains underexplored in multimodal alignment. This study investigates its integration into 3D vision-hugging learning by embedding structured reasoning into alignment training.
arXiv Detail & Related papers (2025-03-08T14:24:54Z)
What Makes In-context Learning Effective for Mathematical Reasoning: A Theoretical Analysis [81.15503859645149]
In this paper, we aim to theoretically analyze the impact of in-context demonstrations on large language models' reasoning performance. We propose a straightforward, generalizable, and low-complexity demonstration selection method named LMS3.
arXiv Detail & Related papers (2024-12-11T11:38:11Z)
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning [26.11408084129897]
This study investigates the effect of fine-tuning on the reasoning abilities of large language models. It addresses questions regarding the impact of task-specific fine-tuning on overall reasoning capabilities, the influence of fine-tuning on Chain-of-Thought (CoT) reasoning performance, and the implications for the faithfulness of CoT reasonings.
arXiv Detail & Related papers (2024-11-22T23:54:37Z)
Failure Modes of LLMs for Causal Reasoning on Narratives [51.19592551510628]
We investigate the interaction between world knowledge and logical reasoning.<n>We find that state-of-the-art large language models (LLMs) often rely on superficial generalizations.<n>We show that simple reformulations of the task can elicit more robust reasoning behavior.
arXiv Detail & Related papers (2024-10-31T12:48:58Z)
A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions [15.350973327319418]
Large language models (LLMs) are increasingly integrated into a wide range of everyday applications. This raises concerns about the replicability and generalizability of insights gained from research on LLM behavior. We tested GPT-3.5, GPT-4o, Gemini 1.5 Pro, Claude 3 Opus, Llama 3-8B, and Llama 3-70B, on the chain-of-thought, EmotionPrompting, ExpertPrompting, Sandbagging, as well as Re-Reading prompt engineering techniques.
arXiv Detail & Related papers (2024-09-30T14:00:34Z)
Making Large Language Models Better Planners with Reasoning-Decision Alignment [70.5381163219608]
We motivate an end-to-end decision-making model based on multimodality-augmented LLM. We propose a reasoning-decision alignment constraint between the paired CoTs and planning results. We dub our proposed large language planners with reasoning-decision alignment as RDA-Driver.
arXiv Detail & Related papers (2024-08-25T16:43:47Z)
Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies [0.18416014644193066]
We evaluate the performance of GPT4, GPT3.5 Turbo and Google's recent Gemini model on problems from a steamroller domain. We found that the models' performance when using the ATP reasoning strategies was comparable to one-shot chain of thought.
arXiv Detail & Related papers (2024-07-17T22:49:23Z)
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses. Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies. We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z)
RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models [12.112914393948415]
We present RUPBench, a benchmark designed to evaluate large language models (LLMs) across diverse reasoning tasks. Our benchmark incorporates 15 reasoning datasets, categorized into commonsense, arithmetic, logical, and knowledge-intensive reasoning. By examining the performance of state-of-the-art LLMs such as GPT-4o, Llama3, Phi-3, and Gemma on both original and perturbed datasets, we provide a detailed analysis of their robustness and error patterns.
arXiv Detail & Related papers (2024-06-16T17:26:44Z)
Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports. We further formulate the crash event feature learning as a novel text reasoning problem and further fine-tune various large language models (LLMs) to predict detailed accident outcomes. Our experiments results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
CausalBench: A Comprehensive Benchmark for Causal Learning Capability of LLMs [27.362012903540492]
The ability to understand causality significantly impacts the competence of large language models (LLMs) in output explanation and counterfactual reasoning. The ability to understand causality significantly impacts the competence of large language models (LLMs) in output explanation and counterfactual reasoning.
arXiv Detail & Related papers (2024-04-09T14:40:08Z)
Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text. Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task. LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced. Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z)
Exploring Self-supervised Logic-enhanced Training for Large Language Models [59.227222647741094]
In this paper, we make the first attempt to investigate the feasibility of incorporating logical knowledge through self-supervised post-training. We devise an auto-regressive objective variant of MERIt and integrate it with two LLM series, i.e., FLAN-T5 and LLaMA, with parameter size ranging from 3 billion to 13 billion. The results on two challenging logical reasoning benchmarks demonstrate the effectiveness of LogicLLM.
arXiv Detail & Related papers (2023-05-23T06:13:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.