Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors
- URL: http://arxiv.org/abs/2408.08302v1
- Date: Thu, 15 Aug 2024 17:55:45 GMT
- Title: Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors
- Authors: Usman Syed, Ethan Light, Xingang Guo, Huan Zhang, Lianhui Qin, Yanfeng Ouyang, Bin Hu
- Abstract summary: We introduce TransportBench, a benchmark dataset that includes a sample of transportation engineering problems on a wide range of subjects.
This dataset is used by human experts to evaluate the capabilities of various commercial and open-sourced large language models (LLMs).
Our study marks a thrilling first step toward harnessing artificial general intelligence for complex transportation challenges.
- Score: 17.20186037322538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, GPT-4o, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, Llama 3, and Llama 3.1 in solving some selected undergraduate-level transportation engineering problems. We introduce TransportBench, a benchmark dataset that includes a sample of transportation engineering problems on a wide range of subjects in the context of planning, design, management, and control of transportation systems. This dataset is used by human experts to evaluate the capabilities of various commercial and open-sourced LLMs, especially their accuracy, consistency, and reasoning behaviors, in solving transportation engineering problems. Our comprehensive analysis uncovers the unique strengths and limitations of each LLM, e.g. our analysis shows the impressive accuracy and some unexpected inconsistent behaviors of Claude 3.5 Sonnet in solving TransportBench problems. Our study marks a thrilling first step toward harnessing artificial general intelligence for complex transportation challenges.
Related papers
- Beyond Words: Evaluating Large Language Models in Transportation Planning [0.0]
This study investigates the evaluation of Large Language Models (LLMs), specifically GPT-4 and Phi-3-mini, to enhance transportation planning.
The findings underscore the transformative potential of GenAI technologies in urban transportation planning.
arXiv Detail & Related papers (2024-09-22T16:20:00Z) - Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference [24.565253576049024]
This study explores the use of three state-of-the-art Large Language Models (LLMs) for crash severity inference.
We generate textual narratives from original traffic crash data using a pre-built template infused with domain knowledge.
We incorporated Chain-of-Thought (CoT) reasoning to guide the LLMs in analyzing the crash causes and then inferring the severity.
arXiv Detail & Related papers (2024-08-04T17:14:10Z) - Large Language Models for Mobility in Transportation Systems: A Survey on Forecasting Tasks [8.548422411704218]
Machine learning and deep learning methods are favored for their flexibility and accuracy.
With the advent of large language models (LLMs), many researchers have combined these models with previous techniques or applied LLMs to directly predict future traffic information and human travel behaviors.
arXiv Detail & Related papers (2024-05-03T02:54:43Z) - Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra [7.487691551328453]
GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra are investigated for solving undergraduate-level control problems.
We present evaluations conducted by a panel of human experts.
Our study serves as an initial step towards the broader goal of employing artificial general intelligence in control engineering.
arXiv Detail & Related papers (2024-04-04T17:58:38Z) - CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations [61.21923643289266]
Chain of Manipulations is a mechanism that enables Vision-Language Models to solve problems step-by-step with evidence.
After training, models can solve various visual problems by eliciting intrinsic manipulations (e.g., grounding, zoom in) actively without involving external tools.
Our trained model, CogCoM, achieves state-of-the-art performance across 9 benchmarks from 4 categories.
arXiv Detail & Related papers (2024-02-06T18:43:48Z) - TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [73.29220562541204]
We consider harnessing the amazing power of large language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z) - TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models [46.862519898969325]
TransportationGames is an evaluation benchmark for assessing (M)LLMs in the transportation domain.
We test the performance of various (M)LLMs in memorizing, understanding, and applying transportation knowledge through the selected tasks.
arXiv Detail & Related papers (2024-01-09T10:20:29Z) - Competition-Level Problems are Effective LLM Evaluators [121.15880285283116]
This paper aims to evaluate the reasoning capacities of large language models (LLMs) in solving recent programming problems in Codeforces.
We first provide a comprehensive evaluation of GPT-4's perceived zero-shot performance on this task, considering various aspects such as problems' release time, difficulties, and types of errors encountered.
Surprisingly, the perceived performance of GPT-4 has experienced a cliff-like decline in problems after September 2021 consistently across all the difficulties and types of problems.
arXiv Detail & Related papers (2023-12-04T18:58:57Z) - A Study of Situational Reasoning for Traffic Understanding [63.45021731775964]
We devise three novel text-based tasks for situational reasoning in the traffic domain.
We adopt four knowledge-enhanced methods that have shown generalization capability across language reasoning tasks in prior work.
We provide in-depth analyses of model performance on data partitions and examine model predictions categorically.
arXiv Detail & Related papers (2023-06-05T01:01:12Z) - A Bibliometric Analysis and Review on Reinforcement Learning for Transportation Applications [43.356096302298056]
Transportation is the backbone of the economy and urban development.
Reinforcement Learning (RL) enables autonomous decision-makers to interact with complex environments.
This paper conducts a bibliometric analysis to identify the development of RL-based methods for transportation applications.
arXiv Detail & Related papers (2022-10-26T07:34:51Z) - The 5th AI City Challenge [51.83023045451549]
The fifth AI City Challenge attracted 305 participating teams across 38 countries.
The evaluation was conducted on both algorithmic effectiveness and computational efficiency.
Results show the promise of AI in Smarter Transportation.
arXiv Detail & Related papers (2021-04-25T19:15:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.