TransportationGames: Benchmarking Transportation Knowledge of
(Multimodal) Large Language Models
- URL: http://arxiv.org/abs/2401.04471v1
- Date: Tue, 9 Jan 2024 10:20:29 GMT
- Title: TransportationGames: Benchmarking Transportation Knowledge of
(Multimodal) Large Language Models
- Authors: Xue Zhang, Xiangyu Shi, Xinyue Lou, Rui Qi, Yufeng Chen, Jinan Xu,
Wenjuan Han
- Abstract summary: TransportationGames is an evaluation benchmark for assessing (M)LLMs in the transportation domain.
We test the performance of various (M)LLMs in memorizing, understanding, and applying transportation knowledge on the selected tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) and multimodal large language models (MLLMs)
have shown excellent general capabilities, even exhibiting adaptability in many
professional domains such as law, economics, transportation, and medicine.
Currently, many domain-specific benchmarks have been proposed to verify the
performance of (M)LLMs in specific fields. Among various domains,
transportation plays a crucial role in modern society as it impacts the
economy, the environment, and the quality of life for billions of people.
However, it is unclear how much traffic knowledge (M)LLMs possess and whether
they can reliably perform transportation-related tasks. To address this gap, we
propose TransportationGames, a carefully designed and thorough evaluation
benchmark for assessing (M)LLMs in the transportation domain. By
comprehensively considering the applications in real-world scenarios and
referring to the first three levels in Bloom's Taxonomy, we test the
performance of various (M)LLMs in memorizing, understanding, and applying
transportation knowledge on the selected tasks. The experimental results show
that although some models perform well in some tasks, there is still much room
for improvement overall. We hope the release of TransportationGames can serve
as a foundation for future research, thereby accelerating the implementation
and application of (M)LLMs in the transportation domain.
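
To make the evaluation protocol concrete, below is a minimal sketch of how a three-level, multiple-choice evaluation along Bloom's Taxonomy could be scored. The task file format, field names, and the `load_tasks`/`evaluate` helpers are illustrative assumptions, not the paper's released code or data schema:

```python
import json
from collections import defaultdict

def load_tasks(path):
    """Load one JSON task per line. Assumed fields: 'level'
    (memorizing / understanding / applying, per Bloom's Taxonomy),
    'question', 'choices', and 'answer'. Hypothetical format."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def evaluate(model, tasks):
    """Return per-level accuracy for a multiple-choice benchmark."""
    correct, total = defaultdict(int), defaultdict(int)
    for task in tasks:
        # Format the question and lettered options into a single prompt.
        options = "\n".join(
            f"{label}. {text}" for label, text in zip("ABCD", task["choices"])
        )
        prompt = f"{task['question']}\n{options}\nAnswer with A, B, C, or D."
        # `model` is any callable mapping a prompt to a reply string;
        # wrap your (M)LLM client here. Take the first letter as the answer.
        prediction = model(prompt).strip().upper()[:1]
        total[task["level"]] += 1
        correct[task["level"]] += int(prediction == task["answer"])
    return {level: correct[level] / total[level] for level in total}

if __name__ == "__main__":
    # Tiny in-memory dry run; real usage would load the benchmark's task
    # files, e.g. load_tasks("transportation_tasks.jsonl") (hypothetical path).
    tasks = [{
        "level": "memorizing",
        "question": "What does a red traffic light mean?",
        "choices": ["Stop", "Go", "Yield", "Merge"],
        "answer": "A",
    }]
    def stub_model(prompt):
        return "A"  # stand-in for a real (M)LLM call
    print(evaluate(stub_model, tasks))  # -> {'memorizing': 1.0}
```

Reporting accuracy grouped by level, as above, is what allows a benchmark of this kind to score memorizing, understanding, and applying separately rather than as a single aggregate.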
Related papers
- Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing (2024-09-26)
  This study comprehensively reviews and empirically evaluates the application of multimodal large language models (MLLMs) and vision language models (VLMs) in object detection for transportation systems.
- A Survey on Benchmarks of Multimodal Large Language Models (2024-08-16)
  This paper presents a comprehensive review of 200 benchmarks and evaluations for Multimodal Large Language Models (MLLMs), focusing on (1) perception and understanding, (2) cognition and reasoning, (3) specific domains, (4) key capabilities, and (5) other modalities. Its key argument is that evaluation should be regarded as a crucial discipline to better support the development of MLLMs.
- A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks (2024-08-02)
  Multimodal Large Language Models (MLLMs) address the complexities of real-world applications far beyond the capabilities of single-modality systems. This paper systematically surveys the applications of MLLMs in multimodal tasks such as natural language, vision, and audio.
- Needle In A Multimodal Haystack (2024-06-11)
  This work presents the first benchmark specifically designed to evaluate the capability of existing MLLMs to comprehend long multimodal documents. The benchmark includes three types of evaluation tasks: multimodal retrieval, counting, and reasoning. Existing models still show significant room for improvement on these tasks, especially in vision-centric evaluation.
- Efficient Multimodal Large Language Models: A Survey (2024-05-17)
  Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding, and reasoning. However, their extensive model size and high training and inference costs have hindered widespread application in academia and industry. This survey provides a comprehensive and systematic review of the current state of efficient MLLMs.
- Large Language Models for Mobility in Transportation Systems: A Survey on Forecasting Tasks (2024-05-03)
  Machine learning and deep learning methods are favored for their flexibility and accuracy. With the advent of large language models (LLMs), many researchers have combined these models with previous techniques or applied LLMs to directly predict future traffic information and human travel behaviors.
- MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI (2024-04-24)
  MMT-Bench is a benchmark designed to assess Large Vision-Language Models (LVLMs) across massive multimodal tasks. It comprises 31,325 meticulously curated multiple-choice visual questions from various multimodal scenarios.
- A Survey on Multimodal Large Language Models for Autonomous Driving (2023-11-21)
  Multimodal AI systems benefiting from large models have the potential to perceive the real world, make decisions, and control tools as humans do. Despite this immense potential, there is still a lack of comprehensive understanding of the key challenges, opportunities, and future directions for applying Multimodal Large Language Models to driving systems.
This list is automatically generated from the titles and abstracts of the papers in this site.