TransportationGames: Benchmarking Transportation Knowledge of
(Multimodal) Large Language Models
- URL: http://arxiv.org/abs/2401.04471v1
- Date: Tue, 9 Jan 2024 10:20:29 GMT
- Authors: Xue Zhang, Xiangyu Shi, Xinyue Lou, Rui Qi, Yufeng Chen, Jinan Xu,
Wenjuan Han
- Abstract summary: TransportationGames is an evaluation benchmark for assessing (M)LLMs in the transportation domain.
We test the performance of various (M)LLMs in memorizing, understanding, and applying transportation knowledge by the selected tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) and multimodal large language models (MLLMs)
have shown excellent general capabilities, even exhibiting adaptability in many
professional domains such as law, economics, transportation, and medicine.
Currently, many domain-specific benchmarks have been proposed to verify the
performance of (M)LLMs in specific fields. Among various domains,
transportation plays a crucial role in modern society as it impacts the
economy, the environment, and the quality of life for billions of people.
However, it is unclear how much traffic knowledge (M)LLMs possess and whether
they can reliably perform transportation-related tasks. To address this gap, we
propose TransportationGames, a carefully designed and thorough evaluation
benchmark for assessing (M)LLMs in the transportation domain. By
comprehensively considering the applications in real-world scenarios and
referring to the first three levels in Bloom's Taxonomy, we test the
performance of various (M)LLMs in memorizing, understanding, and applying
transportation knowledge by the selected tasks. The experimental results show
that although some models perform well in some tasks, there is still much room
for improvement overall. We hope the release of TransportationGames can serve
as a foundation for future research, thereby accelerating the implementation
and application of (M)LLMs in the transportation domain.
Related papers
- Needle In A Multimodal Haystack [79.81804334634408]
We present the first benchmark specifically designed to evaluate the capability of existing MLLMs to comprehend long multimodal documents.
Our benchmark includes three types of evaluation tasks: multimodal retrieval, counting, and reasoning.
We observe that existing models still have significant room for improvement on these tasks, especially on vision-centric evaluation.
arXiv Detail & Related papers (2024-06-11T13:09:16Z)
- Efficient Multimodal Large Language Models: A Survey [60.7614299984182]
Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning.
The extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry.
This survey provides a comprehensive and systematic review of the current state of efficient MLLMs.
arXiv Detail & Related papers (2024-05-17T12:37:10Z)
- Probing Multimodal LLMs as World Models for Driving [72.18727651074563]
This study focuses on the application of Multimodal Large Language Models (MLLMs) within the domain of autonomous driving.
We evaluate the capability of various MLLMs as world models for driving from the perspective of a fixed in-car camera.
Our results highlight a critical gap in the current capabilities of state-of-the-art MLLMs.
arXiv Detail & Related papers (2024-05-09T17:52:42Z)
- Large Language Models for Mobility in Transportation Systems: A Survey on Forecasting Tasks [8.548422411704218]
Machine learning and deep learning methods are favored for their flexibility and accuracy.
With the advent of large language models (LLMs), many researchers have combined these models with previous techniques or applied LLMs to directly predict future traffic information and human travel behaviors.
arXiv Detail & Related papers (2024-05-03T02:54:43Z)
- MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI [71.53579367538725]
MMT-Bench is a benchmark designed to assess Large Vision-Language Models (LVLMs) across massive multimodal tasks.
MMT-Bench comprises $31,325$ meticulously curated multi-choice visual questions from various multimodal scenarios.
arXiv Detail & Related papers (2024-04-24T17:37:05Z)
- A Survey on Multimodal Large Language Models for Autonomous Driving [31.614730391949657]
Multimodal AI systems built on large models have the potential to perceive the real world, make decisions, and use tools as humans do.
Despite this immense potential, there is still no comprehensive understanding of the key challenges, opportunities, and future directions for applying Multimodal Large Language Models to driving systems.
arXiv Detail & Related papers (2023-11-21T03:32:01Z)
- MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [73.86954509967416]
Multimodal Large Language Models (MLLMs) rely on a powerful LLM to perform multimodal tasks.
This paper presents the first comprehensive MLLM Evaluation benchmark MME.
It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z)
- Selective Survey: Most Efficient Models and Solvers for Integrative Multimodal Transport [0.0]
The main objective is to present a curated selection of existing research, methods, and information in the field of multimodal transportation.
The selective survey covers multimodal transport design and optimization in terms of: cost, time, and network topology.
The gap between theory and real-world applications still needs to be bridged in order to optimize the global multimodal transportation system.
arXiv Detail & Related papers (2021-03-16T08:31:44Z)