TransGPT: Multi-modal Generative Pre-trained Transformer for Transportation
- URL: http://arxiv.org/abs/2402.07233v1
- Date: Sun, 11 Feb 2024 15:50:35 GMT
- Title: TransGPT: Multi-modal Generative Pre-trained Transformer for Transportation
- Authors: Peng Wang, Xiang Wei, Fangxu Hu and Wenjuan Han
- Abstract summary: This paper presents TransGPT, a novel (multi-modal) large language model for the transportation domain.
It consists of two independent variants: TransGPT-SM for single-modal data and TransGPT-MM for multi-modal data.
This work advances the state-of-the-art of NLP in the transportation domain and provides a useful tool for ITS researchers and practitioners.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language processing (NLP) is a key component of intelligent
transportation systems (ITS), but it faces many challenges in the
transportation domain, such as domain-specific knowledge and data, and
multi-modal inputs and outputs. This paper presents TransGPT, a novel
(multi-modal) large language model for the transportation domain, which
consists of two independent variants: TransGPT-SM for single-modal data and
TransGPT-MM for multi-modal data. TransGPT-SM is fine-tuned on a single-modal
transportation dataset (STD) that contains textual data from various sources in
the transportation domain. TransGPT-MM is fine-tuned on a multi-modal
transportation dataset (MTD) that we manually collected from three areas of the
transportation domain: driving tests, traffic signs, and landmarks. We evaluate
TransGPT on several benchmark datasets for different tasks in the
transportation domain, and show that it outperforms baseline models on most
tasks. We also showcase the potential applications of TransGPT for traffic
analysis and modeling, such as generating synthetic traffic scenarios,
explaining traffic phenomena, answering traffic-related questions, providing
traffic recommendations, and generating traffic reports. This work advances the
state-of-the-art of NLP in the transportation domain and provides a useful tool
for ITS researchers and practitioners.
Related papers
- Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition
We propose a new strategy called "think twice before recognizing" to improve fine-grained traffic sign recognition (TSR).
Our strategy achieves effective fine-grained TSR by stimulating the multiple-thinking capability of large multimodal models (LMMs).
Date: 2024-09-03T02:08:47Z
- TrafficGPT: Towards Multi-Scale Traffic Analysis and Generation with Spatial-Temporal Agent Framework
We have designed a multi-scale traffic generation system, TrafficGPT, using three AI agents to process multi-scale traffic data.
TrafficGPT consists of three essential AI agents: 1) a text-to-demand agent to interact with users and extract prediction tasks through texts; 2) a traffic prediction agent that leverages multi-scale traffic data to generate temporal features and similarity; and 3) a suggestion and visualization agent that uses the prediction results to generate suggestions and visualizations.
Date: 2024-05-08T07:48:40Z
- xMTrans: Temporal Attentive Cross-Modality Fusion Transformer for Long-Term Traffic Prediction
We introduce a novel temporal attentive cross-modality transformer model for long-term traffic predictions, namely xMTrans.
We conduct experiments to evaluate our proposed model on traffic congestion and taxi demand predictions using real-world datasets.
Date: 2024-05-08T06:29:26Z
- BjTT: A Large-scale Multimodal Dataset for Traffic Prediction
Traditional traffic prediction methods rely on historical traffic data to predict traffic trends.
In this work, we explore how generative models combined with text describing the traffic system can be applied for traffic generation.
We propose ChatTraffic, the first diffusion model for text-to-traffic generation.
Date: 2024-03-08T04:19:56Z
- TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models
TransportationGames is an evaluation benchmark for assessing (M)LLMs in the transportation domain.
We test how well various (M)LLMs memorize, understand, and apply transportation knowledge on a set of selected tasks.
Date: 2024-01-09T10:20:29Z
- TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex Traffic Scenarios
Multi-object tracking in traffic videos offers immense potential for enhancing traffic monitoring accuracy and promoting road safety measures.
Existing datasets for multi-object tracking in traffic videos often feature limited instances or focus on single classes.
We introduce TrafficMOT, an extensive dataset designed to encompass diverse traffic situations with complex scenarios.
Date: 2023-11-30T18:59:56Z
- DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
This study introduces DriveGPT4, a novel interpretable end-to-end autonomous driving system based on multimodal large language models (MLLMs).
DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users.
Evaluations conducted on the BDD-X dataset showcase the superior qualitative and quantitative performance of DriveGPT4.
Date: 2023-10-02T17:59:52Z
- Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission
We propose a federated learning-based semantic communication (FLSC) framework for multi-task distributed image transmission with IoT devices.
Each link is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator.
A channel state information-based multiple-input multiple-output (MIMO) transmission module is designed to combat channel fading and noise.
Date: 2023-08-07T16:32:14Z
- TrafficSafetyGPT: Tuning a Pre-trained Large Language Model to a Domain-Specific Expert in Transportation Safety
Large Language Models (LLMs) have shown remarkable effectiveness in various general-domain natural language processing (NLP) tasks.
We introduce TrafficSafetyGPT, a novel LLaMA-based model that has undergone supervised fine-tuning on the TrafficSafety-2K dataset.
Date: 2023-07-28T05:17:11Z
- Prompting for Multi-Modal Tracking
We propose a novel multi-modal prompt tracker (ProTrack) for multi-modal tracking.
ProTrack can transfer the multi-modal inputs to a single modality by the prompt paradigm.
Our ProTrack can achieve high-performance multi-modal tracking by only altering the inputs, even without any extra training on multi-modal data.
Date: 2022-07-29T09:35:02Z