Multimodal LLM for Intelligent Transportation Systems
- URL: http://arxiv.org/abs/2412.11683v1
- Date: Mon, 16 Dec 2024 11:50:30 GMT
- Title: Multimodal LLM for Intelligent Transportation Systems
- Authors: Dexter Le, Aybars Yunusoglu, Karn Tiwari, Murat Isik, I. Can Dikmen,
- Abstract summary: This paper introduces a novel 3-dimensional framework that encapsulates the intersection of applications, machine learning methodologies, and hardware devices.
Instead of using multiple machine learning algorithms, our framework uses a single, data-centric LLM architecture that can analyze time series, images, and videos.
We apply this LLM framework to different sensor datasets, including time-series data and visual data from sources like Oxford Radar RobotCar, D-Behavior (D-Set), nuScenes by Motional, and Comma2k19.
- Score: 0.0
- License:
- Abstract: In the evolving landscape of transportation systems, integrating Large Language Models (LLMs) offers a promising frontier for advancing intelligent decision-making across various applications. This paper introduces a novel 3-dimensional framework that encapsulates the intersection of applications, machine learning methodologies, and hardware devices, particularly emphasizing the role of LLMs. Instead of using multiple machine learning algorithms, our framework uses a single, data-centric LLM architecture that can analyze time series, images, and videos. We explore how LLMs can enhance data interpretation and decision-making in transportation. We apply this LLM framework to different sensor datasets, including time-series data and visual data from sources like Oxford Radar RobotCar, D-Behavior (D-Set), nuScenes by Motional, and Comma2k19. The goal is to streamline data processing workflows, reduce the complexity of deploying multiple models, and make intelligent transportation systems more efficient and accurate. The study was conducted using state-of-the-art hardware, leveraging the computational power of AMD RTX 3060 GPUs and Intel i9-12900 processors. The experimental results demonstrate that our framework achieves an average accuracy of 91.33\% across these datasets, with the highest accuracy observed in time-series data (92.7\%), showcasing the model's proficiency in handling sequential information essential for tasks such as motion planning and predictive maintenance. Through our exploration, we demonstrate the versatility and efficacy of LLMs in handling multimodal data within the transportation sector, ultimately providing insights into their application in real-world scenarios. Our findings align with the broader conference themes, highlighting the transformative potential of LLMs in advancing transportation technologies.
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - JEMA: A Joint Embedding Framework for Scalable Co-Learning with Multimodal Alignment [0.0]
JEMA (Joint Embedding with Multimodal Alignment) is a novel co-learning framework tailored for laser metal deposition (LMD)
We report an 8% increase in performance in multimodal settings and a 1% improvement in unimodal settings compared to supervised contrastive learning.
Our framework lays the foundation for integrating multisensor data with metadata, enabling diverse downstream tasks within the LMD domain and beyond.
arXiv Detail & Related papers (2024-10-31T14:42:26Z) - Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models [16.532357621144342]
Large language models (LLMs) can describe driving scenes and behaviors with a level of accuracy similar to human perception.
We propose a driving behavior narration and reasoning framework that applies LLMs to edge devices.
Our experiments show that LLMs deployed on edge devices can achieve satisfactory response speeds.
arXiv Detail & Related papers (2024-09-30T15:03:55Z) - NVLM: Open Frontier-Class Multimodal LLMs [64.00053046838225]
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks.
We propose a novel architecture that enhances both training efficiency and multimodal reasoning capabilities.
We develop production-grade multimodality for the NVLM-1.0 models, enabling them to excel in vision-language tasks.
arXiv Detail & Related papers (2024-09-17T17:59:06Z) - Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving [58.16024314532443]
We introduce LaserMix++, a framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to assist data-efficient learning.
Results demonstrate that LaserMix++ outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations.
This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
arXiv Detail & Related papers (2024-05-08T17:59:53Z) - Are You Being Tracked? Discover the Power of Zero-Shot Trajectory
Tracing with LLMs! [3.844253028598048]
This study introduces LLMTrack, a model that illustrates how LLMs can be leveraged for Zero-Shot Trajectory Recognition.
We evaluate the model using real-world datasets designed to challenge it with distinct trajectories characterized by indoor and outdoor scenarios.
arXiv Detail & Related papers (2024-03-10T12:50:35Z) - LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z) - Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions [11.786387517781328]
Vision-Language Models (VLMs) are advanced models that can tackle more intricate tasks such as image captioning and visual question answering.
Our classification organizes VLMs into three distinct categories: models dedicated to vision-language understanding, models that process multimodal inputs to generate unimodal (textual) outputs and models that both accept and produce multimodal inputs and outputs.
We meticulously dissect each model, offering an extensive analysis of its foundational architecture, training data sources, as well as its strengths and limitations wherever possible.
arXiv Detail & Related papers (2024-02-20T18:57:34Z) - Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems.
We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
arXiv Detail & Related papers (2023-10-04T20:27:20Z) - LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset,
Framework, and Benchmark [81.42376626294812]
We present Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark.
Our aim is to establish LAMM as a growing ecosystem for training and evaluating MLLMs.
We present a comprehensive dataset and benchmark, which cover a wide range of vision tasks for 2D and 3D vision.
arXiv Detail & Related papers (2023-06-11T14:01:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.