Related papers: Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models

URL: http://arxiv.org/abs/2409.20364v1
Date: Mon, 30 Sep 2024 15:03:55 GMT
Title: Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models
Authors: Yizhou Huang, Yihua Cheng, Kezhi Wang,
Abstract summary: Large language models (LLMs) can describe driving scenes and behaviors with a level of accuracy similar to human perception. We propose a driving behavior narration and reasoning framework that applies LLMs to edge devices. Our experiments show that LLMs deployed on edge devices can achieve satisfactory response speeds.
Score: 16.532357621144342
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep learning architectures with powerful reasoning capabilities have driven significant advancements in autonomous driving technology. Large language models (LLMs) applied in this field can describe driving scenes and behaviors with a level of accuracy similar to human perception, particularly in visual tasks. Meanwhile, the rapid development of edge computing, with its advantage of proximity to data sources, has made edge devices increasingly important in autonomous driving. Edge devices process data locally, reducing transmission delays and bandwidth usage, and achieving faster response times. In this work, we propose a driving behavior narration and reasoning framework that applies LLMs to edge devices. The framework consists of multiple roadside units, with LLMs deployed on each unit. These roadside units collect road data and communicate via 5G NSR/NR networks. Our experiments show that LLMs deployed on edge devices can achieve satisfactory response speeds. Additionally, we propose a prompt strategy to enhance the narration and reasoning performance of the system. This strategy integrates multi-modal information, including environmental, agent, and motion data. Experiments conducted on the OpenDV-Youtube dataset demonstrate that our approach significantly improves performance across both tasks.

Related papers

Structured Labeling Enables Faster Vision-Language Models for End-to-End Autonomous Driving [29.019907345552475]
Vision-Language Models (VLMs) offer a promising approach to end-to-end autonomous driving due to their human-like reasoning capabilities. Existing datasets with loosely formatted language descriptions are not machine-friendly and may introduce redundancy. This paper introduces a structured and concise benchmark dataset, NuScenes-S, which is derived from the NuScenes dataset and contains machine-friendly structured representations.
arXiv Detail & Related papers (2025-06-05T12:59:35Z)
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning [68.45848423501927]
We propose a holistic vision-language dataset that aligns agent models with 3D driving tasks through counterfactual reasoning. Our approach enhances decision-making by evaluating potential scenarios and their outcomes, similar to human drivers considering alternative actions.
arXiv Detail & Related papers (2025-04-06T03:54:21Z)
Tracking Meets Large Multimodal Models for Driving Scenario Understanding [76.71815464110153]
Large Multimodal Models (LMMs) have recently gained prominence in autonomous driving research. We propose to integrate tracking information as an additional input to recover 3D spatial and temporal details. We introduce a novel approach for embedding this tracking information into LMMs to enhance their understanding of driving scenarios.
arXiv Detail & Related papers (2025-03-18T17:59:12Z)
Learning to Drive by Imitating Surrounding Vehicles [0.6612847014373572]
Imitation learning is a promising approach for training autonomous vehicles to navigate complex traffic environments. We propose a data augmentation strategy that enhances imitation learning by leveraging the observed trajectories of nearby vehicles. We evaluate our approach using the state-of-the-art learning-based planning method PLUTO on the nuPlan dataset and demonstrate that our augmentation method leads to improved performance in complex driving scenarios.
arXiv Detail & Related papers (2025-03-08T00:40:47Z)
Multimodal LLM for Intelligent Transportation Systems [0.0]
This paper introduces a novel 3-dimensional framework that encapsulates the intersection of applications, machine learning methodologies, and hardware devices. Instead of using multiple machine learning algorithms, our framework uses a single, data-centric LLM architecture that can analyze time series, images, and videos. We apply this LLM framework to different sensor datasets, including time-series data and visual data from sources like Oxford Radar RobotCar, D-Behavior (D-Set), nuScenes by Motional, and Comma2k19.
arXiv Detail & Related papers (2024-12-16T11:50:30Z)
Edge-Cloud Collaborative Motion Planning for Autonomous Driving with Large Language Models [3.6503689363051364]
This study introduces EC-Drive, a novel edge-cloud collaborative autonomous driving system with data drift detection capabilities.
arXiv Detail & Related papers (2024-08-19T13:19:15Z)
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving [1.727597257312416]
The CoVLA (Comprehensive Vision-Language-Action) dataset comprises real-world driving videos spanning more than 80 hours. This dataset establishes a framework for robust, interpretable, and data-driven autonomous driving systems.
arXiv Detail & Related papers (2024-08-19T09:53:49Z)
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning [68.45848423501927]
We propose a holistic vision-language dataset that aligns agent models with 3D driving tasks through counterfactual reasoning. Our approach enhances decision-making by evaluating potential scenarios and their outcomes, similar to human drivers considering alternative actions.
arXiv Detail & Related papers (2024-05-02T17:59:24Z)
G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving [71.9040410238973]
We focus on inferring the ego trajectory of a driver's vehicle using their gaze data. We develop G-MEMP, a novel multimodal ego-trajectory prediction network that combines GPS and video input with gaze data. The results show that G-MEMP significantly outperforms state-of-the-art methods in both benchmarks.
arXiv Detail & Related papers (2023-12-13T23:06:30Z)
Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text. Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model [84.29836263441136]
This study introduces DriveGPT4, a novel interpretable end-to-end autonomous driving system based on multimodal large language models (MLLMs). DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users.
arXiv Detail & Related papers (2023-10-02T17:59:52Z)
Penalty-Based Imitation Learning With Cross Semantics Generation Sensor Fusion for Autonomous Driving [1.2749527861829049]
In this paper, we provide a penalty-based imitation learning approach to integrate multiple modalities of information. We observe a remarkable increase in the driving score by more than 12% when compared to the state-of-the-art (SOTA) model, InterFuser. Our model delivers this improvement alongside a 7-fold increase in inference speed and a reduction in model size of approximately 30%.
arXiv Detail & Related papers (2023-03-21T14:29:52Z)
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting. Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories. We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z)
Context-Aware Timewise VAEs for Real-Time Vehicle Trajectory Prediction [4.640835690336652]
We present ContextVAE, a context-aware approach for multi-modal vehicle trajectory prediction. Our approach takes into account both the social features exhibited by agents on the scene and the physical environment constraints. In all tested datasets, ContextVAE models are fast to train and provide high-quality multi-modal predictions in real-time.
arXiv Detail & Related papers (2023-02-21T18:42:24Z)
Generative AI-empowered Simulation for Autonomous Driving in Vehicular Mixed Reality Metaverses [130.15554653948897]
In the vehicular mixed reality (MR) Metaverse, the distance between physical and virtual entities can be overcome. Large-scale traffic and driving simulation via realistic data collection and fusion from the physical world is difficult and costly. We propose an autonomous driving architecture, where generative AI is leveraged to synthesize unlimited conditioned traffic and driving data in simulations.
arXiv Detail & Related papers (2023-02-16T16:54:10Z)
CARNet: A Dynamic Autoencoder for Learning Latent Dynamics in Autonomous Driving Tasks [11.489187712465325]
An autonomous driving system should effectively use the information collected from the various sensors in order to form an abstract description of the world. Deep learning models, such as autoencoders, can be used for that purpose, as they can learn compact latent representations from a stream of incoming data. This work proposes CARNet, a Combined dynAmic autoencodeR NETwork architecture that utilizes an autoencoder combined with a recurrent neural network to learn the current latent representation.
arXiv Detail & Related papers (2022-05-18T04:15:42Z)
DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention and Alertness Analysis [54.198237164152786]
Vision is the richest and most cost-effective technology for Driver Monitoring Systems (DMS). The lack of sufficiently large and comprehensive datasets is currently a bottleneck for the progress of DMS development. In this paper, we introduce the Driver Monitoring dataset (DMD), an extensive dataset which includes real and simulated driving scenarios.
arXiv Detail & Related papers (2020-08-27T12:33:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.