Research on Driving Scenario Technology Based on Multimodal Large Language Model Optimization
- URL: http://arxiv.org/abs/2506.02014v1
- Date: Wed, 28 May 2025 02:22:11 GMT
- Title: Research on Driving Scenario Technology Based on Multimodal Large Language Model Optimization
- Authors: Wang Mengjie, Zhu Huiping, Li Jian, Shi Wenxiu, Zhang Song
- Abstract summary: This paper proposes a comprehensive method for optimizing multimodal models in driving scenarios. The method covers key aspects such as dynamic prompt optimization, dataset construction, model training, and deployment. In model training, advanced techniques like knowledge distillation, dynamic fine-tuning, and quantization are integrated to reduce storage and computational costs while boosting performance.
- Score: 1.0485739694839669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advancement of autonomous and assisted driving technologies, higher demands are placed on the ability to understand complex driving scenarios. General-purpose multimodal large models have emerged as a solution to this challenge. However, applying these models in vertical domains involves difficulties such as data collection, model training, and deployment optimization. This paper proposes a comprehensive method for optimizing multimodal models in driving scenarios, including cone detection, traffic light recognition, speed limit recommendation, and intersection alerts. The method covers key aspects such as dynamic prompt optimization, dataset construction, model training, and deployment. Specifically, dynamic prompt optimization adjusts the prompt based on the input image content so that the model focuses on objects affecting the ego vehicle, enhancing its task-specific focus and judgment capabilities. The dataset is constructed by combining real and synthetic data to create a high-quality and diverse multimodal training dataset, improving the model's generalization in complex driving environments. In model training, advanced techniques such as knowledge distillation, dynamic fine-tuning, and quantization are integrated to reduce storage and computational costs while boosting performance. Experimental results show that this systematic optimization method not only significantly improves the model's accuracy on key tasks but also achieves efficient resource utilization, providing strong support for the practical application of driving scenario perception technologies.
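The dynamic prompt step lends itself to a short illustration. The sketch below assumes a hypothetical upstream detector that tags each frame with coarse object classes; the prompt templates, class names, and function are illustrative assumptions, since the paper does not publish code.

```python
# Minimal sketch of dynamic prompt optimization for a driving-scene multimodal model.
# The detector, class names, and templates are illustrative, not the paper's implementation.
from typing import List

# Task-specific prompt fragments, keyed by coarse object class.
PROMPT_TEMPLATES = {
    "traffic_cone": "Identify traffic cones that constrain the ego lane and suggest a safe maneuver.",
    "traffic_light": "Report the state of the traffic light governing the ego lane.",
    "speed_limit_sign": "Read the speed-limit sign and recommend a target speed.",
    "intersection": "Describe the upcoming intersection and warn about crossing traffic.",
}

BASE_PROMPT = "You are a driving assistant. Focus only on objects that affect the ego vehicle."


def build_dynamic_prompt(detected_classes: List[str]) -> str:
    """Compose a prompt from the object classes a lightweight detector found in the frame."""
    fragments = [PROMPT_TEMPLATES[c] for c in detected_classes if c in PROMPT_TEMPLATES]
    if not fragments:
        # Fall back to a generic scene query when nothing task-relevant is detected.
        return BASE_PROMPT + " Summarize anything that could influence the next driving decision."
    return BASE_PROMPT + " " + " ".join(fragments)


# Example: a frame tagged with a red light and a cluster of cones.
print(build_dynamic_prompt(["traffic_light", "traffic_cone"]))
```

The training-side optimizations can be hinted at the same way. The snippet below is a standard temperature-scaled knowledge-distillation loss of the general kind the abstract refers to; the temperature and weighting are placeholder values, not the paper's settings.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with soft-label KL divergence from the teacher."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```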
Related papers
- DriveGen: Towards Infinite Diverse Traffic Scenarios with Large Models [22.21497010925769]
DriveGen is a novel traffic simulation framework with large models for more diverse traffic generation. DriveGen fully utilizes large models' high-level cognition and reasoning about driving behavior. Our generated scenarios and corner cases achieve superior performance compared to state-of-the-art baselines.
arXiv Detail & Related papers (2025-03-04T06:14:21Z) - A Survey of Automatic Prompt Engineering: An Optimization Perspective [18.933465526053453]
This paper presents the first comprehensive survey on automated prompt engineering through a unified optimization-theoretic lens. We formalize prompt optimization as a problem over discrete, continuous, and hybrid prompt spaces. We highlight underexplored frontiers in constrained optimization and agent-oriented prompt design.
arXiv Detail & Related papers (2025-02-17T08:48:07Z) - Direct Preference Optimization-Enhanced Multi-Guided Diffusion Model for Traffic Scenario Generation [0.0]
Diffusion-based models are recognized for their effectiveness in using real-world driving data to generate realistic traffic scenarios. These models employ guided sampling to incorporate specific traffic preferences and enhance scenario realism. We introduce a multi-guided diffusion model that utilizes a novel training strategy to closely adhere to traffic priors.
arXiv Detail & Related papers (2025-02-14T05:29:43Z) - A Survey of World Models for Autonomous Driving [63.33363128964687]
Recent breakthroughs in autonomous driving have been propelled by advances in robust world modeling. This paper systematically reviews recent advances in world models for autonomous driving.
arXiv Detail & Related papers (2025-01-20T04:00:02Z) - A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation. However, deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency. This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z) - From Imitation to Exploration: End-to-end Autonomous Driving based on World Model [24.578178308010912]
RAMBLE is an end-to-end world model-based RL method for driving decision-making. It can handle complex and dynamic traffic scenarios. It achieves state-of-the-art performance in route completion rate on the CARLA Leaderboard 1.0 and completes all 38 scenarios on the CARLA Leaderboard 2.0.
arXiv Detail & Related papers (2024-10-03T06:45:59Z) - Structuring a Training Strategy to Robustify Perception Models with Realistic Image Augmentations [1.5723316845301678]
This report introduces a novel methodology for training with augmentations to enhance model robustness and performance under challenging conditions.
We present a comprehensive framework that includes identifying weak spots in Machine Learning models, selecting suitable augmentations, and devising effective training strategies.
Experimental results demonstrate improvements in model performance, as measured by commonly used metrics such as mean Average Precision (mAP) and mean Intersection over Union (mIoU) on open-source object detection and semantic segmentation models and datasets.
arXiv Detail & Related papers (2024-08-30T14:15:48Z) - Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z) - Gradient-Based Trajectory Optimization With Learned Dynamics [80.41791191022139]
We use machine learning techniques to learn a differentiable dynamics model of the system from data.
We show that a neural network can model highly nonlinear behaviors accurately for large time horizons.
In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot robot and a radio-controlled (RC) car.
arXiv Detail & Related papers (2022-04-09T22:07:34Z) - Multi-intersection Traffic Optimisation: A Benchmark Dataset and a Strong Baseline [85.9210953301628]
Control of traffic signals is fundamental and critical to alleviate traffic congestion in urban areas.
Because of the high complexity of modelling the problem, experimental settings of current works are often inconsistent.
We propose a novel and strong baseline model based on deep reinforcement learning with the encoder-decoder structure.
arXiv Detail & Related papers (2021-01-24T03:55:39Z)