Large Multimodal Models for Embodied Intelligent Driving: The Next Frontier in Self-Driving?
- URL: http://arxiv.org/abs/2601.08434v3
- Date: Sat, 17 Jan 2026 08:14:17 GMT
- Title: Large Multimodal Models for Embodied Intelligent Driving: The Next Frontier in Self-Driving?
- Authors: Long Zhang, Yuchen Xia, Bingqing Wei, Zhen Liu, Shiwen Mao, Zhu Han, Mohsen Guizani
- Abstract summary: This article introduces a novel semantics and policy dual-driven hybrid decision framework to tackle this challenge. The framework merges LMMs for semantic understanding and cognitive representation, and deep reinforcement learning (DRL) for real-time policy optimization. A case study is conducted experimentally to validate the performance superiority of our framework in completing a lane-change planning task.
- Score: 68.82027978227008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of Large Multimodal Models (LMMs) offers a promising technology to tackle the limitations of modular design in autonomous driving, which often falters in open-world scenarios requiring sustained environmental understanding and logical reasoning. Besides, embodied artificial intelligence facilitates policy optimization through closed-loop interactions to achieve the continuous learning capability, thereby advancing autonomous driving toward embodied intelligent (EI) driving. However, such capability will be constrained by relying solely on LMMs to enhance EI driving without joint decision-making. This article introduces a novel semantics and policy dual-driven hybrid decision framework to tackle this challenge, ensuring continuous learning and joint decision-making. The framework merges LMMs for semantic understanding and cognitive representation, and deep reinforcement learning (DRL) for real-time policy optimization. We start by introducing the foundational principles of EI driving and LMMs. Moreover, we examine the emerging opportunities this framework enables, encompassing potential benefits and representative use cases. A case study is conducted experimentally to validate the performance superiority of our framework in completing a lane-change planning task. Finally, several future research directions to empower EI driving are identified to guide subsequent work.
Related papers
- ADRD: LLM-Driven Autonomous Driving Based on Rule-based Decision Systems [6.950569256149267]
We introduce the ADRD framework, which integrates three core modules: the Information Module, the Agents Module, and the Testing Module. The framework exhibits superior performance in autonomous driving decision tasks. This is the first work that integrates large language models with rule-based systems for autonomous driving decision-making.
arXiv Detail & Related papers (2025-06-17T08:18:20Z)
- ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving [49.07731497951963]
ReCogDrive is a novel Reinforced Cognitive framework for end-to-end autonomous driving. We introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers. We then address the language-action mismatch by injecting the VLM's learned driving priors into a diffusion planner.
arXiv Detail & Related papers (2025-06-09T03:14:04Z)
- TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning [61.33599727106222]
TeLL-Drive is a hybrid framework that integrates a Teacher LLM to guide an attention-based Student DRL policy. A self-attention mechanism then fuses these strategies with the DRL agent's exploration, accelerating policy convergence and boosting robustness.
arXiv Detail & Related papers (2025-02-03T14:22:03Z)
- Parameterized Decision-making with Multi-modal Perception for Autonomous Driving [12.21578713219778]
We propose a parameterized decision-making framework with multi-modal perception based on deep reinforcement learning, called AUTO.
A hybrid reward function takes into account aspects of safety, traffic efficiency, passenger comfort, and impact to guide the framework to generate optimal actions.
arXiv Detail & Related papers (2023-12-19T08:27:02Z)
- Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs are intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
arXiv Detail & Related papers (2023-11-28T03:13:09Z)
- LLM4Drive: A Survey of Large Language Models for Autonomous Driving [62.10344445241105]
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review a research line about Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z)
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [84.31119464141631]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios. Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors and even multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.