A Unified Perception-Language-Action Framework for Adaptive Autonomous Driving
- URL: http://arxiv.org/abs/2507.23540v1
- Date: Thu, 31 Jul 2025 13:30:47 GMT
- Title: A Unified Perception-Language-Action Framework for Adaptive Autonomous Driving
- Authors: Yi Zhang, Erik Leo Haß, Kuo-Yi Chao, Nenad Petrovic, Yinglei Song, Chengdong Wu, Alois Knoll
- Abstract summary: We propose a unified Perception-Language-Action (PLA) framework that integrates multi-sensor fusion (cameras, LiDAR, radar) with a large language model (LLM)-augmented Vision-Language-Action (VLA) architecture. This framework unifies low-level sensory processing with high-level contextual reasoning to enable context-aware, explainable, and safety-bounded autonomous driving.
- Score: 10.685706490545956
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous driving systems face significant challenges in achieving human-like adaptability, robustness, and interpretability in complex, open-world environments. These challenges stem from fragmented architectures, limited generalization to novel scenarios, and insufficient semantic extraction from perception. To address these limitations, we propose a unified Perception-Language-Action (PLA) framework that integrates multi-sensor fusion (cameras, LiDAR, radar) with a large language model (LLM)-augmented Vision-Language-Action (VLA) architecture, specifically a GPT-4.1-powered reasoning core. This framework unifies low-level sensory processing with high-level contextual reasoning, tightly coupling perception with natural language-based semantic understanding and decision-making to enable context-aware, explainable, and safety-bounded autonomous driving. Evaluations on an urban intersection scenario with a construction zone demonstrate superior performance in trajectory tracking, speed prediction, and adaptive planning. The results highlight the potential of language-augmented cognitive frameworks for advancing the safety, interpretability, and scalability of autonomous driving systems.
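The paper ships no public code, so the pipeline above is best read structurally: fused percepts are rendered into language, a GPT-4.1-class core reasons over that description, and the resulting action is clamped by hard safety bounds. A minimal Python sketch under those assumptions follows; all names (FusedScene, describe, safety_bound) and the numeric limits are hypothetical, and the LLM call is a stub.

```python
from dataclasses import dataclass

@dataclass
class FusedScene:
    """Multi-sensor fusion output (camera + LiDAR + radar)."""
    objects: list[str]        # e.g. ["construction zone ahead"]
    ego_speed_mps: float

@dataclass
class Action:
    target_speed_mps: float
    steer_rad: float

def describe(scene: FusedScene) -> str:
    """Perception -> Language: render fused percepts as a text prompt."""
    return (f"Ego speed {scene.ego_speed_mps:.1f} m/s. "
            f"Detected: {', '.join(scene.objects)}. "
            "Propose target speed (m/s) and steering (rad).")

def safety_bound(action: Action, scene: FusedScene) -> Action:
    """Language -> Action: clamp LLM proposals to hard limits."""
    limit = 8.0 if any("construction" in o for o in scene.objects) else 15.0
    return Action(min(max(action.target_speed_mps, 0.0), limit),
                  max(min(action.steer_rad, 0.5), -0.5))

def step(scene: FusedScene, llm) -> Action:
    proposal = llm.plan(describe(scene))   # GPT-4.1-style core (stubbed)
    return safety_bound(proposal, scene)
```

Note that the safety bound sits after the reasoning core: whatever the LLM proposes, the clamp applies, which is one plausible reading of "safety-bounded" in the abstract.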
Related papers
- ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving [27.75047397292818]
End-to-end autonomous driving has emerged as a promising approach to unify perception, prediction, and planning within a single framework. We propose ReAL-AD, a Reasoning-Augmented Learning framework that structures decision-making in autonomous driving based on the three-tier human cognitive model. We show that integrating our framework improves planning accuracy and safety by over 30%, making end-to-end autonomous driving more interpretable and aligned with human-like hierarchical reasoning.
arXiv Detail & Related papers (2025-07-16T02:23:24Z)
- Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios [1.5367554212163714]
This paper presents a Case-Based Reasoning Augmented Large Language Model (CBR-LLM) framework for evasive maneuver decision-making in complex risk scenarios. Our approach integrates semantic scene understanding from dashcam video inputs with the retrieval of relevant past driving cases. Experiments show that our framework improves decision accuracy, justification quality, and alignment with human expert behavior.
arXiv Detail & Related papers (2025-06-25T15:19:25Z)
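As a hedged sketch of the case-retrieval step such a CBR-LLM pipeline implies: embed the current scene, rank stored cases by similarity, and prepend the top matches to the LLM prompt. Cosine similarity and the prompt format here are assumptions, not details from the paper.

```python
import numpy as np

def retrieve_cases(query_emb: np.ndarray,
                   case_embs: np.ndarray,     # shape (n_cases, d)
                   case_texts: list[str],
                   k: int = 3) -> list[str]:
    """Return the k past cases most similar to the current scene."""
    q = query_emb / np.linalg.norm(query_emb)
    C = case_embs / np.linalg.norm(case_embs, axis=1, keepdims=True)
    top = np.argsort(-(C @ q))[:k]            # rank by cosine similarity
    return [case_texts[i] for i in top]

def build_prompt(scene_desc: str, cases: list[str]) -> str:
    """Fold the retrieved cases into the reasoning prompt."""
    examples = "\n".join(f"- {c}" for c in cases)
    return (f"Similar past cases:\n{examples}\n\n"
            f"Current scene: {scene_desc}\n"
            "Recommend an evasive maneuver and justify it.")
```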
- SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving [51.47621083057114]
SOLVE is an innovative framework that synergizes Vision-Language Models with end-to-end (E2E) models to enhance autonomous vehicle planning. Our approach emphasizes knowledge sharing at the feature level through a shared visual encoder, enabling comprehensive interaction between VLM and E2E components.
arXiv Detail & Related papers (2025-05-22T15:44:30Z)
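A rough PyTorch-style sketch of what feature-level sharing through one visual encoder can look like; the module shapes and heads below are assumptions, not SOLVE's released architecture.

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """One encoder feeds both the VLM branch and the E2E planning head."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(             # stand-in for a ViT/CNN
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))
        self.to_vlm = nn.Linear(dim, dim)         # features for the VLM
        self.to_plan = nn.Linear(dim, 2 * 6)      # 6 future (x, y) waypoints

    def forward(self, img: torch.Tensor):
        feats = self.encoder(img)                 # shared visual features
        return self.to_vlm(feats), self.to_plan(feats).view(-1, 6, 2)

vlm_feats, waypoints = SharedBackbone()(torch.randn(1, 3, 224, 224))
```

Because both heads consume the same features, gradients from the planner and the language branch shape a single representation, which is the feature-level interaction the summary points to.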
- A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving [15.24721920935653]
Multimodal large language models (MLLMs) hold the potential to enhance autonomous driving. Their integration into autonomous driving systems shows promising results in isolated proof-of-concept applications. This paper proposes a holistic framework for a capability-driven evaluation of MLLMs in autonomous driving.
arXiv Detail & Related papers (2025-03-14T13:43:26Z)
- SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
We propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge. To explicitly integrate safety knowledge, we develop a reasoning component that translates traffic rules into first-order logic. Our Multimodal Retrieval-Augmented Generation model leverages video, control signals, and environmental attributes to learn from past driving experiences.
arXiv Detail & Related papers (2025-02-28T21:53:47Z)
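As an illustration of translating a traffic rule into first-order logic and checking it, consider the rule "a vehicle facing a red light must stop", i.e. forall v. RedLight(v) -> Stop(v). The predicates and state layout below are examples, not SafeAuto's actual knowledge base.

```python
# FOL rule: forall v. RedLight(v) -> Stop(v)
# The implication is violated iff the antecedent holds and the
# consequent fails, which is easy to check against a planned action.

def red_light(state: dict) -> bool:
    return state.get("traffic_light") == "red"

def stops(action: dict) -> bool:
    return action.get("target_speed_mps", 0.0) == 0.0

def violates_red_light_rule(state: dict, action: dict) -> bool:
    return red_light(state) and not stops(action)

# A plan that keeps 4 m/s through a red light violates the rule:
assert violates_red_light_rule({"traffic_light": "red"},
                               {"target_speed_mps": 4.0})
```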
- INSIGHT: Enhancing Autonomous Driving Safety through Vision-Language Models on Context-Aware Hazard Detection and Edge Case Evaluation [7.362380225654904]
INSIGHT is a hierarchical vision-language model (VLM) framework designed to enhance hazard detection and edge-case evaluation. By using multimodal data fusion, our approach integrates semantic and visual representations, enabling precise interpretation of driving scenarios. Experimental results on the BDD100K dataset demonstrate a substantial improvement in hazard prediction accuracy and clarity over existing models.
arXiv Detail & Related papers (2025-02-01T01:43:53Z)
- Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving [20.33096710167997]
A generative planning model with 3D-vision language pre-training, named GPVL, is proposed for end-to-end autonomous driving. A cross-modal language model is introduced to generate holistic driving decisions and fine-grained trajectories. The effective, robust, and efficient performance of GPVL is believed to be crucial for the practical application of future autonomous driving systems.
arXiv Detail & Related papers (2025-01-15T15:20:46Z)
- VLP: Vision Language Planning for Autonomous Driving [52.640371249017335]
This paper presents a novel Vision-Language-Planning framework that exploits language models to bridge the gap between linguistic understanding and autonomous driving.
It achieves state-of-the-art end-to-end planning performance on the NuScenes dataset, reducing average L2 error and collision rate by 35.9% and 60.5%, respectively.
arXiv Detail & Related papers (2024-01-10T23:00:40Z)
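For reference, the average L2 metric behind the 35.9% figure is conventionally the mean Euclidean distance between predicted and ground-truth waypoints over the planning horizon; a minimal sketch, assuming that common nuScenes convention (exact evaluation protocols vary by paper):

```python
import numpy as np

def average_l2(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred, gt: (T, 2) arrays of (x, y) waypoints over horizon T."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())

pred = np.array([[0.0, 1.0], [0.0, 2.1], [0.1, 3.0]])
gt   = np.array([[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]])
print(average_l2(pred, gt))  # mean per-waypoint distance, in meters
```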
- Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs serve as intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
arXiv Detail & Related papers (2023-11-28T03:13:09Z)
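A hypothetical sketch of the LLM-conditioned MPC idea: the LLM maps a scene description to high-level cost weights, a safety shield bounds them, and a conventional MPC solves with those weights. All names and the weight mapping are assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class MPCWeights:
    speed_tracking: float   # reward keeping the desired speed
    comfort: float          # penalize jerk and acceleration
    clearance: float        # penalize proximity to obstacles

def llm_to_weights(directive: str) -> MPCWeights:
    """Map an LLM directive (e.g. 'yield cautiously') to cost weights."""
    if "cautious" in directive or "yield" in directive:
        return MPCWeights(speed_tracking=0.2, comfort=1.0, clearance=3.0)
    return MPCWeights(speed_tracking=1.0, comfort=0.5, clearance=1.0)

def verify(w: MPCWeights) -> MPCWeights:
    """Safety shield: the clearance weight never drops below a floor."""
    return MPCWeights(w.speed_tracking, w.comfort, max(w.clearance, 1.0))

weights = verify(llm_to_weights("yield cautiously to the merging truck"))
# `weights` would parameterize the MPC stage cost (solver not shown)
```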
- Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R, and discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z)
- GPT-Driver: Learning to Drive with GPT [47.14350537515685]
We present a simple yet effective approach that can transform the OpenAI GPT-3.5 model into a reliable motion planner for autonomous vehicles.
We capitalize on the strong reasoning capabilities and generalization potential inherent to Large Language Models (LLMs).
We evaluate our approach on the large-scale nuScenes dataset, and extensive experiments substantiate the effectiveness, generalization ability, and interpretability of our GPT-based motion planner.
arXiv Detail & Related papers (2023-10-02T17:59:57Z)
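The prompt-in, trajectory-out pattern such a GPT-based planner implies can be sketched as follows; the prompt format and the waypoint parser are assumptions for illustration, not GPT-Driver's actual protocol.

```python
import re

PROMPT = ("Ego at (0, 0) heading north at 5 m/s. Vehicle ahead at (0, 20) "
          "moving 3 m/s. Output 3 waypoints as (x, y) pairs, one per second.")

def parse_waypoints(reply: str) -> list[tuple[float, float]]:
    """Extract '(x, y)' pairs from the model's text reply."""
    pairs = re.findall(r"\(\s*(-?\d+\.?\d*)\s*,\s*(-?\d+\.?\d*)\s*\)", reply)
    return [(float(x), float(y)) for x, y in pairs]

# e.g. a model reply of "(0.0, 4.5) (0.0, 8.7) (0.0, 12.6)" parses to:
print(parse_waypoints("(0.0, 4.5) (0.0, 8.7) (0.0, 12.6)"))
```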
This list is automatically generated from the titles and abstracts of the papers on this site.