ME$^3$-BEV: Mamba-Enhanced Deep Reinforcement Learning for End-to-End Autonomous Driving with BEV-Perception
- URL: http://arxiv.org/abs/2508.06074v1
- Date: Fri, 08 Aug 2025 07:13:28 GMT
- Title: ME$^3$-BEV: Mamba-Enhanced Deep Reinforcement Learning for End-to-End Autonomous Driving with BEV-Perception
- Authors: Siyi Lu, Run Liu, Dongsheng Yang, Lei He,
- Abstract summary: This paper presents a novel approach to autonomous driving using deep learning reinforcement (DRL)<n>We introduce the textttMamba-BEV model, an efficient-temporal feature extraction network that combines BEV-based perception with the Mamba framework for temporal feature modeling.<n>Building on this, we propose the textttME$3$-BEV framework, achieving superior performance in dynamic urban driving scenarios.
- Score: 3.337516332070527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous driving systems face significant challenges in perceiving complex environments and making real-time decisions. Traditional modular approaches, while offering interpretability, suffer from error propagation and coordination issues, whereas end-to-end learning systems can simplify the design but face computational bottlenecks. This paper presents a novel approach to autonomous driving using deep reinforcement learning (DRL) that integrates bird's-eye view (BEV) perception for enhanced real-time decision-making. We introduce the \texttt{Mamba-BEV} model, an efficient spatio-temporal feature extraction network that combines BEV-based perception with the Mamba framework for temporal feature modeling. This integration allows the system to encode vehicle surroundings and road features in a unified coordinate system and accurately model long-range dependencies. Building on this, we propose the \texttt{ME$^3$-BEV} framework, which utilizes the \texttt{Mamba-BEV} model as a feature input for end-to-end DRL, achieving superior performance in dynamic urban driving scenarios. We further enhance the interpretability of the model by visualizing high-dimensional features through semantic segmentation, providing insight into the learned representations. Extensive experiments on the CARLA simulator demonstrate that \texttt{ME$^3$-BEV} outperforms existing models across multiple metrics, including collision rate and trajectory accuracy, offering a promising solution for real-time autonomous driving.
Related papers
- DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving [52.63591791507895]
We propose textbfDriveVLA-W0, a training paradigm that employs world modeling to predict future images.<n>This task generates a dense, self-supervised signal that compels the model to learn the underlying dynamics of the driving environment.<n>Experiments on the NAVSIM v1/v2 benchmark and a 680x larger in-house dataset demonstrate that DriveVLA-W0 significantly outperforms BEV and VLA baselines.
arXiv Detail & Related papers (2025-10-14T17:59:47Z) - Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion.<n>Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient.<n>Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
arXiv Detail & Related papers (2025-09-24T13:35:15Z) - ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving [64.12414815634847]
Vision-Language Models (VLMs) and Driving World Models (DWMs) have independently emerged as powerful recipes addressing different aspects of this challenge.<n>We propose ImagiDrive, a novel end-to-end autonomous driving framework that integrates a VLM-based driving agent with a DWM-based scene imaginer.
arXiv Detail & Related papers (2025-08-15T12:06:55Z) - Deep Bilinear Koopman Model for Real-Time Vehicle Control in Frenet Frame [0.0]
This paper presents a deep Koopman approach for modeling and control of vehicle dynamics within the curvilinear Frenet frame.<n>The proposed framework uses a deep neural network architecture to simultaneously learn the Koopman operator and its associated invariant subspace from the data.<n>The proposed controller achieved significant reductions in tracking error relative to baseline controllers, confirming its suitability for real-time implementation in embedded autonomous vehicle systems.
arXiv Detail & Related papers (2025-07-16T18:49:44Z) - SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving [51.47621083057114]
SOLVE is an innovative framework that synergizes Vision-Language Models with end-to-end (E2E) models to enhance autonomous vehicle planning.<n>Our approach emphasizes knowledge sharing at the feature level through a shared visual encoder, enabling comprehensive interaction between VLM and E2E components.
arXiv Detail & Related papers (2025-05-22T15:44:30Z) - VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion [8.738991730715039]
We propose VLM-E2E, a novel framework that uses the Vision-Language Models to enhance training by providing attentional cues.<n>By focusing on attentional semantics, VLM-E2E better aligns with human-like driving behavior, which is critical for navigating dynamic and complex environments.<n>We evaluate VLM-E2E on the nuScenes dataset and achieve significant improvements in perception, prediction, and planning over the baseline end-to-end model.
arXiv Detail & Related papers (2025-02-25T10:02:12Z) - MambaBEV: An efficient 3D detection model with Mamba2 [4.667459324253689]
MambaBEV is a BEV based 3D object detection model that leverages Mamba2, an advanced state space model (SSM) optimized for long sequence processing.<n>MambaBEV base achieves an NDS of 51.7% and an mAP of 42.7%.<n>Our results highlight the potential of SSMs in autonomous driving perception, particularly in enhancing global context understanding and large object detection.
arXiv Detail & Related papers (2024-10-16T15:37:29Z) - DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.<n>Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner.<n>Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z) - Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving [52.808273563372126]
This paper proposes a novel hierarchical BEV perception paradigm, aiming to provide a library of fundamental perception modules and user-friendly graphical interface.
We conduct the Pretrain-Finetune strategy to effectively utilize large scale public datasets and streamline development processes.
We also present a Multi-Module Learning (MML) approach, enhancing performance through synergistic and iterative training of multiple models.
arXiv Detail & Related papers (2024-07-17T11:17:20Z) - BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents [56.33989853438012]
We propose BEVWorld, a framework that transforms multimodal sensor inputs into a unified and compact Bird's Eye View latent space for holistic environment modeling.<n>The proposed world model consists of two main components: a multi-modal tokenizer and a latent BEV sequence diffusion model.
arXiv Detail & Related papers (2024-07-08T07:26:08Z) - Adaptive and Parallel Split Federated Learning in Vehicular Edge Computing [6.004901615052089]
Vehicular edge intelligence (VEI) is a promising paradigm for enabling future intelligent transportation systems.
Federated learning (FL) is one of the fundamental technologies facilitating collaborative model training locally and aggregation.
We develop an Adaptive Split Federated Learning scheme for Vehicular Edge Computing (ASFV)
arXiv Detail & Related papers (2024-05-29T02:34:38Z) - Drive Anywhere: Generalizable End-to-end Autonomous Driving with
Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.