MapFusion: A Novel BEV Feature Fusion Network for Multi-modal Map Construction
- URL: http://arxiv.org/abs/2502.04377v1
- Date: Wed, 05 Feb 2025 16:25:45 GMT
- Title: MapFusion: A Novel BEV Feature Fusion Network for Multi-modal Map Construction
- Authors: Xiaoshuai Hao, Yunfeng Diao, Mengchuan Wei, Yifan Yang, Peng Hao, Rong Yin, Hui Zhang, Weiming Li, Shu Zhao, Yu Liu
- Abstract summary: We propose MapFusion, a novel multi-modal Bird's-Eye View (BEV) feature fusion method for map construction. We introduce the Cross-modal Interaction Transform (CIT) module, enabling interaction between two BEV feature spaces. We also propose an effective Dual Dynamic Fusion (DDF) module to adaptively select valuable information from different modalities.
- Score: 23.212961039696722
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The map construction task plays a vital role in providing the precise and comprehensive static environmental information essential for autonomous driving systems. The primary sensors are cameras and LiDAR, with configurations varying between camera-only, LiDAR-only, and camera-LiDAR fusion based on cost-performance considerations. While fusion-based methods typically perform best, existing approaches often neglect modality interaction and rely on simple fusion strategies, which suffer from misalignment and information loss. To address these issues, we propose MapFusion, a novel multi-modal Bird's-Eye View (BEV) feature fusion method for map construction. Specifically, to solve the semantic misalignment problem between camera and LiDAR BEV features, we introduce the Cross-modal Interaction Transform (CIT) module, which enables interaction between the two BEV feature spaces and enhances feature representation through a self-attention mechanism. Additionally, we propose an effective Dual Dynamic Fusion (DDF) module to adaptively select valuable information from the different modalities, taking full advantage of their complementary information. Moreover, MapFusion is simple and plug-and-play, easily integrated into existing pipelines. We evaluate MapFusion on two map construction tasks, High-definition (HD) map construction and BEV map segmentation, to show its versatility and effectiveness. Compared with state-of-the-art methods, MapFusion achieves 3.6% and 6.2% absolute improvements on the HD map construction and BEV map segmentation tasks on the nuScenes dataset, respectively, demonstrating the superiority of our approach.
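The abstract specifies only each module's role, so the following is a minimal PyTorch sketch under stated assumptions, not the paper's implementation: CIT is rendered as self-attention over the concatenated camera and LiDAR BEV tokens (the abstract does name a self-attention mechanism), and DDF as a learned channel-wise gate over the two modalities; the layer shapes, token layout, and gating form are all assumptions.
```python
# Hedged sketch of MapFusion's CIT and DDF modules, reconstructed from the
# abstract alone; the architectural details below are assumptions.
import torch
import torch.nn as nn


class CrossModalInteractionTransform(nn.Module):
    """CIT (assumed form): self-attention over the concatenated camera and
    LiDAR BEV tokens, letting the two BEV feature spaces interact."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, cam_bev, lidar_bev):
        b, c, h, w = cam_bev.shape
        # Flatten each BEV map to (B, H*W, C) tokens, then attend jointly.
        tokens = torch.cat([cam_bev.flatten(2).transpose(1, 2),
                            lidar_bev.flatten(2).transpose(1, 2)], dim=1)
        tokens = self.norm(tokens + self.attn(tokens, tokens, tokens)[0])
        cam_t, lidar_t = tokens.split(h * w, dim=1)

        def to_map(t):  # (B, H*W, C) -> (B, C, H, W)
            return t.transpose(1, 2).reshape(b, c, h, w)

        return to_map(cam_t), to_map(lidar_t)


class DualDynamicFusion(nn.Module):
    """DDF (assumed form): predict per-channel gates from both modalities
    and fuse a gated combination, adaptively selecting information."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(2 * channels, 2 * channels, 1),
                                  nn.Sigmoid())
        self.out = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, cam_bev, lidar_bev):
        x = torch.cat([cam_bev, lidar_bev], dim=1)
        g_cam, g_lidar = self.gate(x).chunk(2, dim=1)  # (B, C, 1, 1) each
        return self.out(torch.cat([g_cam * cam_bev,
                                   g_lidar * lidar_bev], dim=1))
```
Because both modules consume and produce plain BEV tensors, `ddf(*cit(cam_bev, lidar_bev))` would slot between the per-modality BEV encoders and the map decoder of an existing pipeline, which is consistent with the abstract's plug-and-play claim.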
Related papers
- MapExpert: Online HD Map Construction with Simple and Efficient Sparse Map Element Expert [7.086030137483952]
We introduce an expert-based online HD map method, termed MapExpert. MapExpert utilizes sparse experts, distributed by our routers, to describe various non-cubic map elements accurately.
arXiv Detail & Related papers (2024-12-17T09:19:44Z)
- Neural Semantic Map-Learning for Autonomous Vehicles [85.8425492858912]
We present a mapping system that fuses local submaps gathered from a fleet of vehicles at a central instance to produce a coherent map of the road environment.
Our method jointly aligns and merges the noisy and incomplete local submaps using a scene-specific Neural Signed Distance Field.
We leverage memory-efficient sparse feature-grids to scale to large areas and introduce a confidence score to model uncertainty in scene reconstruction.
arXiv Detail & Related papers (2024-10-10T10:10:03Z)
- MemFusionMap: Working Memory Fusion for Online Vectorized HD Map Construction [6.743612231580936]
We propose a novel temporal fusion model with enhanced temporal reasoning capabilities for online HD map construction.
Specifically, we contribute a working memory fusion module that improves the model's memory capacity to reason across a history of frames.
We also design a novel temporal overlap heatmap to explicitly inform the model about the temporal overlap information and vehicle trajectory.
arXiv Detail & Related papers (2024-09-26T03:16:39Z)
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Fusing information across modalities effectively improves object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods in mAP, by 5.9% on the M3FD dataset and 4.9% on the FLIR-Aligned dataset.
arXiv Detail & Related papers (2024-04-14T05:28:46Z)
- IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection [130.394884412296]
We propose IS-Fusion, an innovative multimodal fusion framework.
It captures the Instance- and Scene-level contextual information.
IS-Fusion essentially differs from existing approaches that focus only on BEV scene-level fusion.
arXiv Detail & Related papers (2024-03-22T14:34:17Z)
- Complementing Onboard Sensors with Satellite Map: A New Perspective for HD Map Construction [31.0701760075554]
High-definition (HD) maps play a crucial role in autonomous driving systems.
Recent methods have attempted to construct HD maps in real-time using vehicle onboard sensors.
We explore a new perspective that boosts HD map construction through the use of satellite maps to complement onboard sensors.
arXiv Detail & Related papers (2023-08-29T16:33:16Z)
- NeMO: Neural Map Growing System for Spatiotemporal Fusion in Bird's-Eye-View and BDD-Map Benchmark [9.430779563669908]
Vision-centric Bird's-Eye View representation is essential for autonomous driving systems.
This work outlines a new paradigm, named NeMO, for generating local maps using a readable and writable big map.
Assuming the feature distribution of all BEV grids follows an identical pattern, we adopt a shared-weight neural network across all grids to update the big map.
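Read literally, that sentence suggests one update network whose weights are shared by every grid of the big map; the sketch below is one hedged rendering of the idea, where the recurrent (GRU-cell) form and the flattened (N_grids, C) layout are assumptions.
```python
# Hedged sketch of NeMO's shared-weight grid update; only the idea of one
# network shared across all BEV grids comes from the summary above.
import torch
import torch.nn as nn


class SharedGridUpdater(nn.Module):
    """A single GRU cell, shared by every BEV grid, fuses a new local
    observation into the persistent big-map feature at that grid."""

    def __init__(self, channels: int):
        super().__init__()
        self.cell = nn.GRUCell(channels, channels)

    def forward(self, big_map: torch.Tensor, local_bev: torch.Tensor):
        # big_map, local_bev: (N_grids, C). Every grid is updated by the
        # same weights, matching the identical-distribution assumption.
        return self.cell(local_bev, big_map)
```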
arXiv Detail & Related papers (2023-06-07T15:46:15Z)
- BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [105.96557764248846]
We introduce BEVFusion, a generic multi-task multi-sensor fusion framework.
It unifies multi-modal features in the shared bird's-eye view representation space.
It achieves 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower cost.
arXiv Detail & Related papers (2022-05-26T17:59:35Z)
- Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
The Full-duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS).
Our FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously, before the fusion-decoding stage.
We show that our FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z)
- MapFusion: A General Framework for 3D Object Detection with HDMaps [17.482961825285013]
We propose MapFusion to integrate the map information into modern 3D object detector pipelines.
By fusing the map information, we achieve improvements of 1.27 to 2.79 points in mean Average Precision (mAP) over three strong 3D object detection baselines.
arXiv Detail & Related papers (2021-03-10T08:36:59Z)
- Distributed Dynamic Map Fusion via Federated Learning for Intelligent Networked Vehicles [9.748996198083425]
This paper proposes a federated learning based dynamic map fusion framework to achieve high map quality.
The proposed framework is implemented in the Car Learning to Act (CARLA) simulation platform.
arXiv Detail & Related papers (2021-03-05T16:28:46Z)