Related papers: MR3D-Net: Dynamic Multi-Resolution 3D Sparse Voxel Grid Fusion for LiDAR-Based Collective Perception

URL: http://arxiv.org/abs/2408.06137v1
Date: Mon, 12 Aug 2024 13:27:11 GMT
Title: MR3D-Net: Dynamic Multi-Resolution 3D Sparse Voxel Grid Fusion for LiDAR-Based Collective Perception
Authors: Sven Teufel, Jörg Gamerdinger, Georg Volk, Oliver Bringmann
Abstract summary: We propose MR3D-Net, a dynamic multi-resolution 3D sparse voxel grid fusion backbone architecture for LiDAR-based collective perception. We show that sparse voxel grids at varying resolutions provide a meaningful and compact environment representation that can adapt to the communication bandwidth.
Score: 0.5714074111744111
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The safe operation of automated vehicles depends on their ability to perceive the environment comprehensively. However, occlusion, sensor range, and environmental factors limit their perception capabilities. To overcome these limitations, collective perception enables vehicles to exchange information. However, fusing this exchanged information is a challenging task. Early fusion approaches require large amounts of bandwidth, while intermediate fusion approaches face interchangeability issues. Late fusion of shared detections is currently the only feasible approach. However, it often results in inferior performance due to information loss. To address this issue, we propose MR3D-Net, a dynamic multi-resolution 3D sparse voxel grid fusion backbone architecture for LiDAR-based collective perception. We show that sparse voxel grids at varying resolutions provide a meaningful and compact environment representation that can adapt to the communication bandwidth. MR3D-Net achieves state-of-the-art performance on the OPV2V 3D object detection benchmark while reducing the required bandwidth by up to 94% compared to early fusion. Code is available at https://github.com/ekut-es/MR3D-Net
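
As a rough illustration of the abstract's point that sparse voxel grids at coarser resolutions give a more compact representation, the sketch below voxelizes a list of points at several resolutions and picks the finest one that fits a size budget. This is not the MR3D-Net implementation: the function names `voxelize` and `pick_resolution`, the voxel sizes, and the use of an occupied-voxel count as a stand-in for communication bandwidth are all assumptions made for this example.

```python
import math
import random


def voxelize(points, voxel_size):
    """Return the set of occupied voxel indices for (x, y, z) points.

    Only occupied voxels are stored, which is what makes the grid sparse.
    """
    return {
        tuple(math.floor(c / voxel_size) for c in point)
        for point in points
    }


def pick_resolution(points, voxel_sizes, max_voxels):
    """Pick the finest voxel size whose occupied-voxel count fits the budget.

    `max_voxels` is a hypothetical proxy for the available bandwidth.
    Falls back to the coarsest size if no candidate fits.
    """
    for size in sorted(voxel_sizes):
        voxels = voxelize(points, size)
        if len(voxels) <= max_voxels:
            return size, voxels
    coarsest = max(voxel_sizes)
    return coarsest, voxelize(points, coarsest)


if __name__ == "__main__":
    # Synthetic points standing in for a LiDAR scan (not real data).
    rng = random.Random(0)
    cloud = [
        (rng.uniform(-50, 50), rng.uniform(-50, 50), rng.uniform(-2, 2))
        for _ in range(20000)
    ]
    for size in (0.5, 1.0, 2.0):
        print(f"voxel size {size} m: {len(voxelize(cloud, size))} occupied voxels")
    chosen, grid = pick_resolution(cloud, (0.5, 1.0, 2.0), max_voxels=5000)
    print(f"chosen voxel size: {chosen} m ({len(grid)} voxels)")
```

Because each coarser grid here is a power-of-two multiple of the finer one, the grids are nested and the occupied-voxel count can only stay the same or shrink as the voxel size grows.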

Related papers

Multistream Network for LiDAR and Camera-based 3D Object Detection in Outdoor Scenes [59.78696921486972]
Fusion of LiDAR and RGB data has the potential to enhance outdoor 3D object detection accuracy. We propose a MultiStream Detection (MuStD) network that meticulously extracts task-relevant information from both data modalities.
arXiv Detail & Related papers (2025-07-25T14:20:16Z)
DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion [9.225796678303487]
We propose DualDiff, a dual-branch conditional diffusion model designed to enhance multi-view driving scene generation. We introduce Occupancy Ray Sampling (ORS), a semantic-rich 3D representation, alongside numerical driving scene representation. To improve cross-modal information integration, we propose a Semantic Fusion Attention (SFA) mechanism that aligns and fuses features across modalities.
arXiv Detail & Related papers (2025-05-03T16:20:01Z)
FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation [52.89847760590189]
3D scene understanding is a critical yet challenging task in autonomous driving. Recent methods leverage the range-view representation to improve processing efficiency. We re-design the workflow for range-view-based LiDAR semantic segmentation.
arXiv Detail & Related papers (2025-02-13T12:39:26Z)
HEAD: A Bandwidth-Efficient Cooperative Perception Approach for Heterogeneous Connected and Autonomous Vehicles [9.10239345027499]
HEAD is a method that fuses features from the classification and regression heads in 3D object detection networks. Our experiments demonstrate that HEAD is a fusion method that effectively balances communication bandwidth and perception performance.
arXiv Detail & Related papers (2024-08-27T22:05:44Z)
UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR. UniTR processes a variety of modalities with unified modeling and shared parameters. UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
arXiv Detail & Related papers (2023-08-15T12:13:44Z)
Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection. With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous Driving [0.0]
We propose MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer architecture to fuse image and LiDAR features to improve the detection accuracy. Our end-to-end single-stage, anchor-free and NMS-free network takes in multi-view images and LiDAR point clouds and predicts 3D bounding boxes. MSF3DDETR network is trained end-to-end on the nuScenes dataset using Hungarian algorithm based bipartite matching and set-to-set loss inspired by DETR.
arXiv Detail & Related papers (2022-10-27T10:55:15Z)
EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module. The proposed CB-Fusion module boosts the plentiful semantic information of point features with the image features in a cascade bi-directional interaction fusion manner. The experimental results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z)
Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases. Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds. We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z)
Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation [28.494690309193068]
We propose a novel range-point-voxel fusion network, namely RPVNet. In this network, we devise a deep fusion framework with multiple and mutual information interactions among these three views. By leveraging this efficient interaction and relatively lower voxel resolution, our method is also proved to be more efficient.
arXiv Detail & Related papers (2021-03-24T04:24:12Z)
Volumetric Propagation Network: Stereo-LiDAR Fusion for Long-Range Depth Estimation [81.08111209632501]
We propose a geometry-aware stereo-LiDAR fusion network for long-range depth estimation. We exploit sparse and accurate point clouds as a cue for guiding correspondences of stereo images in a unified 3D volume space. Our network achieves state-of-the-art performance on the KITTI and the Virtual-KITTI datasets.
arXiv Detail & Related papers (2021-03-24T03:24:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.