HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object
Detection
- URL: http://arxiv.org/abs/2206.15157v3
- Date: Fri, 11 Aug 2023 11:06:09 GMT
- Title: HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object
Detection
- Authors: Tim Broedermann (1), Christos Sakaridis (1), Dengxin Dai (2) and Luc
Van Gool (1 and 3) ((1) ETH Zurich, (2) MPI for Informatics, (3) KU Leuven)
- Abstract summary: We propose HRFuser, a modular architecture for multi-modal 2D object detection.
It fuses multiple sensors in a multi-resolution fashion and scales to an arbitrary number of input modalities.
We demonstrate via experiments on nuScenes and the adverse conditions DENSE datasets that our model effectively leverages complementary features from additional modalities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Besides standard cameras, autonomous vehicles typically include multiple
additional sensors, such as lidars and radars, which help acquire richer
information for perceiving the content of the driving scene. While several
recent works focus on fusing certain pairs of sensors - such as camera with
lidar or radar - by using architectural components specific to the examined
setting, a generic and modular sensor fusion architecture is missing from the
literature. In this work, we propose HRFuser, a modular architecture for
multi-modal 2D object detection. It fuses multiple sensors in a
multi-resolution fashion and scales to an arbitrary number of input modalities.
The design of HRFuser is based on state-of-the-art high-resolution networks for
image-only dense prediction and incorporates a novel multi-window
cross-attention block as the means to perform fusion of multiple modalities at
multiple resolutions. We demonstrate via extensive experiments on nuScenes and
the adverse conditions DENSE datasets that our model effectively leverages
complementary features from additional modalities, substantially improving upon
camera-only performance and consistently outperforming state-of-the-art 3D and
2D fusion methods evaluated on 2D object detection metrics. The source code is
publicly available.
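The abstract describes fusing modalities via cross-attention, with one modality's features querying another's. As a purely illustrative sketch of that idea, the toy code below implements generic single-head cross-attention in plain Python: camera tokens act as queries over lidar tokens as keys/values. This is not HRFuser's actual multi-window cross-attention block (which partitions features into windows and operates at multiple resolutions); all names and the tiny 2-D features are made up for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Single-head cross-attention: each query token attends over the
    key tokens of another modality and aggregates that modality's values."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Scaled dot-product scores between this query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        fused = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        out.append(fused)
    return out

# Toy 2-D features: camera tokens as queries, lidar tokens as keys/values.
cam = [[1.0, 0.0], [0.0, 1.0]]
lidar = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = cross_attention(cam, lidar, lidar)
```

In a transformer-style fusion block the `fused` output would typically be added back onto the camera features through a residual connection; a multi-resolution design like the one sketched in the abstract would repeat such fusion at each resolution of the backbone.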
Related papers
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- 3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object Detection [13.068266058374775]
We propose a novel camera-LiDAR fusion architecture called 3D Dual-Fusion.
The proposed method fuses the features of the camera-view and 3D voxel-view domain and models their interactions through deformable attention.
The results of an experimental evaluation show that the proposed camera-LiDAR fusion architecture achieved competitive performance on the KITTI and nuScenes datasets.
arXiv Detail & Related papers (2022-11-24T11:00:50Z)
- ImLiDAR: Cross-Sensor Dynamic Message Propagation Network for 3D Object Detection [20.44294678711783]
We propose ImLiDAR, a new 3D object detection (3OD) paradigm that narrows cross-sensor discrepancies by progressively fusing the multi-scale features of camera images and LiDAR point clouds.
First, we propose a cross-sensor dynamic message propagation module to combine the best of the multi-scale image and point features.
Second, we formulate a direct set prediction problem that enables the design of an effective set-based detector.
arXiv Detail & Related papers (2022-11-17T13:31:23Z)
- Bridging the View Disparity of Radar and Camera Features for Multi-modal Fusion 3D Object Detection [6.959556180268547]
This paper focuses on how to utilize millimeter-wave (MMW) radar and camera sensor fusion for 3D object detection.
A novel method that realizes feature-level fusion under the bird's-eye view (BEV) for a better feature representation is proposed.
arXiv Detail & Related papers (2022-08-25T13:21:37Z)
- Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities between the two modalities and fuse in the common space, either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, then unrolls it into a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
- EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module.
The proposed CB-Fusion module enriches the semantic information of point features with image features in a cascade bi-directional interaction fusion manner.
The experiment results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z)
- Perception-aware Multi-sensor Fusion for 3D LiDAR Semantic Segmentation [59.42262859654698]
3D semantic segmentation is important for scene understanding in many applications, such as autonomous driving and robotics.
Existing fusion-based methods may not achieve promising performance due to the vast difference between the two modalities.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to exploit perceptual information from two modalities.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646]
We propose a novel 3D object detector that can exploit both LiDAR and cameras to perform very accurate localization.
We design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution.
arXiv Detail & Related papers (2020-12-20T18:43:41Z)
- siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera 3D Object Detection [65.03384167873564]
A Siamese network is integrated into the pipeline of a well-known 3D object detector.
The learned associations are exploited to enhance the 3D box regression of the object.
The experimental evaluation on the nuScenes dataset shows that the proposed method outperforms traditional NMS approaches.
arXiv Detail & Related papers (2020-02-19T15:32:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.