VPFNet: Voxel-Pixel Fusion Network for Multi-class 3D Object Detection
- URL: http://arxiv.org/abs/2111.00966v1
- Date: Mon, 1 Nov 2021 14:17:09 GMT
- Title: VPFNet: Voxel-Pixel Fusion Network for Multi-class 3D Object Detection
- Authors: Chia-Hung Wang, Hsueh-Wei Chen, Li-Chen Fu
- Abstract summary: This paper proposes a fusion-based 3D object detection network, named Voxel-Pixel Fusion Network (VPFNet)
The proposed method is evaluated on the KITTI benchmark for multi-class 3D object detection task under multilevel difficulty.
It is shown to outperform all state-of-the-art methods in mean average precision (mAP)
- Score: 5.12292602924464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many LiDAR-based methods for detecting large objects, single-class object
detection, or under easy situations were claimed to perform quite well.
However, their performances of detecting small objects or under hard situations
did not surpass those of the fusion-based ones due to failure to leverage the
image semantics. In order to elevate the detection performance in a complicated
environment, this paper proposes a deep learning (DL)-embedded fusion-based
multi-class 3D object detection network which admits both LiDAR and camera
sensor data streams, named Voxel-Pixel Fusion Network (VPFNet). Inside this
network, a key novel component is called Voxel-Pixel Fusion (VPF) layer, which
takes advantage of the geometric relation of a voxel-pixel pair and fuses the
voxel features and the pixel features with proper mechanisms. Moreover, several
parameters are particularly designed to guide and enhance the fusion effect
after considering the characteristics of a voxel-pixel pair. Finally, the
proposed method is evaluated on the KITTI benchmark for multi-class 3D object
detection task under multilevel difficulty, and is shown to outperform all
state-of-the-art methods in mean average precision (mAP). It is also noteworthy
that our approach here ranks the first on the KITTI leaderboard for the
challenging pedestrian class.
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN)
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - VoxelNextFusion: A Simple, Unified and Effective Voxel Fusion Framework
for Multi-Modal 3D Object Detection [33.46363259200292]
Existing voxel-based methods face challenges when fusing sparse voxel features with dense image features in a one-to-one manner.
We present VoxelNextFusion, a multi-modal 3D object detection framework specifically designed for voxel-based methods.
arXiv Detail & Related papers (2024-01-05T08:10:49Z) - VirtualPainting: Addressing Sparsity with Virtual Points and
Distance-Aware Data Augmentation for 3D Object Detection [3.5259183508202976]
We present an innovative approach that involves the generation of virtual LiDAR points using camera images.
We also enhance these virtual points with semantic labels obtained from image-based segmentation networks.
Our approach offers a versatile solution that can be seamlessly integrated into various 3D frameworks and 2D semantic segmentation methods.
arXiv Detail & Related papers (2023-12-26T18:03:05Z) - FusionViT: Hierarchical 3D Object Detection via LiDAR-Camera Vision
Transformer Fusion [8.168523242105763]
We will introduce a novel vision transformer-based 3D object detection model, namely FusionViT.
Our FusionViT model can achieve state-of-the-art performance and outperforms existing baseline methods.
arXiv Detail & Related papers (2023-11-07T00:12:01Z) - MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection [54.52102265418295]
We propose a novel and effective Multi-Level Fusion network, named as MLF-DET, for high-performance cross-modal 3D object DETection.
For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features.
For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which exploits image semantics to rectify the confidence of detection candidates.
arXiv Detail & Related papers (2023-07-18T11:26:02Z) - Multi-View Photometric Stereo Revisited [100.97116470055273]
Multi-view photometric stereo (MVPS) is a preferred method for detailed and precise 3D acquisition of an object from images.
We present a simple, practical approach to MVPS, which works well for isotropic as well as other object material types such as anisotropic and glossy.
The proposed approach shows state-of-the-art results when tested extensively on several benchmark datasets.
arXiv Detail & Related papers (2022-10-14T09:46:15Z) - Paint and Distill: Boosting 3D Object Detection with Semantic Passing
Network [70.53093934205057]
3D object detection task from lidar or camera sensors is essential for autonomous driving.
We propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models.
arXiv Detail & Related papers (2022-07-12T12:35:34Z) - DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z) - Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646]
We propose a novel 3D object detector that can exploit both LIDAR as well as cameras to perform very accurate localization.
We design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution.
arXiv Detail & Related papers (2020-12-20T18:43:41Z) - Multi-View Adaptive Fusion Network for 3D Object Detection [14.506796247331584]
3D object detection based on LiDAR-camera fusion is becoming an emerging research theme for autonomous driving.
We propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view and camera view images as inputs for 3D object detection.
We design an end-to-end learnable network named MVAF-Net to integrate these two components.
arXiv Detail & Related papers (2020-11-02T00:06:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.