EPNet: Enhancing Point Features with Image Semantics for 3D Object
Detection
- URL: http://arxiv.org/abs/2007.08856v1
- Date: Fri, 17 Jul 2020 09:33:05 GMT
- Title: EPNet: Enhancing Point Features with Image Semantics for 3D Object
Detection
- Authors: Tengteng Huang, Zhe Liu, Xiwu Chen and Xiang Bai
- Abstract summary: We aim at addressing two critical issues in the 3D detection task: the exploitation of multiple sensors (namely LiDAR point cloud and camera image), and the inconsistency between localization and classification confidence.
We propose a novel fusion module to enhance the point features with semantic image features in a point-wise manner without any image annotations.
We design an end-to-end learnable framework named EPNet to integrate these two components.
- Score: 60.097873683615695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we aim at addressing two critical issues in the 3D detection
task, including the exploitation of multiple sensors (namely LiDAR point cloud
and camera image), as well as the inconsistency between the localization and
classification confidence. To this end, we propose a novel fusion module to
enhance the point features with semantic image features in a point-wise manner
without any image annotations. Besides, a consistency enforcing loss is
employed to explicitly encourage the consistency of both the localization and
classification confidence. We design an end-to-end learnable framework named
EPNet to integrate these two components. Extensive experiments on the KITTI and
SUN-RGBD datasets demonstrate the superiority of EPNet over the
state-of-the-art methods. Codes and models are available at:
https://github.com/happinesslz/EPNet
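
To make the point-wise fusion concrete, below is a minimal PyTorch sketch of the general idea: project each LiDAR point into the image with a calibrated camera matrix, bilinearly sample the semantic feature map at the projected location, and gate the sampled features before concatenating them with the point features. The module name, the gating design, and the projection convention are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch, assuming a calibrated (3, 4) camera projection matrix P
# and points already expressed in the camera frame (positive depth). The
# gating design is an illustrative assumption, not the authors' exact module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointImageFusion(nn.Module):
    """Fuse per-point LiDAR features with bilinearly sampled image features."""

    def __init__(self, point_dim: int, image_dim: int):
        super().__init__()
        # Gate that weights the image features per point before concatenation.
        self.gate = nn.Sequential(
            nn.Linear(point_dim + image_dim, image_dim), nn.Sigmoid()
        )

    def forward(self, points, point_feats, image_feats, P):
        # points: (N, 3), point_feats: (N, Cp),
        # image_feats: (1, Ci, H, W), P: (3, 4).
        N = points.shape[0]
        # Project to pixels: [u, v, w]^T ~ P [x, y, z, 1]^T, then divide by w.
        homo = torch.cat([points, points.new_ones(N, 1)], dim=1)   # (N, 4)
        uvw = homo @ P.t()                                         # (N, 3)
        uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)              # (N, 2)
        # Normalize pixel coordinates to [-1, 1] for grid_sample.
        H, W = image_feats.shape[2:]
        grid = torch.stack(
            [2 * uv[:, 0] / (W - 1) - 1, 2 * uv[:, 1] / (H - 1) - 1], dim=1
        ).view(1, N, 1, 2)
        img = F.grid_sample(image_feats, grid, mode="bilinear",
                            align_corners=True)                    # (1, Ci, N, 1)
        img = img.squeeze(0).squeeze(-1).t()                       # (N, Ci)
        # Gate the sampled image features using both modalities, then fuse.
        w = self.gate(torch.cat([point_feats, img], dim=1))        # (N, Ci)
        return torch.cat([point_feats, w * img], dim=1)            # (N, Cp + Ci)
```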
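The consistency enforcing loss can be sketched in the same spirit. One plausible form, assumed here for illustration, drives the product of a box's classification confidence and its IoU with the ground truth toward 1, so that highly confident detections must also be well localized (and vice versa); the exact formulation in EPNet may differ.

```python
# A hedged sketch of a consistency enforcing loss. `pred_iou` is assumed to
# be precomputed (and detached) by the caller from the predicted and
# ground-truth boxes of the positive samples.
import torch

def consistency_enforcing_loss(cls_conf: torch.Tensor,
                               pred_iou: torch.Tensor) -> torch.Tensor:
    """
    cls_conf: (N,) classification confidences in (0, 1) for positive boxes.
    pred_iou: (N,) IoU of each predicted box with its assigned ground truth.
    """
    eps = 1e-6
    # Push c * IoU toward 1: confident but poorly localized boxes (or
    # accurate but under-confident ones) are penalized.
    return -torch.log(cls_conf.clamp(min=eps) * pred_iou.clamp(min=eps)).mean()
```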
Related papers
- EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object
Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module.
The CB-Fusion module enriches the semantic information of point features with image features through cascaded, bi-directional interaction.
The experiment results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z)
- Similarity-Aware Fusion Network for 3D Semantic Segmentation [87.51314162700315]
We propose a similarity-aware fusion network (SAFNet) to adaptively fuse 2D images and 3D point clouds for 3D semantic segmentation.
We employ a late fusion strategy that first learns the geometric and contextual similarities between the input point clouds and those back-projected from 2D pixels.
We show that SAFNet significantly outperforms existing state-of-the-art fusion-based approaches under varying levels of data integrity (a hedged sketch of this similarity-weighted fusion appears after the related-papers list).
arXiv Detail & Related papers (2021-07-04T09:28:18Z)
- Segmenting 3D Hybrid Scenes via Zero-Shot Learning [13.161136148641813]
This work tackles point cloud semantic segmentation for 3D hybrid scenes under the framework of zero-shot learning.
We propose PFNet, a network that synthesizes point features for various object classes by leveraging the semantic features of both seen and unseen classes.
PFNet employs a GAN architecture to synthesize point features, where the semantic relationship between seen-class and unseen-class features is consolidated by adopting a new semantic regularizer.
We introduce two benchmarks for algorithmic evaluation by re-organizing the public S3DIS and ScanNet datasets under six different data splits.
arXiv Detail & Related papers (2021-07-01T13:21:49Z)
- P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching [78.18641868402901]
This work takes the initiative to establish fine-grained correspondences between 2D images and 3D point clouds.
An ultra-wide reception mechanism, in combination with a novel loss function, is designed to mitigate the intrinsic information variations between pixel and point local regions.
arXiv Detail & Related papers (2021-03-01T14:59:40Z)
- Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646]
We propose a novel 3D object detector that exploits both LiDAR and cameras to perform highly accurate localization.
We design an end-to-end learnable architecture that uses continuous convolutions to fuse image and LiDAR feature maps at multiple levels of resolution.
arXiv Detail & Related papers (2020-12-20T18:43:41Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The architecture performs fusion in both stages.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
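
As a companion to the SAFNet entry above, here is a minimal sketch of similarity-aware late fusion: a per-point weight, derived from how well the geometric features agree with the back-projected image features, decides how much each branch contributes. The cosine-similarity weighting is an illustrative stand-in for SAFNet's learned similarity estimation.

```python
# A minimal, assumed sketch of similarity-aware late fusion. Names and the
# cosine-similarity choice are illustrative, not SAFNet's actual heads.
import torch
import torch.nn.functional as F

def similarity_aware_fuse(feat3d: torch.Tensor,
                          feat2d: torch.Tensor) -> torch.Tensor:
    """
    feat3d: (N, C) per-point features from the 3D (geometric) branch.
    feat2d: (N, C) per-point features back-projected from the 2D branch.
    Returns (N, C) adaptively fused features.
    """
    # Cosine similarity as a cheap stand-in for a learned similarity score.
    sim = F.cosine_similarity(feat3d, feat2d, dim=1)    # (N,) in [-1, 1]
    w = torch.sigmoid(sim).unsqueeze(1)                 # (N, 1) in (0, 1)
    # Where the modalities agree, trust the image branch more; otherwise
    # fall back to the geometric branch.
    return w * feat2d + (1 - w) * feat3d
```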