NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth
Supervision for Indoor Multi-View 3D Detection
- URL: http://arxiv.org/abs/2402.14464v1
- Date: Thu, 22 Feb 2024 11:48:06 GMT
- Title: NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth
Supervision for Indoor Multi-View 3D Detection
- Authors: Chenxi Huang and Yuenan Hou and Weicai Ye and Di Huang and Xiaoshui
Huang and Binbin Lin and Deng Cai and Wanli Ouyang
- Abstract summary: NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, exhibits appealing performance on the ScanNetV2 and ARKitScenes datasets.
- Score: 72.0098999512727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: NeRF-Det has achieved impressive performance in indoor multi-view 3D
detection by innovatively utilizing NeRF to enhance representation learning.
Despite its notable performance, we uncover three decisive shortcomings in its
current design, including semantic ambiguity, inappropriate sampling, and
insufficient utilization of depth supervision. To combat the aforementioned
problems, we present three corresponding solutions: 1) Semantic Enhancement. We
project the freely available 3D segmentation annotations onto the 2D plane and
leverage the corresponding 2D semantic maps as the supervision signal,
significantly enhancing the semantic awareness of multi-view detectors. 2)
Perspective-aware Sampling. Instead of employing the uniform sampling strategy,
we put forward the perspective-aware sampling policy that samples densely near
the camera while sparsely in the distance, more effectively collecting the
valuable geometric clues. 3) Ordinal Residual Depth Supervision. As opposed to
directly regressing the depth values that are difficult to optimize, we divide
the depth range of each scene into a fixed number of ordinal bins and
reformulate the depth prediction as the combination of the classification of
depth bins as well as the regression of the residual depth values, thereby
benefiting the depth learning process. The resulting algorithm, NeRF-Det++, has
exhibited appealing performance on the ScanNetV2 and ARKitScenes datasets.
Notably, in ScanNetV2, NeRF-Det++ outperforms the competitive NeRF-Det by +1.9%
in mAP@0.25 and +3.5% in mAP@0.50. The code will be made publicly available at
https://github.com/mrsempress/NeRF-Detplusplus.
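The semantic-enhancement step relies on rendering the freely available 3D segmentation annotations into per-view 2D semantic maps that then supervise the image branch. The abstract gives no implementation details, so the following is a minimal sketch of one way to do the projection with a pinhole camera model; the function name, the z-buffer handling, and the variable names (`K`, `w2c`) are illustrative assumptions, not the released code.

```python
import numpy as np

def project_labels_to_view(points, labels, K, w2c, height, width):
    """Project labelled 3D points into one camera view to build a 2D
    semantic map (0 = unlabelled). A z-buffer keeps the nearest point
    per pixel. Illustrative sketch, not the official implementation."""
    sem_map = np.zeros((height, width), dtype=np.int64)
    z_buf = np.full((height, width), np.inf)

    # World -> camera coordinates (w2c is a 4x4 world-to-camera matrix).
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (w2c @ pts_h.T).T[:, :3]
    in_front = cam[:, 2] > 1e-6
    cam, lab = cam[in_front], labels[in_front]

    # Camera -> pixel coordinates with intrinsics K (3x3).
    uv = (K @ cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)

    for ui, vi, zi, li in zip(u[valid], v[valid], cam[valid, 2], lab[valid]):
        if zi < z_buf[vi, ui]:
            z_buf[vi, ui] = zi
            sem_map[vi, ui] = li
    return sem_map
```

The resulting label maps could then serve as targets for a per-pixel classification term on the rendered semantics; the abstract only states that the 2D semantic maps act as the supervision signal.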
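Perspective-aware sampling replaces uniform spacing along each ray with a schedule that is dense near the camera and sparse at range. The exact policy is not spelled out in the abstract, so the sketch below uses inverse-depth (disparity-linear) spacing as one plausible instantiation; `near`, `far`, and `num_samples` are illustrative parameters.

```python
import numpy as np

def uniform_depths(near, far, num_samples):
    """Baseline: equal spacing in depth along the ray."""
    return np.linspace(near, far, num_samples)

def perspective_aware_depths(near, far, num_samples):
    """Sketch of a perspective-aware schedule: sample uniformly in inverse
    depth (disparity), which clusters samples near the camera and spreads
    them out at range. Assumed form, not the paper's exact policy."""
    inv = np.linspace(1.0 / near, 1.0 / far, num_samples)
    return 1.0 / inv

# Example: with near=0.5 m and far=8 m, the first gaps are centimetre-scale
# while the last span several decimetres, unlike the constant uniform gap.
print(np.diff(perspective_aware_depths(0.5, 8.0, 64))[:3])
print(np.diff(perspective_aware_depths(0.5, 8.0, 64))[-3:])
```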
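Ordinal residual depth supervision recasts depth prediction as classifying a depth bin and regressing a residual within it. The encode/decode sketch below assumes uniformly spaced bins over a per-scene depth range; the bin count and the loss combination noted in the comments are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

def encode_depth(depth, d_min, d_max, num_bins):
    """Split [d_min, d_max] into num_bins equal bins and express a depth as
    (bin index, residual within the bin). Illustrative sketch."""
    bin_width = (d_max - d_min) / num_bins
    idx = np.clip(((depth - d_min) // bin_width).astype(int), 0, num_bins - 1)
    residual = depth - (d_min + idx * bin_width)  # in [0, bin_width)
    return idx, residual

def decode_depth(idx, residual, d_min, d_max, num_bins):
    """Invert the encoding: bin start plus predicted residual."""
    bin_width = (d_max - d_min) / num_bins
    return d_min + idx * bin_width + residual

# A network would output num_bins logits (bin classification) plus one
# residual per pixel; training could combine a cross-entropy term on the
# bins with an L1 term on the residual instead of regressing raw depth.
d_min, d_max, num_bins = 0.0, 8.0, 64
idx, res = encode_depth(np.array([3.37]), d_min, d_max, num_bins)
print(decode_depth(idx, res, d_min, d_max, num_bins))  # -> [3.37]
```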
Related papers
- NeRF-DetS: Enhancing Multi-View 3D Object Detection with Sampling-adaptive Network of Continuous NeRF-based Representation [60.47114985993196]
NeRF-Det unifies the tasks of novel view synthesis and 3D perception.
We introduce a novel 3D perception network structure, NeRF-DetS.
NeRF-DetS outperforms competitive NeRF-Det on the ScanNetV2 dataset.
arXiv Detail & Related papers (2024-04-22T06:59:03Z)
- Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection [46.041193889845474]
Ray Denoising is an innovative method that enhances detection accuracy by strategically sampling along camera rays to construct hard negative examples.
Ray Denoising is designed as a plug-and-play module, compatible with any DETR-style multi-view 3D detectors.
It achieves a 1.9% improvement in mean Average Precision (mAP) over the state-of-the-art StreamPETR method on the NuScenes dataset.
arXiv Detail & Related papers (2024-02-06T02:17:44Z)
- NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection [65.02633277884911]
We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input.
Our method makes use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance.
arXiv Detail & Related papers (2023-07-27T04:36:16Z)
- OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection [51.153003057515754]
OPA-3D is a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network.
It jointly estimates dense scene depth with depth-bounding box residuals and object bounding boxes.
It outperforms state-of-the-art methods on the main Car category.
arXiv Detail & Related papers (2022-11-02T14:19:13Z)
- Multi-initialization Optimization Network for Accurate 3D Human Pose and Shape Estimation [75.44912541912252]
We propose a three-stage framework named Multi-Initialization Optimization Network (MION)
In the first stage, we strategically select different coarse 3D reconstruction candidates that are compatible with the 2D keypoints of the input sample.
In the second stage, we design a mesh refinement transformer (MRT) to respectively refine each coarse reconstruction result via a self-attention mechanism.
Finally, a Consistency Estimation Network (CEN) is proposed to find the best result from multiple candidates by evaluating whether the visual evidence in the RGB image matches a given 3D reconstruction.
arXiv Detail & Related papers (2021-12-24T02:43:58Z)
- Sparse Depth Completion with Semantic Mesh Deformation Optimization [4.03103540543081]
We propose a neural network with post-optimization, which takes an RGB image and sparse depth samples as input and predicts the complete depth map.
Our evaluation results outperform the existing work consistently on both indoor and outdoor datasets.
arXiv Detail & Related papers (2021-12-10T13:01:06Z)
- VR3Dense: Voxel Representation Learning for 3D Object Detection and Monocular Dense Depth Reconstruction [0.951828574518325]
We introduce a method for jointly training 3D object detection and monocular dense depth reconstruction neural networks.
It takes a LiDAR point cloud and a single RGB image as inputs during inference and produces object pose predictions as well as a densely reconstructed depth map.
While our object detection is trained in a supervised manner, the depth prediction network is trained with both self-supervised and supervised loss functions.
arXiv Detail & Related papers (2021-04-13T04:25:54Z)
- DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points [14.254472131009653]
Multi-view stereo (MVS) is the golden mean between the accuracy of active depth sensing and the practicality of monocular depth estimation.
Cost volume based approaches employing 3D convolutional neural networks (CNNs) have considerably improved the accuracy of MVS systems.
We propose an efficient depth estimation approach by first (a) detecting and evaluating descriptors for interest points, then (b) learning to match and triangulate a small set of interest points, and finally (c) densifying this sparse set of 3D points using CNNs.
arXiv Detail & Related papers (2020-03-19T17:56:41Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.