MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps
- URL: http://arxiv.org/abs/2410.21566v1
- Date: Mon, 28 Oct 2024 21:58:41 GMT
- Title: MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps
- Authors: Yating Xu, Chen Li, Gim Hee Lee
- Abstract summary: The key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection.
Previous methods rely on NeRF for geometry reasoning.
We propose MVSDet, which utilizes plane sweeps for geometry-aware 3D object detection.
- Score: 51.44887282336391
- Abstract: The key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection. Previous methods rely on NeRF for geometry reasoning. However, the geometry extracted from NeRF is generally inaccurate, which leads to sub-optimal detection performance. In this paper, we propose MVSDet, which utilizes plane sweeps for geometry-aware 3D object detection. To circumvent the requirement for a large number of depth planes for accurate depth prediction, we design a probabilistic sampling and soft weighting mechanism to decide the placement of pixel features on the 3D volume. We select the top-scoring locations in the probability volume for each pixel and use their probability scores to indicate confidence. We further apply recent pixel-aligned Gaussian Splatting to regularize depth prediction and improve detection performance with little computation overhead. Extensive experiments on the ScanNet and ARKitScenes datasets show the superiority of our model. Our code is available at https://github.com/Pixie8888/MVSDet.
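The probabilistic sampling and soft weighting described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical reconstruction (not the authors' released code): it assumes a per-pixel softmax probability volume over D depth planes produced by the plane sweep, keeps only the top-k planes per pixel, renormalizes their scores, and scatters the probability-weighted pixel feature into an otherwise empty 3D feature volume.

```python
# Minimal sketch (not the authors' implementation) of top-k probabilistic
# sampling with soft weighting over a plane-sweep probability volume.
# Assumes prob_volume has shape (B, D, H, W): a softmax over D depth planes
# per pixel, and img_feat has shape (B, C, H, W).
import torch


def scatter_topk_features(prob_volume, img_feat, k=3):
    """Keep only the k most likely depth planes per pixel and weight the
    pixel feature by the renormalized probability at those planes."""
    B, D, H, W = prob_volume.shape
    C = img_feat.shape[1]

    # Top-k depth planes and their scores for every pixel.
    topk_prob, topk_idx = prob_volume.topk(k, dim=1)             # (B, k, H, W)
    topk_prob = topk_prob / topk_prob.sum(dim=1, keepdim=True)   # renormalize

    # Soft-weighted features to place at the selected planes.
    weighted = img_feat.unsqueeze(2) * topk_prob.unsqueeze(1)    # (B, C, k, H, W)

    # Sparse feature volume: zeros everywhere except the k selected planes.
    volume = img_feat.new_zeros(B, C, D, H, W)
    idx = topk_idx.unsqueeze(1).expand(-1, C, -1, -1, -1)        # (B, C, k, H, W)
    volume.scatter_(2, idx, weighted)
    return volume


if __name__ == "__main__":
    prob = torch.softmax(torch.randn(1, 12, 8, 8), dim=1)  # 12 depth planes
    feat = torch.randn(1, 32, 8, 8)
    vol = scatter_topk_features(prob, feat, k=3)
    print(vol.shape)  # torch.Size([1, 32, 12, 8, 8])
```

In this reading, each pixel contributes features to only k plane slots instead of all D, which is one way a coarse set of depth planes can still yield a usable geometry-aware volume.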
Related papers
- Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data [68.18735997052265]
We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection.
Our method requires only a small number of 3D points, that can be obtained from a low-cost, low-resolution sensor.
The accuracy of 3D detection improves by 20% compared to the state-of-the-art monocular detection methods.
arXiv Detail & Related papers (2024-04-10T03:54:53Z)
- NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection [65.02633277884911]
We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input.
Our method makes use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance.
arXiv Detail & Related papers (2023-07-27T04:36:16Z)
- 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution.
It takes less than 2s to directly process a whole building consisting of more than 4500k points while detecting almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
- 3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers [35.14784758217257]
We introduce 3D point positional encoding, 3DPPE, to the 3D detection Transformer decoder.
Despite the approximation, 3DPPE achieves 46.0 mAP and 51.4 NDS on the competitive nuScenes dataset.
arXiv Detail & Related papers (2022-11-27T03:36:32Z)
- DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [43.02373021724797]
We introduce a framework for multi-camera 3D object detection.
Our method manipulates predictions directly in 3D space.
We achieve state-of-the-art performance on the nuScenes autonomous driving benchmark.
arXiv Detail & Related papers (2021-10-13T17:59:35Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
- Ground-aware Monocular 3D Object Detection for Autonomous Driving [6.5702792909006735]
Estimating the 3D position and orientation of objects in the environment with a single RGB camera is a challenging task for low-cost urban autonomous driving and mobile robots.
Most of the existing algorithms are based on the geometric constraints in 2D-3D correspondence, which stems from generic 6D object pose estimation.
We introduce a novel neural network module to fully utilize such application-specific priors in the framework of deep learning.
arXiv Detail & Related papers (2021-02-01T08:18:24Z)
- DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
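Since DSGN, like MVSDet, derives geometry from a plane-sweep-style volume, a compact illustration of that construction may help. The snippet below is a hedged sketch under common stereo-matching assumptions and is not DSGN's released code: it concatenates left-image features with disparity-shifted right-image features to form a cost volume that a 3D network could then convert into a geometry volume.

```python
# Hedged sketch of a stereo plane-sweep (concatenation) cost volume,
# in the style of stereo-based 3D detection pipelines such as DSGN.
import torch


def stereo_cost_volume(left_feat, right_feat, max_disp=48):
    """left_feat, right_feat: (B, C, H, W). Returns (B, 2C, max_disp, H, W)."""
    B, C, H, W = left_feat.shape
    volume = left_feat.new_zeros(B, 2 * C, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            volume[:, :C, d] = left_feat
            volume[:, C:, d] = right_feat
        else:
            # Pair each left pixel with the right pixel shifted by disparity d.
            volume[:, :C, d, :, d:] = left_feat[:, :, :, d:]
            volume[:, C:, d, :, d:] = right_feat[:, :, :, :-d]
    return volume


if __name__ == "__main__":
    l = torch.randn(1, 16, 24, 80)
    r = torch.randn(1, 16, 24, 80)
    print(stereo_cost_volume(l, r, max_disp=24).shape)  # torch.Size([1, 32, 24, 24, 80])
```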
This list is automatically generated from the titles and abstracts of the papers on this site.