ImGeoNet: Image-induced Geometry-aware Voxel Representation for
Multi-view 3D Object Detection
- URL: http://arxiv.org/abs/2308.09098v1
- Date: Thu, 17 Aug 2023 16:49:38 GMT
- Title: ImGeoNet: Image-induced Geometry-aware Voxel Representation for
Multi-view 3D Object Detection
- Authors: Tao Tu, Shun-Po Chuang, Yu-Lun Liu, Cheng Sun, Ke Zhang, Donna Roy,
Cheng-Hao Kuo, Min Sun
- Abstract summary: ImGeoNet is an image-based 3D object detection framework that models a 3D space by an image-induced geometry-aware voxel representation.
We conduct experiments on three indoor datasets, namely ARKitScenes, ScanNetV2, and ScanNet200.
Our studies indicate that our proposed image-induced geometry-aware representation can enable image-based methods to attain superior detection accuracy.
- Score: 24.29296860815032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose ImGeoNet, a multi-view image-based 3D object detection framework
that models a 3D space by an image-induced geometry-aware voxel representation.
Unlike previous methods which aggregate 2D features into 3D voxels without
considering geometry, ImGeoNet learns to induce geometry from multi-view images
to alleviate the confusion arising from voxels of free space, and during the
inference phase, only images from multiple views are required. Besides, a
powerful pre-trained 2D feature extractor can be leveraged by our
representation, leading to a more robust performance. To evaluate the
effectiveness of ImGeoNet, we conduct quantitative and qualitative experiments
on three indoor datasets, namely ARKitScenes, ScanNetV2, and ScanNet200. The
results demonstrate that ImGeoNet outperforms the current state-of-the-art
multi-view image-based method, ImVoxelNet, on all three datasets in terms of
detection accuracy. In addition, ImGeoNet shows great data efficiency by
achieving results comparable to ImVoxelNet with 100 views while utilizing only
40 views. Furthermore, our studies indicate that the proposed image-induced
geometry-aware representation enables image-based methods to attain higher
detection accuracy than the seminal point cloud-based method, VoteNet, in two
practical scenarios: (1) scenarios where point clouds are sparse and noisy,
such as in ARKitScenes, and (2) scenarios involving diverse object classes,
particularly classes of small objects, as is the case in ScanNet200.
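The abstract does not spell out the exact aggregation, but the core idea of lifting multi-view 2D features into a voxel grid and down-weighting voxels of free space can be sketched as follows. This is a minimal NumPy illustration only: the function names, the nearest-neighbor feature sampling, and the externally supplied `geometry_weights` are simplifying assumptions, not ImGeoNet's actual architecture (in the paper, the geometry is induced by a learned network).

```python
import numpy as np

def project_voxels_to_image(voxel_centers, K, Rt):
    """Project 3D voxel centers (N, 3) into pixel coordinates via a pinhole camera.

    K:  (3, 3) camera intrinsics; Rt: (3, 4) world-to-camera extrinsics.
    Returns pixel coordinates (N, 2) and depths (N,).
    """
    homo = np.concatenate([voxel_centers, np.ones((len(voxel_centers), 1))], axis=1)
    cam = (Rt @ homo.T).T                       # (N, 3) points in the camera frame
    pix = (K @ cam.T).T                         # (N, 3) homogeneous pixel coords
    depth = pix[:, 2]
    uv = pix[:, :2] / np.clip(depth[:, None], 1e-6, None)
    return uv, depth

def geometry_aware_volume(feature_maps, poses, K, voxel_centers, geometry_weights):
    """Average multi-view 2D features into voxels, then suppress free space.

    feature_maps:     list of (H, W, C) per-view feature maps.
    poses:            list of (3, 4) world-to-camera extrinsics, one per view.
    geometry_weights: (N,) per-voxel surface probability in [0, 1]
                      (supplied by the caller here; learned in ImGeoNet).
    """
    N = len(voxel_centers)
    H, W, C = feature_maps[0].shape
    accum = np.zeros((N, C))
    count = np.zeros(N)
    for feat, Rt in zip(feature_maps, poses):
        uv, depth = project_voxels_to_image(voxel_centers, K, Rt)
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        # Keep only voxels that project in front of the camera and inside the image.
        valid = (depth > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        accum[valid] += feat[v[valid], u[valid]]
        count[valid] += 1
    avg = accum / np.clip(count[:, None], 1, None)
    # Geometry-aware step: down-weight voxels that are likely free space.
    return avg * geometry_weights[:, None]
```

The geometry weighting is what distinguishes this sketch from plain unprojection: voxels with low surface probability contribute weak features, which reduces the free-space confusion the abstract describes.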
Related papers
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- Scatter Points in Space: 3D Detection from Multi-view Monocular Images [8.71944437852952]
3D object detection from monocular images is a challenging, long-standing problem in computer vision.
Recent methods tend to aggregate multi-view features by densely sampling a regular 3D grid in space.
We propose a learnable keypoint sampling method that scatters pseudo surface points in 3D space in order to preserve data sparsity.
arXiv Detail & Related papers (2022-08-31T09:38:05Z)
- Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection [17.526914782562528]
We propose Graph-DETR3D to automatically aggregate multi-view imagery information through graph structure learning (GSL).
Our best model achieves 49.5 NDS on the nuScenes test leaderboard, achieving new state-of-the-art in comparison with various published image-view 3D object detectors.
arXiv Detail & Related papers (2022-04-25T12:10:34Z)
- ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection [3.330229314824913]
ImVoxelNet is a novel fully convolutional method of 3D object detection based on monocular or multi-view RGB images.
ImVoxelNet successfully handles both indoor and outdoor scenes.
It surpasses existing RGB-based 3D object detection methods on the SUN RGB-D dataset.
arXiv Detail & Related papers (2021-06-02T14:20:24Z)
- Learning Geometry-Disentangled Representation for Complementary Understanding of 3D Object Point Cloud [50.56461318879761]
We propose the Geometry-Disentangled Attention Network (GDANet) for 3D point cloud processing.
GDANet disentangles point clouds into the contour and flat parts of 3D objects, denoted by sharp and gentle variation components, respectively.
Experiments on 3D object classification and segmentation benchmarks demonstrate that GDANet achieves state-of-the-art results with fewer parameters.
arXiv Detail & Related papers (2020-12-20T13:35:00Z)
- ParaNet: Deep Regular Representation for 3D Point Clouds [62.81379889095186]
ParaNet is a novel end-to-end deep learning framework for representing 3D point clouds.
It converts an irregular 3D point cloud into a regular 2D color image, named a point geometry image (PGI).
In contrast to conventional regular representation modalities based on multi-view projection and voxelization, the proposed representation is differentiable and reversible.
arXiv Detail & Related papers (2020-12-05T13:19:55Z)
- Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve [54.054575408582565]
We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image.
We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose.
This produces a clean, lightweight representation of the objects in an image.
arXiv Detail & Related papers (2020-07-26T00:08:37Z)
- Object Detection on Single Monocular Images through Canonical Correlation Analysis [3.4722706398428493]
We retrieve 3D object information from single monocular images without using extra 3D data such as point clouds or depth images.
We propose a two-dimensional CCA framework that fuses monocular images with corresponding predicted depth images for basic computer vision tasks.
arXiv Detail & Related papers (2020-02-13T05:03:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.