Scan2Part: Fine-grained and Hierarchical Part-level Understanding of
Real-World 3D Scans
- URL: http://arxiv.org/abs/2206.02366v1
- Date: Mon, 6 Jun 2022 05:43:10 GMT
- Title: Scan2Part: Fine-grained and Hierarchical Part-level Understanding of
Real-World 3D Scans
- Authors: Alexandr Notchenko, Vladislav Ishimtsev, Alexey Artemov, Vadim
Selyutin, Emil Bogomolov, Evgeny Burnaev
- Abstract summary: We propose Scan2Part, a method to segment individual parts of objects in real-world, noisy indoor RGB-D scans.
We use a sparse U-Net-based architecture that captures the fine-scale detail of the underlying 3D scan geometry.
As output, we are able to predict fine-grained per-object part labels, even when the geometry is coarse or partially missing.
- Score: 68.98085986594411
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose Scan2Part, a method to segment individual parts of objects in
real-world, noisy indoor RGB-D scans. To this end, we vary the part hierarchies
of objects in indoor scenes and explore their effect on scene understanding
models. Specifically, we use a sparse U-Net-based architecture that captures
the fine-scale detail of the underlying 3D scan geometry by leveraging a
multi-scale feature hierarchy. In order to train our method, we introduce the
Scan2Part dataset, which is the first large-scale collection providing detailed
semantic labels at the part level in the real-world setting. In total, we
provide 242,081 correspondences between 53,618 PartNet parts of 2,477 ShapeNet
objects and 1,506 ScanNet scenes, at two spatial resolutions of 2 cm$^3$ and 5
cm$^3$. As output, we are able to predict fine-grained per-object part labels,
even when the geometry is coarse or partially missing.
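The abstract describes a sparse, U-Net-style network over a voxelized scan with a multi-scale feature hierarchy. As a rough illustration of that encoder-decoder-with-skip-connections pattern only (a dense stand-in, not the authors' sparse implementation; channel widths, depth, grid size, and the number of part classes are assumptions), here is a minimal PyTorch sketch:

```python
# Minimal dense 3D U-Net-style sketch (illustrative stand-in for a sparse
# voxel U-Net; channel sizes, depth, and class count are assumptions).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3x3 convolutions with batch norm and ReLU.
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, 3, padding=1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, 3, padding=1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
    )

class TinyUNet3D(nn.Module):
    def __init__(self, in_ch=1, num_part_classes=16):
        super().__init__()
        self.enc1, self.enc2, self.enc3 = conv_block(in_ch, 16), conv_block(16, 32), conv_block(32, 64)
        self.pool = nn.MaxPool3d(2)
        self.up2, self.dec2 = nn.ConvTranspose3d(64, 32, 2, stride=2), conv_block(64, 32)
        self.up1, self.dec1 = nn.ConvTranspose3d(32, 16, 2, stride=2), conv_block(32, 16)
        self.head = nn.Conv3d(16, num_part_classes, 1)          # per-voxel part logits

    def forward(self, x):
        e1 = self.enc1(x)                                       # full resolution
        e2 = self.enc2(self.pool(e1))                           # 1/2 resolution
        e3 = self.enc3(self.pool(e2))                           # 1/4 resolution (coarsest scale)
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))    # upsample + skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))    # upsample + skip connection
        return self.head(d1)                                    # (B, C, D, H, W) part logits

# Example: a 64^3 occupancy chunk (e.g. geometry voxelized at 2 cm or 5 cm).
logits = TinyUNet3D()(torch.zeros(1, 1, 64, 64, 64))
print(logits.shape)  # torch.Size([1, 16, 64, 64, 64])
```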
Related papers
- 3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes [29.8054021078428]
3DCoMPaT200 is a large-scale dataset tailored for compositional understanding of object parts and materials.
It features 200 object categories, with an object vocabulary $\approx$5 times larger than 3DCoMPaT and $\approx$4 times more part categories.
To address the complexities of compositional 3D modeling, we propose a novel task of Compositional Part Shape Retrieval.
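As a toy, hedged illustration of retrieval by part composition (not the benchmark's actual Compositional Part Shape Retrieval protocol), a shape can be represented as a set of (part, material) pairs and candidates ranked by Jaccard similarity to a query composition:

```python
# Toy sketch: rank shapes by how well their (part, material) composition
# matches a query composition (illustrative, not the 3DCoMPaT200 protocol).
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / max(len(a | b), 1)

shapes = {
    "chair_01": {("leg", "wood"), ("seat", "leather"), ("back", "leather")},
    "chair_02": {("leg", "metal"), ("seat", "fabric"), ("back", "fabric")},
    "table_07": {("leg", "wood"), ("top", "wood")},
}
query = {("leg", "wood"), ("seat", "leather")}
ranked = sorted(shapes, key=lambda s: jaccard(shapes[s], query), reverse=True)
print(ranked)  # ['chair_01', 'table_07', 'chair_02']
```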
arXiv Detail & Related papers (2025-01-12T11:46:07Z)
- 3D Part Segmentation via Geometric Aggregation of 2D Visual Features [57.20161517451834]
Supervised 3D part segmentation models are tailored for a fixed set of objects and parts, limiting their transferability to open-set, real-world scenarios.
Recent works have explored vision-language models (VLMs) as a promising alternative, using multi-view rendering and textual prompting to identify object parts.
To address these limitations, we propose COPS, a COmprehensive model for Parts that blends semantics extracted from visual concepts and 3D geometry to effectively identify object parts.
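COPS is described as blending multi-view visual semantics with 3D geometry. The snippet below is a hedged sketch of the generic lift-2D-features-to-3D step only: points are projected into each view with a pinhole model, per-view feature maps are sampled at the projected pixels, and the samples are averaged per point. The cameras and feature maps are placeholders, not the COPS pipeline.

```python
# Hedged sketch: lift multi-view 2D features onto 3D points via pinhole
# projection and per-point averaging (placeholder cameras and features).
import numpy as np

def lift_2d_features(points, feats_2d, intrinsics, world2cam):
    """points: (N,3); feats_2d: list of (H,W,C); intrinsics/world2cam: per view."""
    N, C = points.shape[0], feats_2d[0].shape[2]
    accum = np.zeros((N, C)); counts = np.zeros((N, 1))
    homog = np.concatenate([points, np.ones((N, 1))], axis=1)      # (N,4) homogeneous coords
    for F, K, T in zip(feats_2d, intrinsics, world2cam):
        cam = (T @ homog.T).T[:, :3]                               # world -> camera frame
        in_front = cam[:, 2] > 1e-6
        uv = (K @ cam.T).T                                         # pinhole projection
        uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)
        u, v = np.round(uv[:, 0]).astype(int), np.round(uv[:, 1]).astype(int)
        H, W = F.shape[:2]
        valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        accum[valid] += F[v[valid], u[valid]]                      # nearest-pixel sample
        counts[valid] += 1
    return accum / np.clip(counts, 1, None)                        # per-point mean feature

# Toy usage: 100 random points, two views with 8-dimensional feature maps.
pts = np.random.rand(100, 3)
feats = [np.random.rand(60, 80, 8) for _ in range(2)]
K = np.array([[100., 0, 40], [0, 100., 30], [0, 0, 1]])
T = np.eye(4); T[2, 3] = 2.0                                       # camera 2 m back along z
point_feats = lift_2d_features(pts, feats, [K, K], [T, T])
print(point_feats.shape)  # (100, 8)
```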
arXiv Detail & Related papers (2024-12-05T15:27:58Z)
- 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution.
It takes less than 2 s to directly process a whole building consisting of more than 4500k points while detecting almost all objects.
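As a hedged illustration of spatial pruning in general, the sketch below scores voxel features with a tiny head and keeps only the top-scoring fraction before a finer stage; the scoring head, keep ratio, and shapes are assumptions, not the DSPDet3D design.

```python
# Hedged sketch: prune low-scoring voxel features before the next detection
# stage (threshold, scorer, and shapes are illustrative assumptions).
import torch
import torch.nn as nn

class VoxelPruner(nn.Module):
    def __init__(self, feat_dim=32, keep_ratio=0.25):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)   # tiny "objectness" head
        self.keep_ratio = keep_ratio

    def forward(self, coords, feats):
        """coords: (N,3) voxel indices; feats: (N,C). Returns the pruned subset."""
        s = torch.sigmoid(self.score(feats)).squeeze(-1)            # (N,) scores
        k = max(1, int(self.keep_ratio * feats.shape[0]))
        keep = torch.topk(s, k).indices                             # keep top-k voxels
        return coords[keep], feats[keep], s[keep]

coords = torch.randint(0, 128, (10_000, 3))
feats = torch.randn(10_000, 32)
pruned_coords, pruned_feats, scores = VoxelPruner()(coords, feats)
print(pruned_coords.shape, pruned_feats.shape)  # torch.Size([2500, 3]) torch.Size([2500, 32])
```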
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
- CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates high-quality 3D proposals by applying a class-aware local grouping strategy to object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
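A hedged sketch of class-aware grouping in the abstract's sense: voxels are first split by predicted semantic class and then clustered by spatial proximity within each class to form proposal groups. DBSCAN and all parameters here are illustrative stand-ins, not the CAGroup3D procedure.

```python
# Hedged sketch: group surface voxels per predicted semantic class, then
# cluster each class by spatial proximity into proposal groups.
import numpy as np
from sklearn.cluster import DBSCAN

def class_aware_grouping(voxel_xyz, class_ids, eps=0.3, min_pts=10):
    """voxel_xyz: (N,3) voxel centers in meters; class_ids: (N,) predicted classes."""
    groups = []
    for c in np.unique(class_ids):
        idx = np.where(class_ids == c)[0]
        labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(voxel_xyz[idx])
        for g in set(labels) - {-1}:                  # label -1 marks noise
            members = idx[labels == g]
            groups.append({"class": int(c), "voxels": members,
                           "center": voxel_xyz[members].mean(axis=0)})
    return groups

# Toy usage: two blobs of same-class voxels plus scattered noise.
rng = np.random.default_rng(0)
xyz = np.concatenate([rng.normal([0, 0, 0], 0.1, (200, 3)),
                      rng.normal([2, 0, 0], 0.1, (200, 3)),
                      rng.uniform(-3, 3, (50, 3))])
cls = np.zeros(len(xyz), dtype=int)
print(len(class_aware_grouping(xyz, cls)))  # typically 2 groups
```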
arXiv Detail & Related papers (2022-10-09T13:38:48Z)
- Neural Part Priors: Learning to Optimize Part-Based Object Completion in RGB-D Scans [27.377128012679076]
We propose to leverage large-scale synthetic datasets of 3D shapes annotated with part information to learn Neural Part Priors.
We can optimize over the learned part priors in order to fit to real-world scanned 3D scenes at test time.
Experiments on the ScanNet dataset demonstrate that NPPs significantly outperform the state of the art in part decomposition and object completion.
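The test-time fitting idea can be sketched, in a hedged way, as follows: a decoder standing in for a learned part prior is frozen, and only a latent code is optimized so the decoded occupancy matches the observed (partial) geometry. The decoder, loss, and observation mask are placeholders, not the Neural Part Priors model.

```python
# Hedged sketch of test-time latent optimization: freeze a pretrained shape
# decoder and fit only a latent code to partial observations.
import torch
import torch.nn as nn

decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 32 ** 3))
decoder.requires_grad_(False)                      # the prior stays fixed at test time

observed = torch.rand(32 ** 3)                     # e.g. observed occupancy of a scanned part
mask = torch.rand(32 ** 3) > 0.5                   # only part of the geometry is observed

z = torch.zeros(64, requires_grad=True)            # latent code to optimize
opt = torch.optim.Adam([z], lr=1e-2)
for step in range(200):
    pred = torch.sigmoid(decoder(z))               # decode latent -> occupancy grid
    loss = ((pred - observed)[mask] ** 2).mean()   # fit only the observed region
    opt.zero_grad(); loss.backward(); opt.step()

completed = torch.sigmoid(decoder(z)).detach()     # full (completed) occupancy estimate
print(completed.shape)  # torch.Size([32768])
```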
arXiv Detail & Related papers (2022-03-17T15:05:44Z)
- Discovering 3D Parts from Image Collections [98.16987919686709]
We tackle the problem of 3D part discovery from only 2D image collections.
Instead of relying on manually annotated parts for supervision, we propose a self-supervised approach.
Our key insight is to learn a novel part shape prior that allows each part to fit an object shape faithfully while constrained to have simple geometry.
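A hedged sketch of the fit-the-shape-but-keep-parts-simple trade-off: a handful of simple primitives (spheres) are optimized to cover a point set while a regularizer keeps each part small. The primitive choice, losses, and weights are illustrative, not the paper's formulation.

```python
# Hedged sketch: fit a few simple primitives (spheres) to a point set so the
# shape is covered while each part stays geometrically simple.
import torch

pts = torch.rand(2000, 3)                         # toy "object" point set
K = 4
centers = torch.rand(K, 3, requires_grad=True)
log_radii = torch.full((K,), -2.0, requires_grad=True)

opt = torch.optim.Adam([centers, log_radii], lr=5e-2)
for step in range(300):
    radii = log_radii.exp()
    d = torch.cdist(pts, centers)                 # (N, K) point-to-center distances
    surf = (d - radii.unsqueeze(0)).abs()         # distance to each sphere surface
    fit = surf.min(dim=1).values.mean()           # each point explained by its nearest part
    simplicity = radii.mean()                     # keep parts small / simple
    loss = fit + 0.05 * simplicity
    opt.zero_grad(); loss.backward(); opt.step()

print(centers.detach(), log_radii.exp().detach())  # fitted part centers and radii
```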
arXiv Detail & Related papers (2021-07-28T20:29:16Z)
- Learning Geometry-Disentangled Representation for Complementary Understanding of 3D Object Point Cloud [50.56461318879761]
We propose the Geometry-Disentangled Attention Network (GDANet) for 3D point cloud processing.
GDANet disentangles point clouds into the contour and flat parts of 3D objects, denoted respectively as sharp and gentle variation components.
Experiments on 3D object classification and segmentation benchmarks demonstrate that GDANet achieves state-of-the-art results with fewer parameters.
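As a hedged illustration of a sharp/gentle decomposition: GDANet learns this split end to end, whereas the sketch below simply thresholds a hand-crafted local surface-variation statistic (smallest eigenvalue ratio of the k-NN covariance), which is an assumption made for illustration.

```python
# Hedged sketch: split points into sharp (edge-like) vs. gentle (flat) subsets
# using a local surface-variation proxy and a quantile threshold.
import numpy as np

def sharp_gentle_split(points, k=16, ratio=0.5):
    """points: (N,3). Variation = smallest eigval / sum of eigvals of the
    k-NN covariance; the top `ratio` fraction is labeled 'sharp'."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)   # (N,N) pairwise distances
    knn = np.argsort(d, axis=1)[:, 1:k + 1]                          # k nearest neighbors
    variation = np.empty(len(points))
    for i, nbrs in enumerate(knn):
        cov = np.cov(points[nbrs].T)
        eig = np.sort(np.linalg.eigvalsh(cov))
        variation[i] = eig[0] / max(eig.sum(), 1e-12)                # ~0 on flat patches
    thresh = np.quantile(variation, 1.0 - ratio)
    return points[variation >= thresh], points[variation < thresh]   # sharp, gentle

# Toy usage: a perfectly flat patch plus a spatially separate rough patch.
flat = np.c_[np.random.rand(200, 2), np.zeros(200)]
bumpy = np.c_[2 + np.random.rand(200, 1), np.random.rand(200, 1),
              0.05 * np.random.randn(200, 1)]
sharp, gentle = sharp_gentle_split(np.vstack([flat, bumpy]))
print(sharp.shape, gentle.shape)  # expected: (200, 3) (200, 3)
```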
arXiv Detail & Related papers (2020-12-20T13:35:00Z)
- Scan2Cap: Context-aware Dense Captioning in RGB-D Scans [10.688467522949082]
We introduce the task of dense captioning in 3D scans from commodity RGB-D sensors.
We propose Scan2Cap, an end-to-end trained method, to detect objects in the input scene and describe them in natural language.
Our method can effectively localize and describe 3D objects in scenes from the ScanRefer dataset.
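A hedged, schematic skeleton of the detect-then-describe pattern: a detector stub produces per-object features and dummy boxes, and a small GRU decoder emits caption tokens for each object. Both modules are toy stand-ins, not the Scan2Cap architecture.

```python
# Hedged skeleton of a detect-then-describe pipeline: a detector stub yields
# per-object features, and a small GRU decoder emits caption tokens.
import torch
import torch.nn as nn

class DetectorStub(nn.Module):
    """Maps a point cloud to K object 'detections', each with a feature vector."""
    def __init__(self, k=8, feat=128):
        super().__init__()
        self.k, self.mlp = k, nn.Sequential(nn.Linear(3, feat), nn.ReLU())
    def forward(self, points):                        # points: (N, 3)
        feats = self.mlp(points).max(dim=0).values    # crude global feature
        return feats.expand(self.k, -1), torch.zeros(self.k, 6)  # (K, feat) feats, dummy boxes

class CaptionHead(nn.Module):
    """Greedy GRU decoder conditioned on one object feature."""
    def __init__(self, feat=128, vocab=100, max_len=12):
        super().__init__()
        self.embed, self.gru = nn.Embedding(vocab, feat), nn.GRUCell(feat, feat)
        self.out, self.max_len = nn.Linear(feat, vocab), max_len
    def forward(self, obj_feat):                      # obj_feat: (feat,)
        h = obj_feat.unsqueeze(0)                     # (1, feat) hidden state
        tok, tokens = torch.zeros(1, dtype=torch.long), []   # start token id 0
        for _ in range(self.max_len):
            h = self.gru(self.embed(tok), h)
            tok = self.out(h).argmax(dim=-1)          # greedy next token
            tokens.append(int(tok))
        return tokens

points = torch.rand(5000, 3)
obj_feats, boxes = DetectorStub()(points)
cap = CaptionHead()
captions = [cap(f) for f in obj_feats]                # token ids per detected object
print(len(captions), len(captions[0]))                # 8 12
```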
arXiv Detail & Related papers (2020-12-03T19:00:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.