Structure from Action: Learning Interactions for Articulated Object 3D
Structure Discovery
- URL: http://arxiv.org/abs/2207.08997v2
- Date: Fri, 7 Apr 2023 16:49:33 GMT
- Title: Structure from Action: Learning Interactions for Articulated Object 3D
Structure Discovery
- Authors: Neil Nie, Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song
- Abstract summary: We introduce Structure from Action (SfA), a framework to discover 3D part geometry and joint parameters of unseen articulated objects.
By selecting informative interactions, SfA discovers parts and reveals occluded surfaces, like the inside of a closed drawer.
Empirically, SfA outperforms a pipeline of state-of-the-art components by 25.4 3D IoU percentage points on unseen categories.
- Score: 18.96346371296251
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Structure from Action (SfA), a framework to discover 3D part
geometry and joint parameters of unseen articulated objects via a sequence of
inferred interactions. Our key insight is that 3D interaction and perception
should be considered in conjunction to construct 3D articulated CAD models,
especially for categories not seen during training. By selecting informative
interactions, SfA discovers parts and reveals occluded surfaces, like the
inside of a closed drawer. By aggregating visual observations in 3D, SfA
accurately segments multiple parts, reconstructs part geometry, and infers all
joint parameters in a canonical coordinate frame. Our experiments demonstrate
that a SfA model trained in simulation can generalize to many unseen object
categories with diverse structures and to real-world objects. Empirically, SfA
outperforms a pipeline of state-of-the-art components by 25.4 3D IoU percentage
points on unseen categories, while matching already performant joint estimation
baselines.
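The interaction-selection idea in the abstract can be made concrete with a toy sketch. Nothing below is the authors' implementation: the candidate "observations" are random point patches and information gain is reduced to counting newly revealed voxels, but it shows the greedy select-act-aggregate loop that picks whichever interaction uncovers the most unseen surface.

```python
import numpy as np

rng = np.random.default_rng(0)

def voxelize(points, size=0.05):
    """Map 3D points to the set of voxel indices they occupy."""
    return {tuple(v) for v in np.floor(points / size).astype(int)}

# Pretend each candidate action (e.g., pulling a different drawer) would
# expose a different patch of surface; here the patches are random points.
candidate_obs = [rng.uniform(0.0, 1.0, size=(200, 3)) for _ in range(4)]

seen = voxelize(rng.uniform(0.0, 1.0, size=(300, 3)))  # initial partial view
for step in range(3):
    # "Informative" selection: the action that reveals the most new voxels.
    gains = [len(voxelize(obs) - seen) for obs in candidate_obs]
    best = int(np.argmax(gains))
    seen |= voxelize(candidate_obs[best])
    print(f"step {step}: action {best} revealed {gains[best]} new voxels")
```

A real system has to predict the gain of an interaction before executing it; this toy cheats by scoring the candidate observations directly.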
Related papers
- 3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes [29.8054021078428]
3DCoMPaT200 is a large-scale dataset tailored for compositional understanding of object parts and materials.
It features 200 object categories, with an object vocabulary approximately 5 times larger and approximately 4 times as many part categories as 3DCoMPaT.
To address the complexities of compositional 3D modeling, we propose a novel task of Compositional Part Shape Retrieval.
arXiv Detail & Related papers (2025-01-12T11:46:07Z)
- 3D Part Segmentation via Geometric Aggregation of 2D Visual Features [57.20161517451834]
Supervised 3D part segmentation models are tailored for a fixed set of objects and parts, limiting their transferability to open-set, real-world scenarios.
Recent works have explored vision-language models (VLMs) as a promising alternative, using multi-view rendering and textual prompting to identify object parts.
To address the limitations of both approaches, we propose COPS, a COmprehensive model for Parts that blends semantics extracted from visual concepts with 3D geometry to effectively identify object parts.
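The "geometric aggregation" named in the title can be illustrated generically: project each 3D point into every rendered view with a pinhole camera and average the 2D features it lands on. This is a hedged sketch of that common lifting step, not the COPS pipeline; the camera model, shapes, and data are invented for the example.

```python
import numpy as np

def aggregate_features(points, feature_maps, intrinsics, extrinsics):
    """points: (N, 3) world coords; feature_maps: one (H, W, C) per view;
    intrinsics: 3x3 pinhole matrix; extrinsics: 4x4 world-to-camera."""
    n, c = len(points), feature_maps[0].shape[-1]
    feats, counts = np.zeros((n, c)), np.zeros(n)
    for fmap, w2c in zip(feature_maps, extrinsics):
        cam = points @ w2c[:3, :3].T + w2c[:3, 3]            # world -> camera
        z = np.maximum(cam[:, 2], 1e-6)                       # guard divide-by-0
        uv = ((cam @ intrinsics.T)[:, :2] / z[:, None]).astype(int)
        h, w = fmap.shape[:2]
        ok = ((cam[:, 2] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w)
              & (uv[:, 1] >= 0) & (uv[:, 1] < h))             # visible pixels
        feats[ok] += fmap[uv[ok, 1], uv[ok, 0]]               # sample feature
        counts[ok] += 1
    return feats / np.maximum(counts, 1)[:, None]             # per-point mean

K = np.array([[64.0, 0, 32], [0, 64.0, 32], [0, 0, 1]])
E = np.eye(4); E[2, 3] = 2.0                                  # camera 2 m back
pts = np.random.rand(500, 3)
per_point = aggregate_features(pts, [np.random.rand(64, 64, 16)], K, [E])
```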
arXiv Detail & Related papers (2024-12-05T15:27:58Z)
- GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding [53.42728468191711]
Open-vocabulary 3D object affordance grounding aims to anticipate "action possibilities" regions on 3D objects given arbitrary instructions.
We propose GREAT (GeometRy-intEntion collAboraTive inference) for Open-Vocabulary 3D Object Affordance Grounding.
arXiv Detail & Related papers (2024-11-29T11:23:15Z)
- Occupancy Planes for Single-view RGB-D Human Reconstruction [120.5818162569105]
Single-view RGB-D human reconstruction with implicit functions is often formulated as per-point classification.
We propose the occupancy planes (OPlanes) representation, which formulates single-view RGB-D human reconstruction as occupancy prediction on planes that slice through the camera's view frustum.
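A minimal toy of that representation, assuming fronto-parallel depth planes and a crude heuristic in place of the learned classifier: slice the frustum at sampled depths, predict one image-aligned occupancy map per plane, and stack the maps into a coarse volume.

```python
import numpy as np

# Toy illustration of the occupancy-planes idea: slice the camera frustum
# with fronto-parallel depth planes and predict one image-aligned binary
# occupancy map per plane. The "classifier" below is a heuristic stand-in
# for the learned network, and the RGB-D input is fabricated.

H, W = 64, 64
plane_depths = np.linspace(0.5, 3.0, num=16)      # where the slices sit (m)

def predict_occupancy(rgbd, depth):
    """Heuristic stand-in: mark a pixel occupied once the plane passes
    behind the observed front surface (a visual-hull style guess)."""
    return rgbd[..., 3] <= depth                   # (H, W) boolean map

rgbd = np.dstack([np.zeros((H, W, 3)), np.full((H, W), 1.5)])  # fake input
volume = np.stack([predict_occupancy(rgbd, d) for d in plane_depths])
print(volume.shape)  # (16, 64, 64): one occupancy slice per depth plane
```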
arXiv Detail & Related papers (2022-08-04T17:59:56Z)
- Neural Part Priors: Learning to Optimize Part-Based Object Completion in RGB-D Scans [27.377128012679076]
We propose to leverage large-scale synthetic datasets of 3D shapes annotated with part information to learn Neural Part Priors.
At test time, we optimize over the learned part priors to fit real-world scanned 3D scenes.
Experiments on the ScanNet dataset demonstrate that NPPs significantly outperform the state of the art in part decomposition and object completion.
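"Optimizing over learned priors at test time" is an instance of test-time latent optimization: freeze a trained decoder and fit only its latent code to the observation by gradient descent. The sketch below shows the mechanics with a random MLP standing in for a trained shape prior and fabricated target points; it is not the NPP model itself.

```python
import torch

# Generic test-time latent optimization: freeze a decoder that represents
# a learned shape prior and fit only its latent code to an observed scan.

decoder = torch.nn.Sequential(
    torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3 * 128))
for p in decoder.parameters():
    p.requires_grad_(False)                        # the prior stays fixed

target = torch.randn(128, 3)                       # "scanned" partial points
z = torch.zeros(32, requires_grad=True)            # latent code to optimize
opt = torch.optim.Adam([z], lr=1e-2)

for step in range(200):
    pred = decoder(z).view(128, 3)                 # decode latent into points
    loss = ((pred - target) ** 2).mean()           # fit the observation
    opt.zero_grad()
    loss.backward()
    opt.step()
```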
arXiv Detail & Related papers (2022-03-17T15:05:44Z)
- DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements over state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data for human and animal categories.
arXiv Detail & Related papers (2021-08-31T18:33:55Z)
- VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects [19.296344218177534]
The space of 3D articulated objects is exceptionally rich in their myriad semantic categories, diverse shape geometry, and complicated part functionality.
Previous works mostly abstract the kinematic structure, using estimated joint parameters and part poses as the visual representation for manipulating 3D articulated objects.
We propose object-centric actionable visual priors as a novel perception-interaction handshaking point, where the perception system outputs guidance that is more directly actionable than kinematic structure estimates.
arXiv Detail & Related papers (2021-06-28T07:47:31Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
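The summary describes a multi-task design in which detection supervision shapes the features that segmentation uses. A minimal sketch of that pattern, with an invented point-MLP backbone and invented head shapes rather than the DASS architecture:

```python
import torch

# Two task heads over one shared backbone: training them jointly lets the
# detection loss shape the features the segmentation head relies on.

class SharedBackboneNet(torch.nn.Module):
    def __init__(self, in_dim=3, feat=64, num_classes=10):
        super().__init__()
        self.backbone = torch.nn.Sequential(
            torch.nn.Linear(in_dim, feat), torch.nn.ReLU(),
            torch.nn.Linear(feat, feat), torch.nn.ReLU())
        self.seg_head = torch.nn.Linear(feat, num_classes)  # per-point labels
        self.det_head = torch.nn.Linear(feat, 7)            # box center, size, yaw

    def forward(self, points):                              # points: (N, 3)
        f = self.backbone(points)                           # shared features
        return self.seg_head(f), self.det_head(f)

net = SharedBackboneNet()
seg_logits, box_params = net(torch.randn(1024, 3))
print(seg_logits.shape, box_params.shape)                   # (1024, 10) (1024, 7)
```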
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- Fine-Grained 3D Shape Classification with Hierarchical Part-View Attentions [70.0171362989609]
We propose a novel fine-grained 3D shape classification method named FG3D-Net to capture the fine-grained local details of 3D shapes from multiple rendered views.
Our results on the fine-grained 3D shape dataset show that our method outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2020-05-26T06:53:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.