IKEA-Manual: Seeing Shape Assembly Step by Step
- URL: http://arxiv.org/abs/2302.01881v1
- Date: Fri, 3 Feb 2023 17:32:22 GMT
- Title: IKEA-Manual: Seeing Shape Assembly Step by Step
- Authors: Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Ran Zhang, Chin-Yi Cheng,
Jiajun Wu
- Abstract summary: We present IKEA-Manual, a dataset consisting of 102 IKEA objects paired with assembly manuals.
We provide fine-grained annotations on the IKEA objects and assembly manuals, including assembly parts, assembly plans, manual segmentation, and 2D-3D correspondence between 3D parts and visual manuals.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-designed visual manuals are crucial components in shape assembly
activities. They provide step-by-step guidance on how we should move and
connect different parts in a convenient and physically-realizable way. While
there has been an ongoing effort in building agents that perform assembly
tasks, the information in human-designed manuals has been largely overlooked. We
identify that this is due to 1) a lack of realistic 3D assembly objects that
have paired manuals and 2) the difficulty of extracting structured information
from purely image-based manuals. Motivated by this observation, we present
IKEA-Manual, a dataset consisting of 102 IKEA objects paired with assembly
manuals. We provide fine-grained annotations on the IKEA objects and assembly
manuals, including decomposed assembly parts, assembly plans, manual
segmentation, and 2D-3D correspondence between 3D parts and visual manuals. We
illustrate the broad application of our dataset on four tasks related to shape
assembly: assembly plan generation, part segmentation, pose estimation, and 3D
part assembly.
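The annotations described above (decomposed assembly parts, assembly plans, manual segmentation, and 2D-3D part correspondence) can be pictured as one nested record per object. The Python sketch below illustrates that structure; all field and class names are illustrative assumptions, not the dataset's actual schema or file format.

```python
from dataclasses import dataclass

# Hypothetical sketch of one IKEA-Manual-style record.
# Field names are assumptions for illustration only.

@dataclass
class PartCorrespondence:
    part_id: str         # ID of a 3D part in the object's part decomposition
    manual_page: int     # page of the visual manual where the part is drawn
    mask_polygon: list   # 2D polygon (pixel coords) segmenting the part in the manual

@dataclass
class AssemblyStep:
    step_index: int
    parts_added: list    # part IDs connected at this step
    correspondences: list  # 2D-3D links for this step's manual image

@dataclass
class ManualRecord:
    object_id: str       # e.g. an IKEA product identifier
    parts: list          # decomposed 3D assembly parts
    plan: list           # ordered AssemblySteps; real plan annotations may be tree-structured

def parts_used(record):
    """Collect every part referenced across the assembly plan."""
    return sorted({p for step in record.plan for p in step.parts_added})

record = ManualRecord(
    object_id="chair_01",
    parts=["leg_fl", "leg_fr", "seat", "back"],
    plan=[
        AssemblyStep(0, ["leg_fl", "leg_fr"], []),
        AssemblyStep(1, ["seat"], []),
        AssemblyStep(2, ["back"], []),
    ],
)
print(parts_used(record))  # ['back', 'leg_fl', 'leg_fr', 'seat']
```

A record like this makes the four benchmark tasks concrete: the `plan` field drives assembly plan generation, `mask_polygon` drives part segmentation, and the 2D-3D correspondences support pose estimation and 3D part assembly.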
Related papers
- BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects [70.20706475051347]
BimArt is a novel generative approach for synthesizing 3D bimanual hand interactions with articulated objects.
We first generate distance-based contact maps conditioned on the object trajectory with an articulation-aware feature representation.
The learned contact prior is then used to guide our hand motion generator, producing diverse and realistic bimanual motions for object movement and articulation.
arXiv Detail & Related papers (2024-12-06T14:23:56Z)
- Manual-PA: Learning 3D Part Assembly from Instruction Diagrams [54.555154845137906]
We present Manual-PA, a transformer-based, instruction-manual-guided 3D part assembly framework.
Our results show that using the diagrams and the order of the parts leads to significant improvements in assembly performance over the state of the art.
arXiv Detail & Related papers (2024-11-27T03:10:29Z)
- IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos [34.67148665646724]
We introduce IKEA Video Manuals, a dataset that features 3D models of furniture parts, instructional manuals, assembly videos from the Internet, and most importantly, annotations of dense-temporal alignments between these data modalities.
We present five applications essential for shape assembly: assembly plan generation, part-conditioned segmentation, part-conditioned pose estimation, video object segmentation, and furniture assembly based on instructional video manuals.
arXiv Detail & Related papers (2024-11-18T09:30:05Z)
- HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields [96.04424738803667]
HOISDF is a guided hand-object pose estimation network.
It exploits hand and object SDFs to provide a global, implicit representation over the complete reconstruction volume.
We show that HOISDF achieves state-of-the-art results on hand-object pose estimation benchmarks.
arXiv Detail & Related papers (2024-02-26T22:48:37Z)
- Multi-3D-Models Registration-Based Augmented Reality (AR) Instructions for Assembly [7.716174636585781]
BRICKxAR (M3D) visualizes rendered 3D assembly parts at the assembly location of the physical assembly model.
BRICKxAR (M3D) utilizes deep learning-trained 3D model-based registration.
arXiv Detail & Related papers (2023-11-27T21:53:17Z)
- Aligning Step-by-Step Instructional Diagrams to Video Demonstrations [51.67930509196712]
We consider a novel setting where alignment is between (i) instruction steps that are depicted as assembly diagrams and (ii) video segments from in-the-wild videos.
We introduce a novel supervised contrastive learning method that learns to align videos with the subtle details in the assembly diagrams.
Experiments on IAW, a dataset for IKEA assembly in the wild, demonstrate the superior performance of our approach against alternatives.
arXiv Detail & Related papers (2023-03-24T04:45:45Z)
- Translating a Visual LEGO Manual to a Machine-Executable Plan [26.0127179598152]
We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions.
We present a novel learning-based framework, the Manual-to-Executable-Plan Network (MEPNet), which reconstructs the assembly steps from a sequence of manual images.
arXiv Detail & Related papers (2022-07-25T23:35:46Z)
- Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects [73.23249640099516]
We learn both the appearance and the structure of previously unseen articulated objects by observing them move from multiple views.
Our insight is that adjacent parts that move relative to each other must be connected by a joint.
We show that our method works for different structures, from quadrupeds, to single-arm robots, to humans.
arXiv Detail & Related papers (2021-12-21T16:37:48Z)
- The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose [108.21037046507483]
IKEA ASM is a three-million-frame, multi-view furniture-assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
We benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset.
The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
arXiv Detail & Related papers (2020-07-01T11:34:46Z)
- Learning 3D Part Assembly from a Single Image [20.175502864488493]
We introduce a novel problem, single-image-guided 3D part assembly, along with a learning-based solution.
We study this problem in the setting of furniture assembly from a given complete set of parts and a single image depicting the entire assembled object.
arXiv Detail & Related papers (2020-03-21T21:19:28Z)