Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
- URL: http://arxiv.org/abs/2411.18011v1
- Date: Wed, 27 Nov 2024 03:10:29 GMT
- Title: Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
- Authors: Jiahao Zhang, Anoop Cherian, Cristian Rodriguez, Weijian Deng, Stephen Gould
- Abstract summary: We present Manual-PA, a transformer-based instruction Manual-guided 3D Part Assembly framework.
Our results show that using the diagrams and the order of the parts leads to significant improvements in assembly performance against the state of the art.
- Score: 54.555154845137906
- Abstract: Assembling furniture amounts to solving the discrete-continuous optimization task of selecting the furniture parts to assemble and estimating their connecting poses in a physically realistic manner. The problem is hampered by its combinatorially large yet sparse solution space, thus making learning to assemble a challenging task for current machine learning models. In this paper, we attempt to solve this task by leveraging the assembly instructions provided in diagrammatic manuals that typically accompany the furniture parts. Our key insight is to use the cues in these diagrams to split the problem into discrete and continuous phases. Specifically, we present Manual-PA, a transformer-based instruction Manual-guided 3D Part Assembly framework that learns to semantically align 3D parts with their illustrations in the manuals using a contrastive learning backbone towards predicting the assembly order, and infers the 6D pose of each part by relating it to the final furniture depicted in the manual. To validate the efficacy of our method, we conduct experiments on the benchmark PartNet dataset. Our results show that using the diagrams and the order of the parts leads to significant improvements in assembly performance against the state of the art. Further, Manual-PA demonstrates strong generalization to real-world IKEA furniture assembly on the IKEA-Manual dataset.
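The following minimal sketch illustrates the contrastive alignment idea from the abstract: 3D parts and manual diagram steps are embedded into a shared space, an InfoNCE-style loss pulls each part toward its own diagram step, and the resulting similarity matrix induces an assembly order. All encoder architectures, dimensions, and names (e.g., `PartDiagramAligner`) are illustrative assumptions rather than the paper's released implementation, and the continuous phase (6D pose prediction against the final furniture diagram) is omitted.

```python
# Minimal sketch of part-diagram contrastive alignment, assuming PyTorch.
# Encoders, dimensions, and the InfoNCE-style loss are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartDiagramAligner(nn.Module):
    """Embeds 3D parts and manual diagram steps into a shared space."""

    def __init__(self, dim=256, temperature=0.07):
        super().__init__()
        # Hypothetical encoders: a point-wise MLP pooled over points for
        # part point clouds, and a small CNN for diagram step images.
        self.part_encoder = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, dim)
        )
        self.diagram_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim)
        )
        self.temperature = temperature

    def forward(self, parts, diagrams):
        # parts: (N, P, 3) point clouds; diagrams: (N, 3, H, W) step images.
        part_emb = F.normalize(
            self.part_encoder(parts).max(dim=1).values, dim=-1)
        diag_emb = F.normalize(self.diagram_encoder(diagrams), dim=-1)
        return part_emb, diag_emb

    def contrastive_loss(self, part_emb, diag_emb):
        # Symmetric InfoNCE: the i-th part should match the i-th diagram step.
        logits = part_emb @ diag_emb.t() / self.temperature
        targets = torch.arange(len(logits), device=logits.device)
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

# Toy usage with random data standing in for real parts and manual pages.
model = PartDiagramAligner()
parts = torch.randn(4, 1024, 3)        # 4 parts, 1024 points each
diagrams = torch.randn(4, 3, 64, 64)   # 4 manual step images
part_emb, diag_emb = model(parts, diagrams)
loss = model.contrastive_loss(part_emb, diag_emb)
order = (part_emb @ diag_emb.t()).argmax(dim=1)  # predicted step per part
```

At inference, the argmax over the part-to-diagram similarity matrix assigns each part to its best-matching manual step, yielding the discrete assembly order; the pose stage would then predict each part's 6D pose conditioned on the ordered parts and the manual, a step not sketched here.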
Related papers
- Manual2Skill: Learning to Read Manuals and Acquire Robotic Skills for Furniture Assembly Using Vision-Language Models [21.72355258499675]
We present Manual2Skill, a novel framework that enables robots to perform complex assembly tasks guided by high-level manual instructions.
Our approach leverages a Vision-Language Model (VLM) to extract structured information from instructional images.
We demonstrate the effectiveness of Manual2Skill by successfully assembling several real-world IKEA furniture items.
arXiv Detail & Related papers (2025-02-14T11:25:24Z) - IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos [34.67148665646724]
We introduce IKEA Video Manuals, a dataset that features 3D models of furniture parts, instructional manuals, assembly videos from the Internet, and most importantly, annotations of dense temporal alignments between these data modalities.
We present five applications essential for shape assembly: assembly plan generation, part-conditioned segmentation, part-conditioned pose estimation, video object segmentation, and furniture assembly based on instructional video manuals.
arXiv Detail & Related papers (2024-11-18T09:30:05Z) - TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly [51.29305265324916]
We propose a class-agnostic tree-transformer framework to predict the sequential assembly actions from input multi-view images.
A major challenge of the sequential brick assembly task is that the step-wise action labels are costly and tedious to obtain in practice.
We mitigate this problem by leveraging synthetic-to-real transfer learning.
arXiv Detail & Related papers (2024-07-22T14:05:27Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - Aligning Step-by-Step Instructional Diagrams to Video Demonstrations [51.67930509196712]
We consider a novel setting where alignment is between (i) instruction steps that are depicted as assembly diagrams and (ii) video segments from in-the-wild videos.
We introduce a novel supervised contrastive learning method that learns to align videos with the subtle details in the assembly diagrams.
Experiments on the IAW dataset for IKEA assembly in the wild demonstrate the superior performance of our approach against alternatives.
arXiv Detail & Related papers (2023-03-24T04:45:45Z) - IKEA-Manual: Seeing Shape Assembly Step by Step [26.79113677450921]
We present IKEA-Manual, a dataset consisting of 102 IKEA objects paired with assembly manuals.
We provide fine-grained annotations on the IKEA objects and assembly manuals, including assembly parts, assembly plans, manual segmentation, and 2D-3D correspondence between 3D parts and visual manuals.
arXiv Detail & Related papers (2023-02-03T17:32:22Z) - Translating a Visual LEGO Manual to a Machine-Executable Plan [26.0127179598152]
We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions.
We present a novel learning-based framework, the Manual-to-Executable-Plan Network (MEPNet), which reconstructs the assembly steps from a sequence of manual images.
arXiv Detail & Related papers (2022-07-25T23:35:46Z) - Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds great potential for robotics and for learning from human demonstrations.
We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z) - Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of object reconstruction accuracy.
arXiv Detail & Related papers (2020-06-28T09:50:25Z)