IKEA-Manual: Seeing Shape Assembly Step by Step
- URL: http://arxiv.org/abs/2302.01881v1
- Date: Fri, 3 Feb 2023 17:32:22 GMT
- Title: IKEA-Manual: Seeing Shape Assembly Step by Step
- Authors: Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Ran Zhang, Chin-Yi Cheng,
Jiajun Wu
- Abstract summary: We present IKEA-Manual, a dataset consisting of 102 IKEA objects paired with assembly manuals.
We provide fine-grained annotations on the IKEA objects and assembly manuals, including assembly parts, assembly plans, manual segmentation, and 2D-3D correspondence between 3D parts and visual manuals.
- Score: 26.79113677450921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-designed visual manuals are crucial components in shape assembly
activities. They provide step-by-step guidance on how we should move and
connect different parts in a convenient and physically-realizable way. While
there has been an ongoing effort in building agents that perform assembly
tasks, the information in human-designed manuals has been largely overlooked. We
identify that this is due to 1) a lack of realistic 3D assembly objects that
have paired manuals and 2) the difficulty of extracting structured information
from purely image-based manuals. Motivated by this observation, we present
IKEA-Manual, a dataset consisting of 102 IKEA objects paired with assembly
manuals. We provide fine-grained annotations on the IKEA objects and assembly
manuals, including decomposed assembly parts, assembly plans, manual
segmentation, and 2D-3D correspondence between 3D parts and visual manuals. We
illustrate the broad application of our dataset on four tasks related to shape
assembly: assembly plan generation, part segmentation, pose estimation, and 3D
part assembly.
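The abstract describes fine-grained annotations linking 3D parts, assembly plans, and manual images. As a purely hypothetical illustration of what such a record might contain (the actual IKEA-Manual schema and field names may differ), one could model a per-step annotation like this:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical sketch of the annotation kinds named in the abstract:
# decomposed parts, assembly plans, manual segmentation, and 2D-3D
# correspondence. Field names are illustrative, not the dataset's schema.

@dataclass
class AssemblyStep:
    step_id: int
    part_ids: List[str]      # 3D parts attached in this step
    manual_page: int         # page of the visual manual this step appears on
    # 2D-3D correspondence: each 3D part id maps to a 2D polygon
    # (pixel coordinates) segmenting its depiction in the manual image
    part_masks: List[Tuple[str, List[Tuple[float, float]]]] = field(
        default_factory=list
    )

@dataclass
class AssemblyPlan:
    object_name: str
    steps: List[AssemblyStep]

# Example instance for an imaginary chair
plan = AssemblyPlan("example_chair", [
    AssemblyStep(
        step_id=1,
        part_ids=["leg_front_left", "leg_front_right"],
        manual_page=2,
        part_masks=[("leg_front_left",
                     [(10.0, 20.0), (40.0, 20.0), (40.0, 90.0)])],
    ),
])
```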
Related papers
- IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos [34.67148665646724]
We introduce IKEA Video Manuals, a dataset that features 3D models of furniture parts, instructional manuals, assembly videos from the Internet, and most importantly, annotations of dense-temporal alignments between these data modalities.
We present five applications essential for shape assembly: assembly plan generation, part-conditioned segmentation, part-conditioned pose estimation, video object segmentation, and furniture assembly based on instructional video manuals.
arXiv Detail & Related papers (2024-11-18T09:30:05Z)
- Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images [24.10809783713574]
This paper introduces a novel task: translating multi-view images of a structural 3D model into a detailed sequence of assembly instructions.
We propose an end-to-end model known as the Neural Assembler.
arXiv Detail & Related papers (2024-04-25T08:53:23Z)
- HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields [96.04424738803667]
HOISDF is a guided hand-object pose estimation network.
It exploits hand and object SDFs to provide a global, implicit representation over the complete reconstruction volume.
We show that HOISDF achieves state-of-the-art results on hand-object pose estimation benchmarks.
arXiv Detail & Related papers (2024-02-26T22:48:37Z)
- Multi-3D-Models Registration-Based Augmented Reality (AR) Instructions for Assembly [7.716174636585781]
BRICKxAR (M3D) visualizes rendered 3D assembly parts at the assembly location of the physical assembly model.
BRICKxAR (M3D) utilizes deep learning-trained 3D model-based registration.
arXiv Detail & Related papers (2023-11-27T21:53:17Z)
- Aligning Step-by-Step Instructional Diagrams to Video Demonstrations [51.67930509196712]
We consider a novel setting where alignment is between (i) instruction steps that are depicted as assembly diagrams and (ii) video segments from in-the-wild videos.
We introduce a novel supervised contrastive learning method that learns to align videos with the subtle details in the assembly diagrams.
Experiments on IAW (IKEA Assembly in the Wild) demonstrate the superior performance of our approach against alternatives.
arXiv Detail & Related papers (2023-03-24T04:45:45Z)
- Translating a Visual LEGO Manual to a Machine-Executable Plan [26.0127179598152]
We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions.
We present a novel learning-based framework, the Manual-to-Executable-Plan Network (MEPNet), which reconstructs the assembly steps from a sequence of manual images.
arXiv Detail & Related papers (2022-07-25T23:35:46Z)
- Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects [73.23249640099516]
We learn both the appearance and the structure of previously unseen articulated objects by observing them move from multiple views.
Our insight is that adjacent parts that move relative to each other must be connected by a joint.
We show that our method works for different structures, from quadrupeds, to single-arm robots, to humans.
arXiv Detail & Related papers (2021-12-21T16:37:48Z)
- ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments [85.81157224163876]
We combine Vision-and-Language Navigation, assembly of collected objects, and object referring expression comprehension to create a novel joint navigation-and-assembly task, named ArraMon.
During this task, the agent is asked to find and collect different target objects one-by-one by navigating based on natural language instructions in a complex, realistic outdoor environment.
We present results for several baseline models (integrated and biased) and metrics (nDTW, CTC, rPOD, and PTC), and the large model-human performance gap demonstrates that our task is challenging and presents a wide scope for future work.
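Among the metrics listed above, nDTW (normalized Dynamic Time Warping) scores how closely an agent's navigation path follows the reference path. A minimal sketch, assuming 2D waypoints and an illustrative distance threshold of 3.0 (the actual ArraMon evaluation setup may differ):

```python
import math

def dtw(pred, ref):
    """Classic dynamic-time-warping cost between two 2D point sequences."""
    n, m = len(pred), len(ref)
    # cost[i][j] = min cumulative cost aligning pred[:i] with ref[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(pred[i - 1], ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

def ndtw(pred, ref, threshold=3.0):
    """Normalized DTW in (0, 1]; 1.0 means the paths coincide.
    The success-distance threshold (3.0 here) is a hypothetical choice."""
    return math.exp(-dtw(pred, ref) / (len(ref) * threshold))
```

A path compared against itself yields a DTW cost of 0 and thus an nDTW of exactly 1.0; scores decay exponentially as the predicted path drifts from the reference.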
arXiv Detail & Related papers (2020-11-15T23:30:36Z)
- The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose [108.21037046507483]
IKEA ASM is a three million frame, multi-view, furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
We benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset.
The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
arXiv Detail & Related papers (2020-07-01T11:34:46Z)
- Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of the reconstruction accuracy of objects.
arXiv Detail & Related papers (2020-06-28T09:50:25Z)
- Learning 3D Part Assembly from a Single Image [20.175502864488493]
We introduce a novel problem, single-image-guided 3D part assembly, along with a learning-based solution.
We study this problem in the setting of furniture assembly from a given complete set of parts and a single image depicting the entire assembled object.
arXiv Detail & Related papers (2020-03-21T21:19:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.