Augmented Assembly: Object Recognition and Hand Tracking for Adaptive Assembly Instructions in Augmented Reality
- URL: http://arxiv.org/abs/2601.11535v1
- Date: Sat, 22 Nov 2025 22:49:40 GMT
- Title: Augmented Assembly: Object Recognition and Hand Tracking for Adaptive Assembly Instructions in Augmented Reality
- Authors: Alexander Htet Kyaw, Haotian Ma, Sasa Zivkovic, Jenny Sabin
- Abstract summary: We present an AR-assisted assembly workflow that leverages object recognition and hand tracking. Using object recognition, the system detects and localizes components in real time to create a digital twin of the workspace. A case study with LEGO blocks and custom 3D-printed components demonstrates how the system links digital instructions to physical assembly.
- Score: 40.836596733334254
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in augmented reality (AR) have enabled interactive systems that assist users in physical assembly tasks. In this paper, we present an AR-assisted assembly workflow that leverages object recognition and hand tracking to (1) identify custom components, (2) display step-by-step instructions, (3) detect assembly deviations, and (4) dynamically update the instructions based on users' hands-on interactions with physical parts. Using object recognition, the system detects and localizes components in real time to create a digital twin of the workspace. For each assembly step, it overlays bounding boxes in AR to indicate both the current position and the target placement of relevant components, while hand-tracking data verifies whether the user interacts with the correct part. Rather than enforcing a fixed sequence, the system highlights potential assembly errors and interprets user deviations as opportunities for iteration and creative exploration. A case study with LEGO blocks and custom 3D-printed components demonstrates how the system links digital instructions to physical assembly, eliminating the need for manual searching, sorting, or labeling of parts.
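The abstract says hand-tracking data verifies whether the user interacts with the correct part, but it does not say how. The following is a minimal sketch of one way such a check could work, assuming 2D axis-aligned bounding boxes from the object detector and a single tracked hand point in the same coordinate frame. The names `Box`, `touched_part`, and `check_step` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical detection result: a labeled axis-aligned bounding box."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        # Point-in-rectangle test, boundaries included.
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


def touched_part(detections: Sequence[Box],
                 hand_xy: Tuple[float, float]) -> Optional[str]:
    """Return the label of the part under the hand point, or None.

    If several boxes overlap at the hand point, the smallest box wins.
    This tie-break is an assumption made for the sketch.
    """
    x, y = hand_xy
    hits = [box for box in detections if box.contains(x, y)]
    if not hits:
        return None
    return min(hits, key=Box.area).label


def check_step(detections: Sequence[Box],
               hand_xy: Tuple[float, float],
               expected_label: str) -> str:
    """Classify the interaction for the current assembly step."""
    part = touched_part(detections, hand_xy)
    if part is None:
        return "no_interaction"
    return "correct" if part == expected_label else "deviation"


if __name__ == "__main__":
    workspace = [
        Box("brick_2x4", 0.0, 0.0, 4.0, 2.0),
        Box("printed_joint", 6.0, 1.0, 8.0, 3.0),
    ]
    print(check_step(workspace, (1.0, 1.0), "brick_2x4"))
    print(check_step(workspace, (7.0, 2.0), "brick_2x4"))
    print(check_step(workspace, (5.0, 5.0), "brick_2x4"))
```

A "deviation" result would not have to block the user. In line with the abstract, it could simply be highlighted and used to update the instructions.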
Related papers
- AI Assisted AR Assembly: Object Recognition and Computer Vision for Augmented Reality Assisted Assembly [40.836596733334254]
We present an AI-assisted Augmented Reality assembly workflow that uses deep learning-based object recognition. For each assembly step, the system displays a bounding box around the corresponding components in the physical space and indicates where each component should be placed.
arXiv Detail & Related papers (2025-11-07T16:20:53Z)
- Manual2Skill++: Connector-Aware General Robotic Assembly from Instruction Manuals via Vision-Language Models [26.61083683414806]
We consider connections as first-class primitives in assembly representation, including connector types, specifications, quantities, and placement locations. We present Manual2Skill++, a vision-language framework that automatically extracts structured connection information from assembly manuals. A large-scale vision-language model parses symbolic diagrams and annotations in manuals to instantiate these graphs, leveraging the rich connection knowledge embedded in human-designed instructions.
arXiv Detail & Related papers (2025-10-18T04:13:26Z)
- EgoPrompt: Prompt Learning for Egocentric Action Recognition [49.12318087940015]
EgoPrompt is a prompt learning-based framework for the egocentric action recognition task. EgoPrompt achieves state-of-the-art performance across within-dataset, cross-dataset, and base-to-novel generalization benchmarks.
arXiv Detail & Related papers (2025-08-05T09:47:07Z)
- IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction. We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images. We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z)
- Manual-PA: Learning 3D Part Assembly from Instruction Diagrams [54.555154845137906]
We present Manual-PA, a transformer-based instruction Manual-guided 3D Part Assembly framework. Our results show that using the diagrams and the order of the parts leads to significant improvements in assembly performance against the state of the art.
arXiv Detail & Related papers (2024-11-27T03:10:29Z)
- Generative Timelines for Instructed Visual Assembly [106.80501761556606]
The objective of this work is to manipulate visual timelines (e.g. a video) through natural language instructions. We propose the Timeline Assembler, a generative model trained to perform instructed visual assembly tasks.
arXiv Detail & Related papers (2024-11-19T07:26:30Z)
- Multi-3D-Models Registration-Based Augmented Reality (AR) Instructions for Assembly [7.716174636585781]
BRICKxAR (M3D) visualizes rendered 3D assembly parts at the assembly location of the physical assembly model. BRICKxAR (M3D) utilizes deep learning-trained 3D model-based registration.
arXiv Detail & Related papers (2023-11-27T21:53:17Z)
- IKEA-Manual: Seeing Shape Assembly Step by Step [26.79113677450921]
We present IKEA-Manual, a dataset consisting of 102 IKEA objects paired with assembly manuals. We provide fine-grained annotations on the IKEA objects and assembly manuals, including assembly parts, assembly plans, manual segmentation, and 2D-3D correspondence between 3D parts and visual manuals.
arXiv Detail & Related papers (2023-02-03T17:32:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.