AI Assisted AR Assembly: Object Recognition and Computer Vision for Augmented Reality Assisted Assembly
- URL: http://arxiv.org/abs/2511.05394v2
- Date: Thu, 13 Nov 2025 18:28:47 GMT
- Title: AI Assisted AR Assembly: Object Recognition and Computer Vision for Augmented Reality Assisted Assembly
- Authors: Alexander Htet Kyaw, Haotian Ma, Sasa Zivkovic, Jenny Sabin
- Abstract summary: We present an AI-assisted Augmented Reality assembly workflow that uses deep learning-based object recognition. For each assembly step, the system displays a bounding box around the corresponding components in the physical space, and where the component should be placed.
- Score: 40.836596733334254
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an AI-assisted Augmented Reality assembly workflow that uses deep learning-based object recognition to identify different assembly components and display step-by-step instructions. For each assembly step, the system displays a bounding box around the corresponding components in the physical space, and where the component should be placed. By connecting assembly instructions with the real-time location of relevant components, the system eliminates the need for manual searching, sorting, or labeling of different components before each assembly. To demonstrate the feasibility of using object recognition for AR-assisted assembly, we highlight a case study involving the assembly of LEGO sculptures.
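The step-to-component link described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Detection` and `AssemblyStep` types, the `component_to_highlight` function, the label names, and the 0.5 confidence threshold are all hypothetical, and the detections are hard-coded stand-ins for the output of an object-recognition model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Detection:
    """One recognized component (hypothetical structure, not from the paper)."""
    label: str
    box: Tuple[float, float, float, float]  # x_min, y_min, x_max, y_max
    confidence: float


@dataclass(frozen=True)
class AssemblyStep:
    """One instruction step: which component to pick and where it goes."""
    component: str
    target: Tuple[float, float, float]  # assumed placement position


def component_to_highlight(
    step: AssemblyStep,
    detections: List[Detection],
    min_confidence: float = 0.5,  # assumed threshold
) -> Optional[Detection]:
    """Return the most confident detection matching the step's component.

    Returns None when no matching component is currently visible.
    """
    candidates = [
        d for d in detections
        if d.label == step.component and d.confidence >= min_confidence
    ]
    return max(candidates, key=lambda d: d.confidence, default=None)


# Stand-in detections; a real system would get these from a detector per frame.
detections = [
    Detection("brick_2x4_red", (10.0, 20.0, 60.0, 50.0), 0.91),
    Detection("brick_2x2_blue", (100.0, 40.0, 130.0, 70.0), 0.88),
    Detection("brick_2x4_red", (200.0, 20.0, 250.0, 50.0), 0.64),
]
step = AssemblyStep(component="brick_2x4_red", target=(0.0, 0.0, 0.0))

hit = component_to_highlight(step, detections)
print(hit.box)  # (10.0, 20.0, 60.0, 50.0)
```

In this sketch the box of the returned detection is what an AR view would draw around the physical component, while `step.target` is what it would render as the placement location.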
Related papers
- Augmented Assembly: Object Recognition and Hand Tracking for Adaptive Assembly Instructions in Augmented Reality [40.836596733334254]
We present an AR-assisted assembly workflow that leverages object recognition and hand tracking. Using object recognition, the system detects and localizes components in real time to create a digital twin of the workspace. A case study with LEGO blocks and custom 3D-printed components demonstrates how the system links digital instructions to physical assembly.
arXiv Detail & Related papers (2025-11-22T22:49:40Z)
- Manual2Skill++: Connector-Aware General Robotic Assembly from Instruction Manuals via Vision-Language Models [26.61083683414806]
We consider connections as first-class primitives in assembly representation, including connector types, specifications, quantities, and placement locations. We present Manual2Skill++, a vision-language framework that automatically extracts structured connection information from assembly manuals. A large-scale vision-language model parses symbolic diagrams and annotations in manuals to instantiate these graphs, leveraging the rich connection knowledge embedded in human-designed instructions.
arXiv Detail & Related papers (2025-10-18T04:13:26Z)
- EgoPrompt: Prompt Learning for Egocentric Action Recognition [49.12318087940015]
EgoPrompt is a prompt learning-based framework for egocentric action recognition. EgoPrompt achieves state-of-the-art performance across within-dataset, cross-dataset, and base-to-novel generalization benchmarks.
arXiv Detail & Related papers (2025-08-05T09:47:07Z)
- Manual-PA: Learning 3D Part Assembly from Instruction Diagrams [54.555154845137906]
We present Manual-PA, a transformer-based instruction Manual-guided 3D Part Assembly framework. Our results show that using the diagrams and the order of the parts leads to significant improvements in assembly performance against the state of the art.
arXiv Detail & Related papers (2024-11-27T03:10:29Z)
- IKEA-Manual: Seeing Shape Assembly Step by Step [26.79113677450921]
We present IKEA-Manual, a dataset consisting of 102 IKEA objects paired with assembly manuals.
We provide fine-grained annotations on the IKEA objects and assembly manuals, including assembly parts, assembly plans, manual segmentation, and 2D-3D correspondence between 3D parts and visual manuals.
arXiv Detail & Related papers (2023-02-03T17:32:22Z)
- Object Detection for Understanding Assembly Instruction Using Context-aware Data Augmentation and Cascade Mask R-CNN [4.3310896118860445]
We developed a context-aware data augmentation scheme for speech bubble segmentation.
Also, we showed that deep learning can be useful to understand assembly instruction by detecting the essential objects in the instruction.
arXiv Detail & Related papers (2021-01-07T12:10:27Z)
- SAFCAR: Structured Attention Fusion for Compositional Action Recognition [47.43959215267547]
We develop and test a novel Structured Attention Fusion (SAF) self-attention mechanism to combine information from object detections.
We show that our approach recognizes novel verb-noun compositions more effectively than current state of the art systems.
We validate our approach on the challenging Something-Else tasks from the Something-Something-V2 dataset.
arXiv Detail & Related papers (2020-12-03T17:45:01Z)
- Fine-grained activity recognition for assembly videos [31.468641678626696]
We extend the fine-grained activity recognition setting to address the task of assembly action recognition in its full generality.
We develop a general method for recognizing assembly actions from observation sequences, along with observation features that take advantage of a spatial assembly's special structure.
arXiv Detail & Related papers (2020-12-02T18:38:17Z)
- Disassembling Object Representations without Labels [75.2215716328001]
We study a new representation-learning task, which we term disassembling object representations.
Disassembling enables category-specific modularity in the learned representations.
We propose an unsupervised approach to achieving disassembling, named Unsupervised Disassembling Object Representation (UDOR).
arXiv Detail & Related papers (2020-04-03T08:23:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.