Teachable Reality: Prototyping Tangible Augmented Reality with Everyday
Objects by Leveraging Interactive Machine Teaching
- URL: http://arxiv.org/abs/2302.11046v1
- Date: Tue, 21 Feb 2023 23:03:49 GMT
- Title: Teachable Reality: Prototyping Tangible Augmented Reality with Everyday
Objects by Leveraging Interactive Machine Teaching
- Authors: Kyzyl Monteiro, Ritik Vatsal, Neil Chulpongsatorn, Aman Parnami, Ryo
Suzuki
- Abstract summary: Teachable Reality is an augmented reality (AR) prototyping tool for creating interactive tangible AR applications with arbitrary everyday objects.
It identifies the user-defined tangible and gestural interactions using an on-demand computer vision model.
Our approach can lower the barrier to creating functional AR prototypes while also allowing flexible and general-purpose prototyping experiences.
- Score: 4.019017835137353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces Teachable Reality, an augmented reality (AR)
prototyping tool for creating interactive tangible AR applications with
arbitrary everyday objects. Teachable Reality leverages vision-based
interactive machine teaching (e.g., Teachable Machine), which captures
real-world interactions for AR prototyping. It identifies the user-defined
tangible and gestural interactions using an on-demand computer vision model.
Based on this, the user can easily create functional AR prototypes without
programming, enabled by a trigger-action authoring interface. Our approach
therefore offers the flexibility, customizability, and generalizability of
tangible AR applications, addressing the limitations of current
marker-based approaches. We explore the design space and demonstrate various AR
prototypes, which include tangible and deformable interfaces, context-aware
assistants, and body-driven AR applications. The results of our user study and
expert interviews confirm that our approach can lower the barrier to creating
functional AR prototypes while also allowing flexible and general-purpose
prototyping experiences.
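The abstract describes a trigger-action authoring interface driven by an on-demand vision model. A minimal sketch of that pattern, assuming a stubbed classifier (the class names, rule engine, and actions below are illustrative, not the paper's actual API): a user-taught class label acts as the trigger, and a rule table maps it to an AR action fired on each camera frame.

```python
# Sketch of a trigger-action pattern like the one described in the abstract.
# The vision model is stubbed; in the paper a user-taught classifier
# (Teachable Machine style) would emit the predicted label per frame.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    trigger: str                # class label the vision model emits
    action: Callable[[], str]   # AR action to fire when the label is seen


class TriggerActionEngine:
    def __init__(self) -> None:
        self.rules: Dict[str, Rule] = {}

    def add_rule(self, trigger: str, action: Callable[[], str]) -> None:
        # The end user "authors" behaviour by pairing a taught class
        # with an action, without writing any program logic.
        self.rules[trigger] = Rule(trigger, action)

    def on_frame(self, predicted_label: str) -> List[str]:
        # Called once per camera frame with the model's top prediction.
        rule = self.rules.get(predicted_label)
        return [rule.action()] if rule else []


engine = TriggerActionEngine()
engine.add_rule("cup_tilted", lambda: "show 'pouring' AR overlay")
engine.add_rule("lid_open", lambda: "play open animation")

print(engine.on_frame("cup_tilted"))  # matching rule fires its action
print(engine.on_frame("unknown"))     # no rule matches, nothing fires
```

Keeping the trigger vocabulary identical to the classifier's label set is what lets end users extend the prototype by teaching new classes rather than editing code.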
Related papers
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- Voila-A: Aligning Vision-Language Models with User's Gaze Attention [56.755993500556734]
We introduce gaze information as a proxy for human attention to guide Vision-Language Models (VLMs).
We propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications.
arXiv Detail & Related papers (2023-12-22T17:34:01Z)
- Typing on Any Surface: A Deep Learning-based Method for Real-Time Keystroke Detection in Augmented Reality [4.857109990499532]
Mid-air keyboard interfaces, wireless keyboards, and voice input either suffer from poor ergonomic design or limited accuracy, or are simply embarrassing to use in public.
This paper proposes and validates a deep-learning-based approach that enables AR applications to accurately predict keystrokes from the user-perspective RGB video stream.
A two-stage model, combining an off-the-shelf hand landmark extractor with a novel adaptive Convolutional Recurrent Neural Network (C-RNN), was trained.
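The two-stage structure can be sketched as a pipeline, with both stages stubbed (the thresholding "model" below stands in for the paper's adaptive C-RNN, and the landmark extractor is a placeholder for an off-the-shelf hand-tracking network; all names are assumptions for illustration):

```python
# Hedged sketch of a two-stage keystroke detector: stage one extracts a
# fingertip landmark per frame, stage two scans the landmark sequence for
# a press. A real system would use a hand-landmark network and a temporal
# model (the paper's C-RNN) instead of these stubs.

from typing import List, Optional, Sequence, Tuple

Landmark = Tuple[float, float]  # normalised (x, y) fingertip position


def extract_landmarks(frame: Sequence[float]) -> Landmark:
    # Stage 1 stub: pretend the frame already encodes the fingertip.
    return (frame[0], frame[1])


def detect_keystroke(frames: Sequence[Sequence[float]],
                     press_threshold: float = 0.2) -> Optional[int]:
    # Stage 2 stub: flag a keystroke when the fingertip's y-coordinate
    # dips below the threshold; return the index of the press frame.
    track: List[Landmark] = [extract_landmarks(f) for f in frames]
    for i, (_, y) in enumerate(track):
        if y < press_threshold:
            return i
    return None


# A fingertip descending toward the surface, then lifting again.
frames = [(0.5, 0.6), (0.5, 0.4), (0.5, 0.1), (0.5, 0.5)]
print(detect_keystroke(frames))  # -> 2
```

Separating the stages this way is what lets an off-the-shelf extractor be reused while only the temporal classifier is trained per task.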
arXiv Detail & Related papers (2023-08-31T23:58:25Z)
- Systematic Adaptation of Communication-focused Machine Learning Models from Real to Virtual Environments for Human-Robot Collaboration [1.392250707100996]
This paper presents a systematic framework for the real to virtual adaptation using limited size of virtual dataset.
Hand gesture recognition, a topic of much research and subsequent commercialization in the real world, has been possible because of the creation of large, labelled datasets.
arXiv Detail & Related papers (2023-07-21T03:24:55Z)
- Active Class Selection for Few-Shot Class-Incremental Learning [14.386434861320023]
For real-world applications, robots will need to continually learn in their environments through limited interactions with their users.
We develop a novel framework that can allow an autonomous agent to continually learn new objects by asking its users to label only a few of the most informative objects in the environment.
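One standard way to pick "the most informative objects" is uncertainty sampling; a minimal sketch under that assumption (this is a generic active-learning heuristic, not necessarily the paper's framework, and all names are illustrative): rank unlabeled objects by the predictive entropy of the agent's current classifier and query the user about the top-k.

```python
# Illustrative active-learning query selection: ask the user to label the
# objects the classifier is least certain about, measured by the Shannon
# entropy of its predictive distribution over classes.

import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    # Shannon entropy of a predictive distribution; higher = less certain.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(predictions: Sequence[Sequence[float]],
                   k: int = 1) -> List[int]:
    # Return indices of the k objects the model is most uncertain about.
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:k]


preds = [
    [0.98, 0.01, 0.01],  # confident -> little to gain from asking
    [0.34, 0.33, 0.33],  # near-uniform -> most informative query
    [0.70, 0.20, 0.10],
]
print(select_queries(preds, k=1))  # -> [1]
```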
arXiv Detail & Related papers (2023-07-05T20:16:57Z)
- Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration.
We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE.
We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
arXiv Detail & Related papers (2023-05-28T17:53:09Z)
- ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains.
The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK).
We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
arXiv Detail & Related papers (2023-05-01T17:57:01Z) - Learning Action-Effect Dynamics for Hypothetical Vision-Language
Reasoning Task [50.72283841720014]
We propose a novel learning strategy that can improve reasoning about the effects of actions.
We demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
arXiv Detail & Related papers (2022-12-07T05:41:58Z) - MONAI Label: A framework for AI-assisted Interactive Labeling of 3D
Medical Images [49.664220687980006]
The lack of annotated datasets is a major bottleneck for training new task-specific supervised machine learning models.
We present MONAI Label, a free and open-source framework that facilitates the development of applications based on artificial intelligence (AI) models.
arXiv Detail & Related papers (2022-03-23T12:33:11Z) - OpenDR: An Open Toolkit for Enabling High Performance, Low Footprint
Deep Learning for Robotics [0.0]
We present the Open Deep Learning Toolkit for Robotics (OpenDR).
OpenDR aims at developing an open, non-proprietary, efficient, and modular toolkit that can be easily used by robotics companies and research institutions.
arXiv Detail & Related papers (2022-03-01T12:59:59Z) - Modular approach to data preprocessing in ALOHA and application to a
smart industry use case [0.0]
The paper addresses a modular approach, integrated into the ALOHA tool flow, to support the data preprocessing and transformation pipeline.
To demonstrate the effectiveness of the approach, we present some experimental results related to a keyword spotting use case.
arXiv Detail & Related papers (2021-02-02T06:48:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.