Semantic Interaction in Augmented Reality Environments for Microsoft HoloLens
- URL: http://arxiv.org/abs/2112.05846v1
- Date: Thu, 18 Nov 2021 14:58:04 GMT
- Title: Semantic Interaction in Augmented Reality Environments for Microsoft HoloLens
- Authors: Peer Schütt, Max Schwarz, Sven Behnke
- Abstract summary: We capture indoor environments and display interaction cues with known object classes using the Microsoft HoloLens.
The 3D mesh recorded by the HoloLens is annotated on-line, as the user moves, with semantic classes using a projective approach.
The results are fused onto the mesh; prominent object segments are identified and displayed in 3D to the user.
- Score: 28.10437301492564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Augmented Reality is a promising technique for human-machine interaction.
Especially in robotics, which always considers systems in their environment, it
is highly beneficial to display visualizations and receive user input directly
in exactly that environment. We explore this idea using the Microsoft HoloLens,
with which we capture indoor environments and display interaction cues with
known object classes. The 3D mesh recorded by the HoloLens is annotated
on-line, as the user moves, with semantic classes using a projective approach,
which allows us to use a state-of-the-art 2D semantic segmentation method. The
results are fused onto the mesh; prominent object segments are identified and
displayed in 3D to the user. Finally, the user can trigger actions by gesturing
at the object. We both present qualitative results and analyze the accuracy and
performance of our method in detail on an indoor dataset.
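The projective annotation step described above can be sketched in a few lines: per-pixel class probabilities from a 2D segmentation network are projected onto mesh vertices using the camera pose and intrinsics, and accumulated into per-vertex class scores whose argmax gives the fused label. The snippet below is a minimal illustration under a pinhole camera model; the function names, the simple probability accumulation, and the missing occlusion test are simplifications for brevity, not the paper's actual implementation.

```python
"""Illustrative sketch of projective 2D-to-3D semantic label fusion onto a mesh."""
import numpy as np

def project_vertices(vertices, T_world_to_cam, K):
    """Project Nx3 world-space mesh vertices into pixel coordinates (camera looks along +Z)."""
    ones = np.ones((vertices.shape[0], 1))
    cam = (T_world_to_cam @ np.hstack([vertices, ones]).T).T[:, :3]
    in_front = cam[:, 2] > 0.05                       # drop points behind or very close to the camera
    uvw = (K @ cam.T).T                               # pinhole projection
    uv = uvw[:, :2] / np.maximum(uvw[:, 2:3], 1e-6)
    return uv, in_front

def fuse_frame(vertex_scores, vertices, seg_probs, T_world_to_cam, K):
    """Accumulate one frame's per-pixel class probabilities onto per-vertex class scores."""
    h, w, _ = seg_probs.shape
    uv, valid = project_vertices(vertices, T_world_to_cam, K)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
    vertex_scores[valid] += seg_probs[v[valid], u[valid]]   # no occlusion test, kept simple
    return vertex_scores

# Usage: scores = np.zeros((num_vertices, num_classes)); fuse several frames as the user moves,
# then vertex_labels = scores.argmax(axis=1) gives the fused per-vertex class.
```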
Related papers
- SADG: Segment Any Dynamic Gaussian Without Object Trackers [39.77468734311312]
SADG, Segment Any Dynamic Gaussian Without Object Trackers, is a novel approach that combines a dynamic Gaussian Splatting representation with semantic information without relying on object IDs.
We learn semantically-aware features by leveraging masks generated from the Segment Anything Model (SAM) and utilizing our novel contrastive learning objective based on hard pixel mining.
We evaluate SADG on proposed benchmarks and demonstrate the superior performance of our approach in segmenting objects within dynamic scenes.
arXiv Detail & Related papers (2024-11-28T17:47:48Z)
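As a rough illustration of the mask-based contrastive objective with hard pixel mining mentioned in the SADG summary above, the sketch below pulls together pixel features that share a SAM mask and pushes apart those that do not, mining the hardest pairs per anchor. The loss form, the InfoNCE-style normalisation, and all names are assumptions for illustration, not the paper's actual objective.

```python
"""Assumed sketch of a SAM-mask contrastive loss with hard pixel mining."""
import torch
import torch.nn.functional as F

def mask_contrastive_loss(pixel_feats, mask_ids, temperature=0.1, num_hard=64):
    """pixel_feats: (N, D) sampled pixel embeddings; mask_ids: (N,) SAM segment id per pixel."""
    n = pixel_feats.size(0)
    k = min(num_hard, n)
    feats = F.normalize(pixel_feats, dim=1)
    sim = feats @ feats.t() / temperature                         # pairwise similarities
    same = mask_ids.unsqueeze(0) == mask_ids.unsqueeze(1)         # positives share a mask
    eye = torch.eye(n, dtype=torch.bool, device=feats.device)
    pos_mask, neg_mask = same & ~eye, ~same

    # Hard pixel mining: least similar positives, most similar negatives per anchor.
    hard_pos = sim.masked_fill(~pos_mask, float('inf')).topk(k, dim=1, largest=False).values
    hard_neg = sim.masked_fill(~neg_mask, float('-inf')).topk(k, dim=1, largest=True).values

    # InfoNCE-style objective over the mined pairs: positives should outrank negatives.
    pos_exp = torch.where(torch.isinf(hard_pos), torch.zeros_like(hard_pos), hard_pos.exp()).sum(1)
    neg_exp = hard_neg.exp().sum(1)                               # exp(-inf) = 0 for padded entries
    valid = pos_mask.any(dim=1)                                   # anchors with at least one positive
    loss = -torch.log(pos_exp / (pos_exp + neg_exp + 1e-8) + 1e-8)
    return loss[valid].mean()

# Usage: sample pixels from a rendered feature map, look up their SAM mask ids, and minimise
# this loss so that features agree within a mask and differ across masks.
```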
- Lost & Found: Updating Dynamic 3D Scene Graphs from Egocentric Observations [44.14584011692035]
Static semantic maps are unable to capture interactions between the environment and humans or robotic agents.
We present an approach that addresses this limitation. Based solely on egocentric recordings, we are able to track the 6DoF poses of the moving object.
We show how our method allows commanding a mobile manipulator through teach & repeat, and how information about prior interactions allows a mobile manipulator to retrieve an object hidden in a drawer.
arXiv Detail & Related papers (2024-11-28T14:05:07Z)
- Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.
Previous research employed interactive perception for manipulating articulated objects, but such open-loop approaches often overlook the interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
arXiv Detail & Related papers (2024-09-24T17:59:56Z)
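One simple way to estimate a revolute joint axis online from a segmented, tracked part is to fit a rigid transform between two snapshots of the part's points and extract the axis of its rotation, as sketched below. This is only an assumed illustration of the idea; the paper's SAM2-based, closed-loop pipeline is not reproduced here.

```python
"""Assumed sketch: revolute axis estimation from two snapshots of a tracked part."""
import numpy as np

def rigid_transform(src, dst):
    """Kabsch fit: rotation R and translation t with dst ≈ (R @ src.T).T + t."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))             # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t

def revolute_axis(R, t):
    """Axis direction and a point on the axis of the motion (R, t)."""
    w, v = np.linalg.eig(R)
    axis = np.real(v[:, np.argmin(np.abs(w - 1.0))])   # eigenvector with eigenvalue 1
    axis /= np.linalg.norm(axis)
    # For a rotation about an axis through p: t = (I - R) p, solved in least squares.
    point = np.linalg.lstsq(np.eye(3) - R, t, rcond=None)[0]
    return axis, point

# Usage: track corresponding 3D points of the segmented moving part across two frames
# (e.g. with a video segmenter), then
#   R, t = rigid_transform(pts_t0, pts_t1)
#   axis, point = revolute_axis(R, t)
```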
- Variational Inference for Scalable 3D Object-centric Learning [19.445804699433353]
We tackle the task of scalable unsupervised object-centric representation learning on 3D scenes.
Existing approaches to object-centric representation learning show limitations in generalizing to larger scenes.
We propose to learn view-invariant 3D object representations in localized object coordinate systems.
arXiv Detail & Related papers (2023-09-25T10:23:40Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding [67.21613160846299]
The task of Embodied Reference Understanding (ERU) is first designed for this setting of interactive 3D visual grounding.
A new dataset called ScanERU is constructed to evaluate the effectiveness of this idea.
arXiv Detail & Related papers (2023-03-23T11:36:14Z)
- Lifelong Ensemble Learning based on Multiple Representations for Few-Shot Object Recognition [6.282068591820947]
We present a lifelong ensemble learning approach based on multiple representations to address the few-shot object recognition problem.
To facilitate lifelong learning, each approach is equipped with a memory unit for storing and retrieving object information instantly.
We have performed extensive sets of experiments to assess the performance of the proposed approach in offline and open-ended scenarios.
arXiv Detail & Related papers (2022-05-04T10:29:10Z)
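As a toy illustration of the memory unit mentioned in the lifelong ensemble learning summary above, the sketch below stores labelled feature vectors and retrieves the nearest stored neighbour for a query. Class and method names are hypothetical; the paper's system equips each representation with such a memory and combines them as an ensemble.

```python
"""Toy sketch of a per-representation memory unit for lifelong object recognition."""
import numpy as np

class ObjectMemory:
    def __init__(self):
        self.features = []   # stored object feature vectors
        self.labels = []     # category label for each stored vector

    def store(self, feature, label):
        """Teach a new object instance: keep its feature and label."""
        self.features.append(np.asarray(feature, dtype=float))
        self.labels.append(label)

    def retrieve(self, feature):
        """Recognise a query by its nearest stored neighbour (cosine similarity)."""
        if not self.features:
            return None
        bank = np.stack(self.features)
        q = np.asarray(feature, dtype=float)
        sims = bank @ q / (np.linalg.norm(bank, axis=1) * np.linalg.norm(q) + 1e-8)
        return self.labels[int(np.argmax(sims))]

# An ensemble would keep one such memory per representation (e.g. depth-based and
# RGB-based descriptors) and combine the retrieved labels, for instance by voting.
```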
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- HIDA: Towards Holistic Indoor Understanding for the Visually Impaired via Semantic Instance Segmentation with a Wearable Solid-State LiDAR Sensor [25.206941504935685]
HIDA is a lightweight assistive system based on 3D point cloud instance segmentation with a solid-state LiDAR sensor.
Our entire system consists of three hardware components, two interactive functions (obstacle avoidance and object finding), and a voice user interface.
The proposed 3D instance segmentation model has achieved state-of-the-art performance on ScanNet v2 dataset.
arXiv Detail & Related papers (2021-07-07T12:23:53Z)
- iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
iGibson features enable the generalization of navigation agents, and the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human-demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.