Gaze-Vergence-Controlled See-Through Vision in Augmented Reality
- URL: http://arxiv.org/abs/2207.02645v1
- Date: Wed, 6 Jul 2022 13:11:34 GMT
- Title: Gaze-Vergence-Controlled See-Through Vision in Augmented Reality
- Authors: Zhimin Wang, Yuxin Zhao, and Feng Lu
- Abstract summary: We argue that using common interaction modalities, e.g., midair click and speech, may not be the optimal way to control see-through vision.
This is because when we want to see through something, it is physically related to our gaze depth/vergence.
This paper proposes a novel gaze-vergence-controlled (GVC) see-through vision technique in AR.
- Score: 8.731965517676842
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Augmented Reality (AR) see-through vision is an interesting research topic
since it enables users to see through a wall and see the occluded objects. Most
existing research focuses on the visual effects of see-through vision, while
the interaction method is less studied. However, we argue that using common
interaction modalities, e.g., midair click and speech, may not be the optimal
way to control see-through vision. This is because when we want to see through
something, it is physically related to our gaze depth/vergence and thus should
be naturally controlled by the eyes. Following this idea, this paper proposes a
novel gaze-vergence-controlled (GVC) see-through vision technique in AR. Since
gaze depth is needed, we build a gaze tracking module with two infrared cameras
and the corresponding algorithm and assemble it into the Microsoft HoloLens 2
to achieve gaze depth estimation. We then propose two different GVC modes for
see-through vision to fit different scenarios. Extensive experimental results
demonstrate that our gaze depth estimation is efficient and accurate. Compared
with conventional interaction modalities, our GVC techniques are also shown to
be more efficient and preferred by users.
Finally, we present four example applications of gaze-vergence-controlled
see-through vision.
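Since the paper's central mechanism is estimating gaze depth from binocular vergence, a minimal geometric sketch may help: triangulate the left- and right-eye gaze rays and read off the distance to their point of closest approach. This is an illustrative reconstruction, not the authors' implementation; the function name, the eye-ray inputs, and the toy numbers (e.g., a 63 mm interpupillary distance) are assumptions.

```python
# Illustrative sketch only (not the paper's implementation): estimating gaze
# depth by triangulating the left- and right-eye gaze rays reported by an eye
# tracker. Names and the toy numbers below are assumptions.
import numpy as np

def gaze_depth_from_vergence(o_l, d_l, o_r, d_r):
    """Return (depth_m, vergence_rad) for two gaze rays.

    o_l, o_r: 3D eye positions (metres); d_l, d_r: gaze directions.
    Depth is measured from the midpoint between the eyes to the point of
    closest approach of the two (generally skew) gaze rays.
    """
    d_l = d_l / np.linalg.norm(d_l)
    d_r = d_r / np.linalg.norm(d_r)

    w = o_l - o_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w, d_r @ w
    denom = a * c - b * b            # sin^2(vergence); ~0 for parallel rays
    if denom < 1e-9:                 # eyes effectively looking at infinity
        return np.inf, 0.0
    t_l = (b * e - c * d) / denom
    t_r = (a * e - b * d) / denom
    fixation = 0.5 * ((o_l + t_l * d_l) + (o_r + t_r * d_r))

    depth = np.linalg.norm(fixation - 0.5 * (o_l + o_r))
    vergence = np.arccos(np.clip(d_l @ d_r, -1.0, 1.0))
    return depth, vergence

# Toy check: ~63 mm interpupillary distance, both eyes fixating a point 1 m ahead.
ipd = 0.063
target = np.array([0.0, 0.0, 1.0])
o_l, o_r = np.array([-ipd / 2, 0.0, 0.0]), np.array([ipd / 2, 0.0, 0.0])
depth, theta = gaze_depth_from_vergence(o_l, target - o_l, o_r, target - o_r)
print(f"depth ~ {depth:.2f} m, vergence ~ {np.degrees(theta):.1f} deg")
```

When the two rays do not intersect exactly, the midpoint of their common perpendicular is a standard robust choice; with a known interpupillary distance, the vergence angle alone also gives depth approximately IPD / (2 tan(theta/2)).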
Related papers
- POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object
Interaction in the Multi-View World [59.545114016224254]
Humans are good at translating third-person observations of hand-object interactions into an egocentric view.
We propose a Prompt-Oriented View-agnostic learning framework, which enables this view adaptation with few egocentric videos.
arXiv Detail & Related papers (2024-03-09T09:54:44Z) - 3D Gaze Vis: Sharing Eye Tracking Data Visualization for Collaborative
Work in VR Environment [3.3130410344903325]
We designed three eye-tracking data visualizations, gaze cursor, gaze spotlight, and gaze trajectory, in a VR scene for a course on the human heart.
We found that the gaze cursor from doctors could help students learn complex 3D heart models more effectively.
The results indicated that sharing eye-tracking data visualizations could improve the quality and efficiency of collaborative work in the VR environment.
arXiv Detail & Related papers (2023-03-19T12:00:53Z) - On Human Visual Contrast Sensitivity and Machine Vision Robustness: A
Comparative Study [68.41864523774164]
How color differences affect machine vision has not been well explored.
Our work tries to bridge this gap between the role of color vision in human visual recognition and its role in machine vision.
We devise a new framework in two dimensions to perform extensive analyses on the effect of color contrast and corrupted images.
arXiv Detail & Related papers (2022-12-16T18:51:41Z) - Multimodal Across Domains Gaze Target Detection [18.41238482101682]
This paper addresses the gaze target detection problem in single images captured from the third-person perspective.
We present a multimodal deep architecture to infer where a person in a scene is looking.
arXiv Detail & Related papers (2022-08-23T09:09:00Z) - RAZE: Region Guided Self-Supervised Gaze Representation Learning [5.919214040221055]
RAZE is a Region-guided self-supervised gAZE representation learning framework that leverages non-annotated facial image data.
Ize-Net is a capsule-layer-based CNN architecture that efficiently captures rich eye representations.
arXiv Detail & Related papers (2022-08-04T06:23:49Z) - Peripheral Vision Transformer [52.55309200601883]
We take a biologically inspired approach and explore to model peripheral vision in deep neural networks for visual recognition.
We propose incorporating peripheral position encoding into the multi-head self-attention layers so that, given training data, the network learns to partition the visual field into diverse peripheral regions.
We evaluate the proposed network, dubbed PerViT, on the large-scale ImageNet dataset and systematically investigate the inner workings of the model for machine perception.
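As a rough illustration of the kind of mechanism described in this summary, the toy sketch below adds a learned, distance-dependent bias to the multi-head self-attention logits so each head can weight different regions of the patch grid differently. It is an assumption-laden simplification, not PerViT's actual position encoding; the class and parameter names are invented for illustration.

```python
# Toy sketch (not PerViT's exact formulation): multi-head self-attention whose
# logits receive a learned bias indexed by the quantised query-key distance on
# the patch grid, letting different heads favour different spatial regions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionBiasedSelfAttention(nn.Module):
    def __init__(self, dim, num_heads, grid_size):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # Per-head scalar bias for every rounded query-key Euclidean distance.
        ys, xs = torch.meshgrid(torch.arange(grid_size), torch.arange(grid_size), indexing="ij")
        coords = torch.stack([ys, xs], dim=-1).reshape(-1, 2).float()   # (N, 2)
        dist = torch.cdist(coords, coords).round().long()               # (N, N)
        self.register_buffer("dist_idx", dist)
        self.dist_bias = nn.Parameter(torch.zeros(num_heads, dist.max().item() + 1))

    def forward(self, x):                       # x: (batch, N, dim), N = grid_size**2
        B, N, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, N, self.num_heads, self.head_dim).transpose(1, 2) for t in (q, k, v))
        logits = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5       # (B, H, N, N)
        logits = logits + self.dist_bias[:, self.dist_idx]              # add positional bias
        out = F.softmax(logits, dim=-1) @ v
        return self.proj(out.transpose(1, 2).reshape(B, N, -1))

# Example: 14x14 patch grid, 4 heads, 64-dim tokens.
attn = PositionBiasedSelfAttention(dim=64, num_heads=4, grid_size=14)
tokens = torch.randn(2, 14 * 14, 64)
print(attn(tokens).shape)                       # torch.Size([2, 196, 64])
```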
arXiv Detail & Related papers (2022-06-14T12:47:47Z) - GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z) - Searching the Search Space of Vision Transformer [98.96601221383209]
Vision Transformer has shown great visual representation power in substantial vision tasks such as recognition and detection.
We propose to use neural architecture search to automate this process, by searching not only the architecture but also the search space.
We provide design guidelines of general vision transformers with extensive analysis according to the space searching process.
arXiv Detail & Related papers (2021-11-29T17:26:07Z) - Imitation Learning with Human Eye Gaze via Multi-Objective Prediction [3.5779268406205618]
We propose Gaze Regularized Imitation Learning (GRIL), a novel context-aware imitation learning architecture.
GRIL learns concurrently from both human demonstrations and eye gaze to solve tasks where visual attention provides important context.
We show that GRIL outperforms several state-of-the-art gaze-based imitation learning algorithms, simultaneously learns to predict human visual attention, and generalizes to scenarios not present in the training data.
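To make the multi-objective idea concrete, here is a hedged sketch of a policy trained jointly on a behaviour-cloning loss and an auxiliary gaze-prediction loss; the network, loss choices, and weighting below are assumptions rather than GRIL's published design.

```python
# Illustrative sketch only (not GRIL's published architecture): a multi-objective
# imitation loss combining behaviour cloning with auxiliary gaze prediction, so
# the policy is regularised by where the human demonstrator looked.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeRegularizedPolicy(nn.Module):
    def __init__(self, obs_dim, action_dim, gaze_dim=2, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.action_head = nn.Linear(hidden, action_dim)   # imitation branch
        self.gaze_head = nn.Linear(hidden, gaze_dim)       # auxiliary gaze branch

    def forward(self, obs):
        h = self.encoder(obs)
        return self.action_head(h), self.gaze_head(h)

def multi_objective_loss(policy, obs, expert_action, human_gaze, gaze_weight=0.5):
    pred_action, pred_gaze = policy(obs)
    bc_loss = F.mse_loss(pred_action, expert_action)        # behaviour cloning term
    gaze_loss = F.mse_loss(pred_gaze, human_gaze)            # gaze prediction term
    return bc_loss + gaze_weight * gaze_loss

# Toy batch: 32 demonstrations with 16-d observations, 4-d actions, 2-d gaze points.
policy = GazeRegularizedPolicy(obs_dim=16, action_dim=4)
loss = multi_objective_loss(policy, torch.randn(32, 16), torch.randn(32, 4), torch.rand(32, 2))
loss.backward()
```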
arXiv Detail & Related papers (2021-02-25T17:13:13Z) - LNSMM: Eye Gaze Estimation With Local Network Share Multiview Multitask [7.065909514483728]
We propose a novel methodology to estimate eye gaze points and eye gaze directions simultaneously.
Experiments show our method achieves state-of-the-art performance against current mainstream methods on two indicators: gaze points and gaze directions.
arXiv Detail & Related papers (2021-01-18T15:14:24Z) - What Can You Learn from Your Muscles? Learning Visual Representation
from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a visual-only state-of-the-art method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.