Neural Networks for Semantic Gaze Analysis in XR Settings
- URL: http://arxiv.org/abs/2103.10451v1
- Date: Thu, 18 Mar 2021 18:05:01 GMT
- Title: Neural Networks for Semantic Gaze Analysis in XR Settings
- Authors: Lena Stubbemann, Dominik Dürrschnabel, Robert Refflinghaus
- Abstract summary: We present a novel approach which minimizes time and information necessary to annotate volumes of interest.
We train convolutional neural networks (CNNs) on synthetic data sets derived from virtual models using image augmentation techniques.
We evaluate our method in real and virtual environments, showing that the method can compete with state-of-the-art approaches.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual-reality (VR) and augmented-reality (AR) technology is increasingly
combined with eye-tracking. This combination broadens both fields and opens up
new areas of application, in which visual perception and related cognitive
processes can be studied in interactive but still well controlled settings.
However, performing a semantic gaze analysis of eye-tracking data from
interactive three-dimensional scenes is a resource-intensive task, which has
so far been an obstacle to its economical use. In this paper we present a
novel approach that minimizes the time and information necessary to annotate
volumes of interest
(VOIs) by using techniques from object recognition. To do so, we train
convolutional neural networks (CNNs) on synthetic data sets derived from
virtual models using image augmentation techniques. We evaluate our method in
real and virtual environments, showing that the method can compete with
state-of-the-art approaches, while not relying on additional markers or
preexisting databases but instead offering cross-platform use.
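As a rough, self-contained sketch of this kind of pipeline (not the authors' exact implementation), the Python snippet below augments rendered images of virtual models and trains a small CNN to decide which VOI a gaze-centered crop belongs to; the directory layout, image size, augmentations, and hyperparameters are all assumptions.
```python
# Illustrative sketch only: train a CNN on augmented renders of virtual models
# to label the volume of interest (VOI) a gaze-centered crop falls on.
# Directory layout, image size, and class count are assumptions, not the paper's.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Heavy augmentation approximates the variability of real XR footage.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(128, scale=(0.6, 1.0)),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.RandomRotation(15),
    transforms.GaussianBlur(kernel_size=5),
    transforms.ToTensor(),
])

# Assumed layout: synthetic_renders/<voi_name>/*.png, one folder per VOI.
dataset = datasets.ImageFolder("synthetic_renders", transform=train_tf)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

model = nn.Sequential(                       # deliberately small CNN
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 32 * 32, len(dataset.classes)),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for crops, voi_labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(crops), voi_labels)
        loss.backward()
        opt.step()
```
At inference time, one would crop each frame around the current gaze sample and take the CNN's top class as the fixated VOI.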
Related papers
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has a strong influence on the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
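A minimal illustration of the overlap measurement described above, with PCA standing in for whatever dimensionality-reduction technique the paper actually uses; the data arrays and the overlap proxy are invented for the sketch.
```python
# Sketch: project real and synthetic eye images into a shared low-dimensional
# space and quantify how much the two distributions overlap.
# PCA and the overlap proxy below are illustrative choices, not the paper's.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 64 * 64))   # stand-in real eye images
synth = rng.normal(0.3, 1.0, size=(500, 64 * 64))  # stand-in synthetic images

pca = PCA(n_components=2).fit(np.vstack([real, synth]))
zr, zs = pca.transform(real), pca.transform(synth)

# Crude overlap proxy: distance between the embedded means, relative to spread.
gap = np.linalg.norm(zr.mean(axis=0) - zs.mean(axis=0))
spread = 0.5 * (zr.std() + zs.std())
print(f"mean gap / spread = {gap / spread:.2f}  (lower = more overlap)")
```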
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- A survey of synthetic data augmentation methods in computer vision [0.0]
This paper presents an extensive review of synthetic data augmentation techniques.
We focus on the important data generation and augmentation techniques, general scope of application and specific use-cases.
We provide a summary of common synthetic datasets for training computer vision models.
arXiv Detail & Related papers (2024-03-15T07:34:08Z)
- Federated Multi-View Synthesizing for Metaverse [52.59476179535153]
The metaverse is expected to provide immersive entertainment, education, and business applications.
Virtual reality (VR) transmission over wireless networks is data- and computation-intensive.
We have developed a novel multi-view synthesizing framework that can efficiently provide synthesizing, storage, and communication resources for wireless content delivery in the metaverse.
arXiv Detail & Related papers (2023-12-18T13:51:56Z)
- DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experimental results achieve state-of-the-art performance on both synthetic data and real-world data tracking.
arXiv Detail & Related papers (2023-11-30T21:34:44Z)
- Spatial Reasoning for Few-Shot Object Detection [21.3564383157159]
We propose a spatial reasoning framework that detects novel objects with only a few training examples in context.
We employ a graph convolutional network in which RoIs and their relatedness are defined as nodes and edges, respectively.
We demonstrate that the proposed method significantly outperforms the state-of-the-art methods and verify its efficacy through extensive ablation studies.
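A bare-bones version of one graph-convolution step over such an RoI graph might look like the following; the feature dimensions and the cosine-similarity adjacency are illustrative assumptions, not the paper's construction.
```python
# Sketch of one graph-convolution step over region-of-interest (RoI) features:
# nodes = RoIs, edges = pairwise relatedness. Shapes/weights are illustrative.
import torch
import torch.nn.functional as F

num_rois, feat_dim = 8, 256
roi_feats = torch.randn(num_rois, feat_dim)      # node features (one per RoI)

# Relatedness as cosine similarity between RoI features (an assumed choice).
sim = F.cosine_similarity(roi_feats.unsqueeze(1), roi_feats.unsqueeze(0), dim=-1)
adj = F.softmax(sim, dim=-1)                     # row-normalized edge weights

# One GCN layer: aggregate neighbor features, then a learned linear transform.
weight = torch.nn.Linear(feat_dim, feat_dim)
updated = F.relu(weight(adj @ roi_feats))        # context-aware RoI features
print(updated.shape)                             # torch.Size([8, 256])
```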
arXiv Detail & Related papers (2022-11-02T12:38:08Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
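A toy sketch of the scale-recovery idea: given metrically scaled depths implied by the disparity network, the unknown scale of an up-to-scale monocular VO estimate can be fixed robustly. All numbers and the median-ratio estimator are invented for illustration.
```python
# Sketch: use scale-consistent disparities to fix the unknown scale of a
# monocular VO depth estimate. Rig parameters and arrays are invented.
import numpy as np

rng = np.random.default_rng(1)
focal, baseline = 720.0, 0.54                 # assumed virtual stereo rig (px, m)
disparity = rng.uniform(5.0, 50.0, 1000)      # network output for one frame (px)
metric_depth = focal * baseline / disparity   # depth implied by the disparities

vo_depth = metric_depth / 3.7 * rng.normal(1.0, 0.01, 1000)  # up-to-scale VO depth
scale = np.median(metric_depth / vo_depth)    # robust per-frame scale estimate
print(f"recovered scale factor: {scale:.2f}") # ~3.7
```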
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- How Facial Features Convey Attention in Stationary Environments [0.0]
This paper aims to extend previous research on distraction detection by analyzing which visual features contribute most to predicting awareness and fatigue.
We use the open-source facial analysis toolkit OpenFace to analyze visual data of subjects at varying levels of attentiveness.
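As a hedged illustration of this workflow, the snippet below trains a simple classifier on per-frame OpenFace features; the file paths, label format, and exact column names (which vary across OpenFace versions) are assumptions.
```python
# Sketch: predict attentiveness from OpenFace per-frame features.
# CSV paths, the label file, and the exact column names are assumptions;
# the classifier choice is illustrative, not the paper's method.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

frames = pd.read_csv("subject01_openface.csv")              # OpenFace export
labels = pd.read_csv("subject01_labels.csv")["attentive"]   # 0/1 per frame

# Gaze angles and the eyelid action unit AU45 are plausible fatigue cues.
feature_cols = ["gaze_angle_x", "gaze_angle_y", "AU45_r"]
X_train, X_test, y_train, y_test = train_test_split(
    frames[feature_cols], labels, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```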
arXiv Detail & Related papers (2021-11-29T20:11:57Z)
- Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
arXiv Detail & Related papers (2021-08-30T19:45:07Z)
- Leveraging Semantic Scene Characteristics and Multi-Stream Convolutional Architectures in a Contextual Approach for Video-Based Visual Emotion Recognition in the Wild [31.40575057347465]
We tackle the task of video-based visual emotion recognition in the wild.
Standard methodologies that rely solely on the extraction of bodily and facial features often fall short of accurate emotion prediction.
We aspire to alleviate this problem by leveraging visual context in the form of scene characteristics and attributes.
arXiv Detail & Related papers (2021-05-16T17:31:59Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
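A generic stand-in for this kind of visual-kinematics fusion (not MRG-Net itself; all dimensions and the late-fusion design are assumptions):
```python
# Sketch: fuse visual and kinematics embeddings for per-frame gesture
# classification. A generic late-fusion stand-in, not the authors' MRG-Net;
# all dimensions and the fusion scheme are assumptions.
import torch
import torch.nn as nn

class VisKinFusion(nn.Module):
    def __init__(self, vis_dim=512, kin_dim=14, hidden=128, num_gestures=10):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, hidden)   # embed video features
        self.kin_proj = nn.Linear(kin_dim, hidden)   # embed robot kinematics
        self.head = nn.Linear(2 * hidden, num_gestures)

    def forward(self, vis, kin):
        fused = torch.cat([torch.relu(self.vis_proj(vis)),
                           torch.relu(self.kin_proj(kin))], dim=-1)
        return self.head(fused)                      # gesture logits per frame

model = VisKinFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 14))
print(logits.shape)                                  # torch.Size([4, 10])
```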
This list is automatically generated from the titles and abstracts of the papers on this site.