Gaze Estimation with Eye Region Segmentation and Self-Supervised
Multistream Learning
- URL: http://arxiv.org/abs/2112.07878v1
- Date: Wed, 15 Dec 2021 04:44:45 GMT
- Title: Gaze Estimation with Eye Region Segmentation and Self-Supervised
Multistream Learning
- Authors: Zunayed Mahmud, Paul Hungler, Ali Etemad
- Abstract summary: We present a novel multistream network that learns robust eye representations for gaze estimation.
We first create a synthetic dataset containing eye region masks detailing the visible eyeball and iris using a simulator.
We then perform eye region segmentation with a U-Net type model which we later use to generate eye region masks for real-world images.
- Score: 8.422257363944295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel multistream network that learns robust eye representations
for gaze estimation. We first create a synthetic dataset containing eye region
masks detailing the visible eyeball and iris using a simulator. We then perform
eye region segmentation with a U-Net type model which we later use to generate
eye region masks for real-world eye images. Next, we pretrain an eye image
encoder in the real domain with self-supervised contrastive learning to learn
generalized eye representations. Finally, this pretrained eye encoder, along
with two additional encoders for the visible eyeball region and the iris, is
used in parallel in our multistream framework to extract salient features for gaze
estimation from real-world images. We demonstrate the performance of our method
on the EYEDIAP dataset in two different evaluation settings and achieve
state-of-the-art results, outperforming all the existing benchmarks on this
dataset. We also conduct additional experiments to validate the robustness of
our self-supervised network with respect to different amounts of labeled data
used for training.
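
To make the two learning stages concrete, here is a minimal PyTorch sketch of the pipeline the abstract describes: contrastive pretraining of the eye image encoder, followed by three parallel streams fused for gaze regression. The SimCLR-style NT-Xent objective, the ResNet-18 backbone, the mask-encoder design, and the 2-D (yaw, pitch) output are illustrative assumptions rather than the authors' exact configuration; the eyeball and iris masks are assumed to come from the U-Net-type segmenter trained on synthetic data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class EyeEncoder(nn.Module):
    """Eye image encoder; the ResNet-18 backbone is an assumption."""
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.feat_dim = backbone.fc.in_features      # 512 for ResNet-18
        backbone.fc = nn.Identity()                  # expose pooled features
        self.backbone = backbone
        # Projection head used only during contrastive pretraining.
        self.proj = nn.Sequential(
            nn.Linear(self.feat_dim, self.feat_dim), nn.ReLU(inplace=True),
            nn.Linear(self.feat_dim, proj_dim),
        )

    def forward(self, x):
        h = self.backbone(x)                         # downstream representation
        z = F.normalize(self.proj(h), dim=1)         # unit-norm embedding
        return h, z

def nt_xent(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss over two augmented views (assumed objective)."""
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)                   # (2N, d), rows unit-norm
    sim = z @ z.t() / temperature                    # cosine similarities
    eye = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(eye, float("-inf"))             # drop self-similarity
    # Row i's positive is its other view: i+N for the first half, i-N after.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

class MaskEncoder(nn.Module):
    """Small CNN for a single-channel region mask (illustrative design)."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, out_dim),
        )

    def forward(self, mask):
        return self.net(mask)

class MultistreamGazeNet(nn.Module):
    """Three parallel streams (eye image, visible-eyeball mask, iris mask)
    fused by concatenation; the fusion scheme and the (yaw, pitch) output
    are assumptions for illustration."""
    def __init__(self, eye_encoder):
        super().__init__()
        self.eye_enc = eye_encoder                   # contrastively pretrained
        self.eyeball_enc = MaskEncoder()
        self.iris_enc = MaskEncoder()
        self.head = nn.Sequential(
            nn.Linear(eye_encoder.feat_dim + 2 * 128, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 2),                       # gaze angles (yaw, pitch)
        )

    def forward(self, eye_img, eyeball_mask, iris_mask):
        h, _ = self.eye_enc(eye_img)
        f = torch.cat([h, self.eyeball_enc(eyeball_mask),
                       self.iris_enc(iris_mask)], dim=1)
        return self.head(f)
```

In use, a U-Net-type segmenter transferred from the synthetic domain would supply the two masks for each real eye image (a hypothetical `segmenter(eye_img)` call), `nt_xent` would pretrain `EyeEncoder` on pairs of augmented views, and `MultistreamGazeNet` would then be trained on gaze labels with an L2 or angular loss.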
Related papers
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- R-MAE: Regions Meet Masked Autoencoders [113.73147144125385]
We explore regions as a potential visual analogue of words for self-supervised image representation learning.
Inspired by Masked Autoencoding (MAE), a generative pre-training baseline, we propose masked region autoencoding to learn from groups of pixels or regions.
arXiv Detail & Related papers (2023-06-08T17:56:46Z)
- RAZE: Region Guided Self-Supervised Gaze Representation Learning [5.919214040221055]
RAZE is a Region guided self-supervised gAZE representation learning framework that leverages non-annotated facial image data.
Ize-Net is a capsule layer-based CNN architecture that efficiently captures rich eye representations.
arXiv Detail & Related papers (2022-08-04T06:23:49Z)
- Multistream Gaze Estimation with Anatomical Eye Region Isolation by Synthetic to Real Transfer Learning [24.872143206600185]
We propose a novel neural pipeline, MSGazeNet, that learns gaze representations by taking advantage of eye anatomy information.
Our framework surpasses the state-of-the-art by 7.57% and 1.85% on three gaze estimation datasets.
arXiv Detail & Related papers (2022-06-18T17:57:32Z)
- Peripheral Vision Transformer [52.55309200601883]
We take a biologically inspired approach and explore modeling peripheral vision in deep neural networks for visual recognition.
We propose incorporating peripheral position encoding into the multi-head self-attention layers to let the network learn to partition the visual field into diverse peripheral regions given training data.
We evaluate the proposed network, dubbed PerViT, on the large-scale ImageNet dataset and systematically investigate the inner workings of the model for machine perception.
arXiv Detail & Related papers (2022-06-14T12:47:47Z)
- EllSeg-Gen, towards Domain Generalization for head-mounted eyetracking [19.913297057204357]
We show that convolutional networks excel at extracting gaze features despite the presence of image artifacts.
We compare the performance of a single model trained with multiple datasets against a pool of models trained on individual datasets.
Results indicate that models tested on datasets in which eye images exhibit higher appearance variability benefit from multiset training.
arXiv Detail & Related papers (2022-05-04T08:35:52Z)
- Bayesian Eye Tracking [63.21413628808946]
Model-based eye tracking is susceptible to eye feature detection errors.
We propose a Bayesian framework for model-based eye tracking.
Compared to state-of-the-art model-based and learning-based methods, the proposed framework demonstrates significant improvement in generalization capability.
arXiv Detail & Related papers (2021-06-25T02:08:03Z)
- Towards End-to-end Video-based Eye-Tracking [50.0630362419371]
Estimating eye-gaze from images alone is a challenging task due to unobservable person-specific factors.
We propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships.
We demonstrate that fusing information from the visual stimuli and the eye images can lead to performance similar to figures reported in the literature.
arXiv Detail & Related papers (2020-07-26T12:39:15Z)
- RIT-Eyes: Rendering of near-eye images for eye-tracking applications [3.4481343795011226]
Deep neural networks for video-based eye tracking have demonstrated resilience to noisy environments, stray reflections, and low resolution.
To train these networks, a large number of manually annotated images are required.
We introduce a synthetic eye image generation platform that improves upon previous work by adding features such as an active deformable iris, an aspherical cornea, retinal retro-reflection, gaze-coordinated eye-lid deformations, and blinks.
arXiv Detail & Related papers (2020-06-05T19:18:50Z)