Multistream Gaze Estimation with Anatomical Eye Region Isolation by
Synthetic to Real Transfer Learning
- URL: http://arxiv.org/abs/2206.09256v2
- Date: Mon, 12 Feb 2024 20:13:26 GMT
- Title: Multistream Gaze Estimation with Anatomical Eye Region Isolation by
Synthetic to Real Transfer Learning
- Authors: Zunayed Mahmud, Paul Hungler, Ali Etemad
- Abstract summary: We propose a novel neural pipeline, MSGazeNet, that learns gaze representations by taking advantage of the eye anatomy information.
Our framework surpasses the state-of-the-art by 7.57% and 1.85% on two of three gaze estimation datasets.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a novel neural pipeline, MSGazeNet, that learns gaze
representations by taking advantage of eye anatomy information through a
multistream framework. Our proposed solution comprises two components: first, a
network for isolating anatomical eye regions, and second, a network for
multistream gaze estimation. The eye region isolation is performed with a U-Net
style network which we train using a synthetic dataset that contains eye region
masks for the visible eyeball and the iris region. The synthetic dataset used
in this stage is procured using the UnityEyes simulator, and consists of 80,000
eye images. After training, the eye region isolation network is transferred to
the real domain to generate masks for real-world eye images. To make this
transfer successful, we exploit domain randomization during training, which
exposes the synthetic images to greater variance through augmentations that
resemble real-world artifacts. The generated eye region masks, together with
the raw eye images, then form the multistream input to our gaze estimation
network, which consists of wide residual blocks. The output embeddings from the
stream encoders are fused along the channel dimension before being fed into the
gaze regression layers.
We evaluate our framework on three gaze estimation datasets and achieve strong
performance. Our method surpasses the state-of-the-art by 7.57% and 1.85% on
two of the datasets, and obtains competitive results on the third. We also
study the robustness of our method to noise in the data and demonstrate that
our model is less sensitive to noisy inputs. Lastly, we perform a variety of
experiments including ablation studies to evaluate the contribution of
different components and design choices in our solution.
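The fusion described in the abstract can be sketched minimally as follows: separate streams encode the raw eye image and the anatomical region masks, and their feature maps are concatenated along the channel dimension before a gaze regression head. The encoder and regression head below are illustrative NumPy placeholders (random projections), not the actual wide residual blocks or regression layers of MSGazeNet.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, out_channels=8):
    """Placeholder encoder: a random 1x1 projection mapping the input's
    channels to `out_channels` feature maps of the same spatial size."""
    w = rng.standard_normal((out_channels, x.shape[0]))
    return np.tensordot(w, x, axes=([1], [0]))  # -> (out_channels, H, W)

# Inputs: one grayscale eye image and two binary region masks
# (visible eyeball and iris), as described in the abstract.
H, W = 36, 60
image = rng.random((1, H, W))                        # raw image stream
masks = rng.integers(0, 2, (2, H, W)).astype(float)  # mask stream

# Encode each stream separately, then fuse along the channel dimension.
feat_img = encode(image)  # (8, H, W)
feat_msk = encode(masks)  # (8, H, W)
fused = np.concatenate([feat_img, feat_msk], axis=0)  # (16, H, W)

# Placeholder gaze regression head: global average pooling followed by
# a linear layer producing two gaze angles (e.g. yaw and pitch).
pooled = fused.mean(axis=(1, 2))       # (16,)
w_reg = rng.standard_normal((2, pooled.size))
gaze = w_reg @ pooled                  # (2,)
```

In the actual pipeline the mask stream comes from the eye region isolation network rather than random data, but the fusion point is the same: concatenation happens on feature maps, after each stream has been encoded independently.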
Related papers
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where [63.61248884015162]
We aim to alleviate the burden of incorporating the masking operation into the contrastive-learning framework for convolutional neural networks.
We propose to explicitly take the saliency constraint into consideration so that the masked regions are more evenly distributed between the foreground and background.
arXiv Detail & Related papers (2023-09-22T09:58:38Z)
- Synthetic optical coherence tomography angiographs for detailed retinal vessel segmentation without human annotations [12.571349114534597]
We present a lightweight simulation of the retinal vascular network based on space colonization for faster and more realistic OCTA synthesis.
We demonstrate the superior segmentation performance of our approach in extensive quantitative and qualitative experiments on three public datasets.
arXiv Detail & Related papers (2023-06-19T14:01:47Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on the detection of deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- Unsupervised Domain Transfer with Conditional Invertible Neural Networks [83.90291882730925]
We propose a domain transfer approach based on conditional invertible neural networks (cINNs).
Our method inherently guarantees cycle consistency through its invertible architecture, and network training can efficiently be conducted with maximum likelihood.
Our method enables the generation of realistic spectral data and outperforms the state of the art on two downstream classification tasks.
arXiv Detail & Related papers (2023-03-17T18:00:27Z)
- Semantic Labeling of High Resolution Images Using EfficientUNets and Transformers [5.177947445379688]
We propose a new segmentation model that combines convolutional neural networks with deep transformers.
Our results demonstrate that the proposed methodology improves segmentation accuracy compared to state-of-the-art techniques.
arXiv Detail & Related papers (2022-06-20T12:03:54Z)
- Gaze Estimation with Eye Region Segmentation and Self-Supervised Multistream Learning [8.422257363944295]
We present a novel multistream network that learns robust eye representations for gaze estimation.
We first create a synthetic dataset containing eye region masks detailing the visible eyeball and iris using a simulator.
We then perform eye region segmentation with a U-Net type model which we later use to generate eye region masks for real-world images.
arXiv Detail & Related papers (2021-12-15T04:44:45Z)
- Adversarial Domain Feature Adaptation for Bronchoscopic Depth Estimation [111.89519571205778]
In this work, we propose an alternative domain-adaptive approach to depth estimation.
Our novel two-step structure first trains a depth estimation network with labeled synthetic images in a supervised manner.
The results of our experiments show that the proposed method improves the network's performance on real images by a considerable margin.
arXiv Detail & Related papers (2021-09-24T08:11:34Z)
- Enhancing Photorealism Enhancement [83.88433283714461]
We present an approach to enhancing the realism of synthetic images using a convolutional network.
We analyze scene layout distributions in commonly used datasets and find that they differ in important ways.
We report substantial gains in stability and realism in comparison to recent image-to-image translation methods.
arXiv Detail & Related papers (2021-05-10T19:00:49Z)
- SharinGAN: Combining Synthetic and Real Data for Unsupervised Geometry Estimation [18.29202999419042]
We propose a novel method for combining synthetic and real images when training networks.
We suggest a method for mapping both image types into a single, shared domain.
Our experiments demonstrate significant improvements over the state-of-the-art in two important domains.
arXiv Detail & Related papers (2020-06-07T02:45:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.