L2CS-Net: Fine-Grained Gaze Estimation in Unconstrained Environments
- URL: http://arxiv.org/abs/2203.03339v1
- Date: Mon, 7 Mar 2022 12:35:39 GMT
- Title: L2CS-Net: Fine-Grained Gaze Estimation in Unconstrained Environments
- Authors: Ahmed A. Abdelrahman, Thorsten Hempel, Aly Khalifa, Ayoub Al-Hamadi
- Abstract summary: We propose a robust CNN-based model for predicting gaze in unconstrained settings.
We use two identical losses, one for each angle, to improve network learning and increase its generalization.
Our proposed model achieves state-of-the-art accuracy of 3.92° and 10.41° on the MPIIGaze and Gaze360 datasets, respectively.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human gaze is a crucial cue used in various applications such as human-robot
interaction and virtual reality. Recently, convolutional neural network (CNN)
approaches have made notable progress in predicting gaze direction. However,
estimating gaze in the wild is still a challenging problem due to the
uniqueness of eye appearance, lighting conditions, and the diversity of head
poses and gaze directions. In this paper, we propose a robust CNN-based model
for predicting gaze in unconstrained settings. We propose to regress each gaze
angle separately to improve the per-angle prediction accuracy, which enhances
the overall gaze performance. In addition, we use two identical losses,
one for each angle, to improve network learning and increase its
generalization. We evaluate our model on two popular datasets collected in
unconstrained settings. Our proposed model achieves state-of-the-art accuracy
of 3.92° and 10.41° on the MPIIGaze and Gaze360 datasets, respectively.
We make our code open source at https://github.com/Ahmednull/L2CS-Net.
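To make the separate-angle regression and dual-loss setup concrete, below is a minimal PyTorch sketch, not the authors' released implementation (see the repository linked above for that). The ResNet-50 backbone, layer sizes, and the use of MSE as the per-angle criterion are illustrative assumptions; the abstract only specifies a CNN backbone with one identical loss per gaze angle.

```python
import torch
import torch.nn as nn
from torchvision import models

class DualAngleGazeNet(nn.Module):
    """Hypothetical two-branch gaze network: one FC head per gaze
    angle (yaw, pitch), each trained with its own loss term."""

    def __init__(self, feat_dim=2048):
        super().__init__()
        # Backbone choice is an assumption; the abstract only says "CNN-based".
        backbone = models.resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.fc_yaw = nn.Linear(feat_dim, 1)    # separate head for yaw
        self.fc_pitch = nn.Linear(feat_dim, 1)  # separate head for pitch

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.fc_yaw(f), self.fc_pitch(f)

model = DualAngleGazeNet()
criterion = nn.MSELoss()  # "two identical losses": same criterion, applied once per angle

images = torch.randn(4, 3, 224, 224)                      # dummy face crops
gt_yaw, gt_pitch = torch.randn(4, 1), torch.randn(4, 1)   # angles in radians

pred_yaw, pred_pitch = model(images)
loss = criterion(pred_yaw, gt_yaw) + criterion(pred_pitch, gt_pitch)
loss.backward()
```

Keeping one head and one loss term per angle lets each branch specialize on yaw or pitch, which is the per-angle accuracy gain the abstract describes. For context, the 3.92° and 10.41° figures are mean angular errors between predicted and ground-truth gaze directions, the standard metric on MPIIGaze and Gaze360.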
Related papers
- Merging Multiple Datasets for Improved Appearance-Based Gaze Estimation
The Two-stage Transformer-based Gaze-feature Fusion (TTGF) method uses transformers to merge information from each eye and the face separately, and then to merge it across the two eyes.
The proposed Gaze Adaptation Module (GAM) handles annotation inconsistency by applying an adaptation module per dataset to correct the gaze estimates from a single shared estimator.
arXiv Detail & Related papers (2024-09-02T02:51:40Z)
- 3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views
We propose to train general gaze estimation models which can be directly employed in novel environments without adaptation.
We create a large-scale dataset of diverse faces with gaze pseudo-annotations, which we extract based on the 3D geometry of the scene.
We test our method on the task of gaze generalization, where we demonstrate an improvement of up to 30% over the state of the art when no ground-truth data are available.
arXiv Detail & Related papers (2022-12-06T14:15:17Z)
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation
We reformulate the convolution layer by drawing on scale-space theory.
We build a novel network named SCale AttentioN Convolutional Neural Network (SCAN-CNN).
As a single-shot scheme, its inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- HybridGazeNet: Geometric model guided Convolutional Neural Networks for gaze estimation
We propose HybridGazeNet, a unified framework that encodes the geometric eyeball model into the appearance-based CNN architecture explicitly.
Experiments on multiple challenging gaze datasets show that HybridGazeNet achieves better accuracy and generalization than existing SOTA methods.
arXiv Detail & Related papers (2021-11-23T07:20:37Z)
- An Adversarial Human Pose Estimation Network Injected with Graph Structure
In this paper, we design a novel generative adversarial network (GAN) to improve the localization accuracy of visible joints when some joints are invisible.
The network consists of two simple but efficient modules: the Cascade Feature Network (CFN) and the Graph Structure Network (GSN).
arXiv Detail & Related papers (2021-03-29T12:07:08Z)
- Self-Learning Transformations for Improving Gaze and Head Redirection
We propose a novel generative model for face images that is capable of producing high-quality images under fine-grained control over eye gaze and head orientation angles.
This requires disentangling many appearance-related factors, including not only gaze and head orientation but also lighting, hue, etc.
We show that explicitly disentangling task-irrelevant factors results in more accurate modelling of gaze and head orientation.
arXiv Detail & Related papers (2020-10-23T11:18:37Z)
- 360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales
We develop a model that mimics the human ability to estimate gaze by aggregating from focused looks.
The model avoids the need to extract clear eye patches.
We extend the model to handle the challenging task of 360-degree gaze estimation.
arXiv Detail & Related papers (2020-09-15T08:45:12Z)
- On estimating gaze by self-attention augmented convolutions
We propose a novel network architecture grounded on self-attention augmented convolutions to improve the quality of the learned features.
We dub our framework ARes-gaze; it uses our Attention-augmented ResNet (ARes-14) as twin convolutional backbones.
Results showed a 2.38% decrease in average angular error compared with state-of-the-art methods on the MPIIFaceGaze dataset, and a second-place result on the EyeDiap dataset.
arXiv Detail & Related papers (2020-08-25T14:29:05Z)
- Dual In-painting Model for Unsupervised Gaze Correction and Animation in the Wild
We present a solution that works without the need for precise annotations of the gaze angle and the head pose.
Our method consists of three novel modules: the Gaze Correction module (GCM), the Gaze Animation module (GAM), and the Pretrained Autoencoder module (PAM).
arXiv Detail & Related papers (2020-08-09T23:14:16Z)
- Towards End-to-end Video-based Eye-Tracking
Estimating eye gaze from images alone is a challenging task due to unobservable person-specific factors.
We propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships.
We demonstrate that fusing information from visual stimuli as well as eye images can lead to performance similar to figures reported in the literature.
arXiv Detail & Related papers (2020-07-26T12:39:15Z)