Investigation of Architectures and Receptive Fields for Appearance-based
Gaze Estimation
- URL: http://arxiv.org/abs/2308.09593v1
- Date: Fri, 18 Aug 2023 14:41:51 GMT
- Title: Investigation of Architectures and Receptive Fields for Appearance-based
Gaze Estimation
- Authors: Yunhan Wang, Xiangwei Shi, Shalini De Mello, Hyung Jin Chang, Xucong
Zhang
- Abstract summary: We show that tuning a few simple parameters of a ResNet architecture can outperform most of the existing state-of-the-art methods for the gaze estimation task.
We achieve state-of-the-art gaze estimation errors of 3.64 degrees on ETH-XGaze, 4.50 degrees on MPIIFaceGaze, and 9.13 degrees on Gaze360.
- Score: 29.154335016375367
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid development of deep learning technology in the past decade,
appearance-based gaze estimation has attracted great attention from both
computer vision and human-computer interaction research communities.
Fascinating methods have been proposed with various mechanisms, including soft
attention, hard attention, two-eye asymmetry, feature disentanglement, rotation
consistency, and contrastive learning. Most of these methods take a single face
or multiple regions as input, yet the basic architecture of gaze estimation has
not been fully explored. In this paper, we reveal that tuning a few simple
parameters of a ResNet architecture can outperform most of the existing
state-of-the-art methods for the gaze estimation task on three popular
datasets. Through extensive experiments, we conclude that the stride number,
input image resolution, and multi-region architecture are critical for gaze
estimation performance, while their effectiveness depends on the quality of the
input face image. Taking ResNet-50 as the backbone, we obtain state-of-the-art
gaze estimation errors of 3.64 degrees on ETH-XGaze, 4.50 degrees on
MPIIFaceGaze, and 9.13 degrees on Gaze360.
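To make these knobs concrete, here is a minimal sketch, assuming a PyTorch/torchvision setup, of how backbone stride and input resolution can be adjusted for gaze regression. The stride placement, the 448x448 resolution, and the two-output head are illustrative assumptions, not the paper's reported configuration.

```python
# Minimal sketch (PyTorch/torchvision assumed) of the architectural knobs the
# abstract identifies as critical: backbone stride and input image resolution.
# Concrete values are illustrative, not the paper's exact configuration.
import torch
import torch.nn as nn
from torchvision.models import resnet50

def make_gaze_resnet(reduce_last_stride: bool = True) -> nn.Module:
    backbone = resnet50(weights=None)  # train from scratch or load weights
    if reduce_last_stride:
        # Set the stride of the last residual stage to 1 so the final feature
        # map keeps twice the default spatial resolution per dimension.
        backbone.layer4[0].conv2.stride = (1, 1)
        backbone.layer4[0].downsample[0].stride = (1, 1)
    # Replace the ImageNet classifier with a 2-D regression head (pitch, yaw).
    backbone.fc = nn.Linear(backbone.fc.in_features, 2)
    return backbone

model = make_gaze_resnet()
face = torch.randn(1, 3, 448, 448)  # larger input than the usual 224x224
pitch_yaw = model(face)             # shape: (1, 2), gaze angles
```

A multi-region variant in the same spirit would run backbones over the face and eye crops and concatenate their features before the regression head.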
Related papers
- CrossGaze: A Strong Method for 3D Gaze Estimation in the Wild [4.089889918897877]
We propose CrossGaze, a strong baseline for gaze estimation.
Our model surpasses several state-of-the-art methods, achieving a mean angular error of 9.94 degrees.
Our proposed model serves as a strong foundation for future research and development in gaze estimation.
arXiv Detail & Related papers (2024-02-13T09:20:26Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- Surface Geometry Processing: An Efficient Normal-based Detail Representation [66.69000350849328]
We introduce an efficient surface detail processing framework in the 2D normal domain.
We show that the proposed normal-based representation has three important properties, including detail separability, detail transferability and detail idempotence.
Three new schemes are further designed for geometric surface detail processing applications, including geometric texture synthesis, geometry detail transfer, and 3D surface super-resolution.
arXiv Detail & Related papers (2023-07-16T04:46:32Z)
- NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation [37.977032771941715]
We propose a novel Head-Eye redirection parametric model based on Neural Radiance Field.
Our model can decouple the face and eyes for separate neural rendering.
It can achieve the purpose of separately controlling the attributes of the face, identity, illumination, and eye gaze direction.
arXiv Detail & Related papers (2022-12-30T13:52:28Z)
- 3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views [67.00931529296788]
We propose to train general gaze estimation models which can be directly employed in novel environments without adaptation.
We create a large-scale dataset of diverse faces with gaze pseudo-annotations, which we extract based on the 3D geometry of the scene.
We test our method in the task of gaze generalization, in which we demonstrate improvement of up to 30% compared to state-of-the-art when no ground truth data are available.
arXiv Detail & Related papers (2022-12-06T14:15:17Z)
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z)
- GazeOnce: Real-Time Multi-Person Gaze Estimation [18.16091280655655]
Appearance-based gaze estimation aims to predict the 3D eye gaze direction from a single image.
Recent deep learning-based approaches have demonstrated excellent performance, but cannot output multi-person gaze in real time.
We propose GazeOnce, which is capable of simultaneously predicting gaze directions for multiple faces in an image.
arXiv Detail & Related papers (2022-04-20T14:21:47Z)
- Gaze Estimation with an Ensemble of Four Architectures [116.53389064096139]
We train several gaze estimators adopting four different network architectures.
We select the best six estimators and ensemble their predictions through a linear combination.
The method ranks first on the leaderboard of the ETH-XGaze Competition, achieving an average angular error of $3.11^\circ$ on the ETH-XGaze test set.
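As a rough illustration of the linear-combination step described above, here is a minimal NumPy sketch; the weights and the helper `ensemble_gaze` are hypothetical, and in practice the coefficients would be fit on a validation set.

```python
# Minimal sketch (NumPy assumed) of ensembling gaze predictions through a
# linear combination. The weights below are hypothetical placeholders.
import numpy as np

def ensemble_gaze(predictions: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """predictions: (n_models, n_samples, 2) pitch/yaw angles.
    weights: (n_models,) linear-combination coefficients."""
    weights = weights / weights.sum()  # normalize so the fused angles keep scale
    return np.tensordot(weights, predictions, axes=1)  # (n_samples, 2)

preds = np.random.randn(6, 100, 2) * 0.1   # six estimators, 100 face images
w = np.array([0.2, 0.2, 0.15, 0.15, 0.15, 0.15])
final = ensemble_gaze(preds, w)            # fused pitch/yaw per image
```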
arXiv Detail & Related papers (2021-07-05T12:40:26Z)
- Appearance-based Gaze Estimation With Deep Learning: A Review and Benchmark [14.306488668615883]
We present a systematic review of the appearance-based gaze estimation methods using deep learning.
We summarize the data pre-processing and post-processing methods, including face/eye detection, data rectification, 2D/3D gaze conversion and gaze origin conversion.
arXiv Detail & Related papers (2021-04-26T15:53:03Z)
- On estimating gaze by self-attention augmented convolutions [6.015556590955813]
We propose a novel network architecture grounded on self-attention augmented convolutions to improve the quality of the learned features.
We dubbed our framework ARes-gaze, which explores our Attention-augmented ResNet (ARes-14) as twin convolutional backbones.
Results showed a 2.38% decrease in average angular error compared to state-of-the-art methods on the MPIIFaceGaze dataset, and a second-place result on the EyeDiap dataset.
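The following is a minimal PyTorch sketch of an attention-augmented convolution in the spirit of this entry, concatenating standard convolution channels with channels computed by self-attention over spatial positions; the class `AttnAugmentedConv`, the channel split, and the head count are illustrative assumptions, not the ARes-14 design.

```python
# Minimal sketch (PyTorch assumed): a convolution whose output channels are
# partly produced by self-attention over spatial positions. Channel counts
# and head count are illustrative assumptions.
import torch
import torch.nn as nn

class AttnAugmentedConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, attn_ch: int, heads: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch - attn_ch, 3, padding=1)
        self.attn = nn.MultiheadAttention(attn_ch, heads, batch_first=True)
        self.to_attn = nn.Conv2d(in_ch, attn_ch, 1)  # project to attn width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        conv_out = self.conv(x)
        # Flatten spatial positions into a sequence for self-attention.
        seq = self.to_attn(x).flatten(2).transpose(1, 2)   # (b, h*w, attn_ch)
        attn_out, _ = self.attn(seq, seq, seq)
        attn_out = attn_out.transpose(1, 2).reshape(b, -1, h, w)
        return torch.cat([conv_out, attn_out], dim=1)      # (b, out_ch, h, w)

layer = AttnAugmentedConv(in_ch=64, out_ch=64, attn_ch=16)
y = layer(torch.randn(2, 64, 28, 28))  # -> (2, 64, 28, 28)
```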
arXiv Detail & Related papers (2020-08-25T14:29:05Z)
- It's Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation [82.16380486281108]
We propose an appearance-based method that only takes the full face image as input.
Our method encodes the face image using a convolutional neural network with spatial weights applied on the feature maps.
We show that our full-face method significantly outperforms the state of the art for both 2D and 3D gaze estimation.
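A minimal PyTorch sketch of the spatial-weights idea follows: a small 1x1-convolution branch predicts a per-location weight map that rescales the CNN feature map, emphasizing informative face regions. The `SpatialWeights` class and its layer sizes are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch (PyTorch assumed) of spatial weights applied to feature maps:
# a 1x1-conv branch outputs one weight per spatial location, broadcast across
# all channels. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SpatialWeights(nn.Module):
    def __init__(self, channels: int, hidden: int = 32):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.Conv2d(channels, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 1), nn.ReLU(inplace=True),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        w = self.weight_net(feats)      # (b, 1, h, w) spatial weight map
        return feats * w                # broadcast over channels

sw = SpatialWeights(channels=256)
feats = torch.randn(1, 256, 13, 13)    # CNN feature map of a face image
weighted = sw(feats)                   # same shape, spatially re-weighted
```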
arXiv Detail & Related papers (2016-11-27T15:00:10Z)