The Story in Your Eyes: An Individual-difference-aware Model for
Cross-person Gaze Estimation
- URL: http://arxiv.org/abs/2106.14183v1
- Date: Sun, 27 Jun 2021 10:14:10 GMT
- Authors: Jun Bao, Buyu Liu, Jun Yu
- Abstract summary: We propose a novel method for refining the cross-person gaze prediction task using eye/face images only, by explicitly modelling person-specific differences.
Specifically, we first assume that we can obtain initial gaze predictions with an existing method, which we refer to as InitNet.
We validate our ideas on three publicly available datasets, EVE, XGaze and MPIIGaze, and demonstrate that our proposed method outperforms the SOTA methods significantly.
- Score: 24.833385815585405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel method for refining the cross-person gaze prediction
task using eye/face images only, by explicitly modelling person-specific
differences. Specifically, we first assume that we can obtain initial gaze
predictions with an existing method, which we refer to as InitNet, and then
introduce three modules: the Validity Module (VM), the Self-Calibration (SC)
Module and the Person-specific Transform (PT) Module. By predicting the
reliability of the current eye/face images, our VM is able to identify invalid
samples, e.g., eye-blinking images, and reduce their effect on our modelling
process. Our SC and PT modules then learn to compensate for person-specific
differences using valid samples only. The former models translation offsets by
bridging the gap between the initial predictions and the dataset-wise
distribution. The latter learns a more general person-specific transformation
by incorporating information from existing initial predictions of the same
person. We validate our ideas on three publicly available datasets, EVE, XGaze
and MPIIGaze, and demonstrate that our proposed method significantly
outperforms SOTA methods on all of them, with relative performance improvements
of 21.7%, 36.0% and 32.9%, respectively. We won the GAZE 2021 Competition on
the EVE dataset. Our code can be found at https://github.com/bjj9/EVE_SCPT.
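The self-calibration (SC) idea described in the abstract, estimating a person-specific translation offset by comparing a person's initial predictions against the dataset-wise distribution, can be pictured with a minimal sketch. This is not the authors' implementation (see the linked repository for that); the mean-offset estimator and the function name below are simplifying assumptions.

```python
import numpy as np

def self_calibrate(initial_preds, dataset_mean, validity):
    """Toy version of the self-calibration (SC) idea: estimate a
    person-specific translation offset from valid initial predictions
    and subtract it, pulling the person's predictions toward the
    dataset-wise gaze distribution.

    initial_preds: (N, 2) array of (pitch, yaw) predictions for one person.
    dataset_mean:  (2,) mean gaze direction over the whole dataset.
    validity:      (N,) boolean mask, e.g. False for blink frames (VM's role).
    """
    valid = initial_preds[validity]
    # Person-specific offset: gap between this person's average initial
    # prediction and the dataset-wise average.
    offset = valid.mean(axis=0) - dataset_mean
    return initial_preds - offset

# Example: predictions corrupted by a constant per-person bias.
rng = np.random.default_rng(0)
true_gaze = rng.normal(0.0, 0.2, size=(100, 2))
person_bias = np.array([0.05, -0.03])
preds = true_gaze + person_bias + rng.normal(0.0, 0.01, size=(100, 2))
validity = rng.random(100) > 0.1  # ~10% "blink" frames marked invalid
calibrated = self_calibrate(preds, dataset_mean=np.zeros(2), validity=validity)
print(np.abs(calibrated - true_gaze).mean(), "<", np.abs(preds - true_gaze).mean())
```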
Related papers
- Stanceformer: Target-Aware Transformer for Stance Detection [59.69858080492586]
Stance Detection involves discerning the stance expressed in a text towards a specific subject or target.
Prior works have relied on existing transformer models that lack the capability to prioritize targets effectively.
We introduce Stanceformer, a target-aware transformer model that incorporates enhanced attention towards the targets during both training and inference.
arXiv Detail & Related papers (2024-10-09T17:24:28Z)
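A hedged sketch of the target-aware attention idea in the Stanceformer entry above: boost attention scores at target-token positions before the softmax. The additive-bias formulation and the `target_mask` input are illustrative assumptions, not the paper's exact mechanism.

```python
import torch
import torch.nn.functional as F

def target_aware_attention(q, k, v, target_mask, bias=1.0):
    """Scaled dot-product attention with extra weight on target tokens.

    q, k, v:      (batch, seq, dim) tensors.
    target_mask:  (batch, seq) float tensor, 1.0 where a token belongs
                  to the stance target, 0.0 elsewhere.
    bias:         additive score bonus for attending to target tokens.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # (batch, seq, seq)
    scores = scores + bias * target_mask.unsqueeze(1)  # favour target keys
    return F.softmax(scores, dim=-1) @ v

# Example: one sentence of 6 tokens, tokens 4-5 form the target span.
q = k = v = torch.randn(1, 6, 16)
mask = torch.tensor([[0., 0., 0., 0., 1., 1.]])
out = target_aware_attention(q, k, v, mask)
print(out.shape)  # torch.Size([1, 6, 16])
```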
- Merging Multiple Datasets for Improved Appearance-Based Gaze Estimation [10.682719521609743]
The Two-stage Transformer-based Gaze-feature Fusion (TTGF) method uses transformers to merge information from each eye and the face separately, and then to merge across the two eyes.
The proposed Gaze Adaptation Module (GAM) handles annotation inconsistency by applying a per-dataset adaptation module that corrects gaze estimates from a single shared estimator.
arXiv Detail & Related papers (2024-09-02T02:51:40Z)
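The Gaze Adaptation Module described above can be pictured as a small per-dataset correction applied on top of one shared estimator. The affine form and the `GazeAdapter` class below are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class GazeAdapter(nn.Module):
    """One shared gaze estimator plus a tiny per-dataset affine head that
    absorbs annotation inconsistencies between datasets (GAM-style idea)."""

    def __init__(self, backbone: nn.Module, dataset_names):
        super().__init__()
        self.backbone = backbone  # shared estimator -> (pitch, yaw)
        # One 2x2 affine correction per dataset, initialised to identity.
        self.adapters = nn.ModuleDict(
            {name: nn.Linear(2, 2) for name in dataset_names}
        )
        for lin in self.adapters.values():
            nn.init.eye_(lin.weight)
            nn.init.zeros_(lin.bias)

    def forward(self, x, dataset: str):
        gaze = self.backbone(x)              # shared estimate
        return self.adapters[dataset](gaze)  # dataset-specific correction

# Example with a dummy backbone mapping images to 2-D gaze.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))
model = GazeAdapter(backbone, ["mpiigaze", "gaze360"])
print(model(torch.randn(4, 3, 32, 32), "mpiigaze").shape)  # torch.Size([4, 2])
```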
- Federated Variational Inference: Towards Improved Personalization and Generalization [2.37589914835055]
We study personalization and generalization in stateless cross-device federated learning setups.
We first propose a hierarchical generative model and formalize it using Bayesian Inference.
We then approximate this process using Variational Inference to train our model efficiently.
We evaluate our model on FEMNIST and CIFAR-100 image classification and show that FedVI beats the state-of-the-art on both tasks.
arXiv Detail & Related papers (2023-05-23T04:28:07Z)
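As a reminder of the variational-inference machinery the FedVI entry above relies on, here is a minimal reparameterization-trick ELBO for a single Gaussian latent variable. The hierarchical, federated structure of the actual paper is omitted; this is only the generic building block.

```python
import torch

def gaussian_elbo(x, mu, log_var, decoder):
    """One-sample Monte Carlo ELBO for q(z|x) = N(mu, diag(exp(log_var)))
    with a standard-normal prior. Generic VI, not FedVI's full model."""
    # Reparameterization trick: z = mu + sigma * eps keeps gradients flowing.
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * log_var) * eps
    # Gaussian reconstruction term (up to an additive constant).
    recon = -0.5 * ((x - decoder(z)) ** 2).sum(dim=-1)
    # Analytic KL(q || N(0, I)).
    kl = 0.5 * (torch.exp(log_var) + mu ** 2 - 1.0 - log_var).sum(dim=-1)
    return (recon - kl).mean()  # maximise this (minimise its negative)

decoder = torch.nn.Linear(8, 32)
x = torch.randn(16, 32)
mu, log_var = torch.randn(16, 8), torch.zeros(16, 8)
loss = -gaussian_elbo(x, mu, log_var, decoder)
loss.backward()
print(float(loss))
```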
- Heterogenous Ensemble of Models for Molecular Property Prediction [55.91865861896012]
We propose a method that considers different modalities of molecules.
We ensemble these models with a HuberRegressor.
This yields a winning solution to the 2nd edition of the OGB Large-Scale Challenge (2022).
arXiv Detail & Related papers (2022-11-20T17:25:26Z)
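The ensembling step in the entry above maps directly to scikit-learn: stack per-modality model predictions as features and fit a HuberRegressor on top. The base models and toy data below are stand-ins, not the authors' modality-specific networks.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import HuberRegressor
from sklearn.model_selection import train_test_split

# Toy regression data standing in for per-molecule features and a property.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 10))
y = X[:, 0] * 2.0 + np.sin(X[:, 1]) + rng.normal(0.0, 0.1, size=600)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# "Modalities": two different base models trained on the same target.
bases = [RandomForestRegressor(random_state=0).fit(X_tr, y_tr),
         GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)]

# Robust linear blend of base predictions via a HuberRegressor.
# (A stricter setup would fit the blender on held-out predictions.)
stack_tr = np.column_stack([m.predict(X_tr) for m in bases])
blender = HuberRegressor().fit(stack_tr, y_tr)

stack_te = np.column_stack([m.predict(X_te) for m in bases])
print("blended MAE:", np.abs(blender.predict(stack_te) - y_te).mean())
```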
- Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost [53.746169882193456]
Recent works have proposed various sparse attention modules to overcome the quadratic cost of self-attention.
We propose a model that addresses these limitations by endowing each attention head with a mixed-membership Stochastic Block Model.
Our model outperforms previous efficient variants as well as the original Transformer with full attention.
arXiv Detail & Related papers (2022-10-27T15:30:52Z)
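A loose sketch of attention gated by a mixed-membership block structure, as in the entry above: each token gets soft cluster memberships, and a cluster-to-cluster affinity matrix decides which query-key pairs carry weight. The soft (non-sampled) formulation and all names here are simplifying assumptions, not the paper's model.

```python
import torch
import torch.nn.functional as F

def sbm_attention(q, k, v, member_logits, block_logits):
    """Attention gated by a mixed-membership block structure.

    q, k, v:        (batch, seq, dim)
    member_logits:  (batch, seq, n_clusters) per-token cluster memberships.
    block_logits:   (n_clusters, n_clusters) cluster-to-cluster affinities.
    """
    z = F.softmax(member_logits, dim=-1)      # soft memberships
    blocks = torch.sigmoid(block_logits)      # affinity in (0, 1)
    gate = z @ blocks @ z.transpose(-2, -1)   # (batch, seq, seq)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    probs = F.softmax(scores, dim=-1) * gate  # keep block-linked pairs
    probs = probs / probs.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    return probs @ v

q = k = v = torch.randn(2, 8, 16)
out = sbm_attention(q, k, v, torch.randn(2, 8, 4), torch.randn(4, 4))
print(out.shape)  # torch.Size([2, 8, 16])
```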
- L2CS-Net: Fine-Grained Gaze Estimation in Unconstrained Environments [2.5234156040689237]
We propose a robust CNN-based model for predicting gaze in unconstrained settings.
We use two identical losses, one for each angle, to improve network learning and increase its generalization.
Our proposed model achieves state-of-the-art accuracy of 3.92° and 10.41° on the MPIIGaze and Gaze360 datasets, respectively.
arXiv Detail & Related papers (2022-03-07T12:35:39Z)
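The "two identical losses, one for each angle" in the L2CS-Net entry above can be sketched as a per-angle combination of bin classification and expected-value regression. The bin layout, the loss weight, and the approximate bucketing against bin centres are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def per_angle_loss(logits, target_deg, bin_centers, reg_weight=1.0):
    """Classification over angle bins plus regression on the softmax
    expectation, applied identically to yaw and to pitch.

    logits:      (batch, n_bins) predictions for one angle.
    target_deg:  (batch,) continuous ground-truth angle in degrees.
    bin_centers: (n_bins,) centre of each bin in degrees.
    """
    # Hard bin index of each target (approximate bucketing against
    # centres; good enough for a sketch).
    target_bin = torch.bucketize(target_deg, bin_centers).clamp(
        max=len(bin_centers) - 1)
    cls = F.cross_entropy(logits, target_bin)
    # Continuous estimate: expectation of bin centres under the softmax.
    expected = (F.softmax(logits, dim=-1) * bin_centers).sum(dim=-1)
    reg = F.mse_loss(expected, target_deg)
    return cls + reg_weight * reg

bins = torch.arange(-90.0, 91.0, 3.0)  # 61 bins of 3 degrees
yaw_logits = torch.randn(4, len(bins), requires_grad=True)
pitch_logits = torch.randn(4, len(bins), requires_grad=True)
yaw, pitch = torch.rand(4) * 60 - 30, torch.rand(4) * 60 - 30
loss = per_angle_loss(yaw_logits, yaw, bins) + per_angle_loss(pitch_logits, pitch, bins)
loss.backward()
print(float(loss))
```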
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
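The ATC recipe above fits in a few lines of NumPy: pick the confidence threshold that reproduces the model's accuracy on labeled source data, then report the fraction of unlabeled target examples above that threshold. Raw max-probability confidence is used here; the paper also studies other score functions.

```python
import numpy as np

def atc_predict_accuracy(src_conf, src_correct, tgt_conf):
    """Average Thresholded Confidence (ATC).

    src_conf:    (N,) model confidences on labeled source examples.
    src_correct: (N,) booleans, whether each source prediction was right.
    tgt_conf:    (M,) confidences on unlabeled target examples.
    Returns the estimated target accuracy.
    """
    src_acc = src_correct.mean()
    # Threshold t chosen so the fraction of source points with conf > t
    # matches the observed source accuracy.
    t = np.quantile(src_conf, 1.0 - src_acc)
    return (tgt_conf > t).mean()

rng = np.random.default_rng(0)
src_conf = rng.beta(5, 2, size=5000)       # source confidences
src_correct = rng.random(5000) < src_conf  # calibrated-ish correctness
tgt_conf = rng.beta(4, 3, size=5000)       # shifted target confidences
print("estimated target accuracy:", atc_predict_accuracy(src_conf, src_correct, tgt_conf))
```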
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Knowledge Generation -- Variational Bayes on Knowledge Graphs [0.685316573653194]
This thesis is a proof of concept for the potential of the Variational Auto-Encoder (VAE) in representing real-world Knowledge Graphs.
Inspired by successful approaches to graph generation, we evaluate the capabilities of our model, the Relational Graph Variational Auto-Encoder (RGVAE).
The RGVAE is first evaluated on link prediction. The mean reciprocal rank (MRR) scores on the FB15K-237 and WN18RR datasets are compared.
We investigate the latent space in a twofold experiment: first, linear interpolation between the latent representations of two triples, then the exploration of each latent dimension.
arXiv Detail & Related papers (2021-01-21T21:23:17Z)
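Mean reciprocal rank, the link-prediction metric compared in the entry above, takes only a few lines of NumPy; the scores here come from a random stand-in rather than any trained model.

```python
import numpy as np

def mean_reciprocal_rank(scores, true_idx):
    """MRR for link prediction.

    scores:   (n_queries, n_candidates) higher = more plausible entity.
    true_idx: (n_queries,) index of the correct entity per query.
    """
    # Rank of the true candidate: 1 + number of candidates scored higher.
    true_scores = scores[np.arange(len(scores)), true_idx]
    ranks = 1 + (scores > true_scores[:, None]).sum(axis=1)
    return (1.0 / ranks).mean()

rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 50))
true_idx = rng.integers(0, 50, size=100)
print("MRR of a random scorer:", mean_reciprocal_rank(scores, true_idx))
```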
- 360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales [26.36068336169795]
We develop a model that mimics humans' ability to estimate gaze by aggregating information from multiple focused looks.
The model avoids the need to extract clear eye patches.
We extend the model to handle the challenging task of 360-degree gaze estimation.
arXiv Detail & Related papers (2020-09-15T08:45:12Z)
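The multiple-zoom-scales idea in the entry above can be sketched as running one shared backbone over centre crops at several scales and averaging the per-scale gaze predictions. The crop schedule, the averaging rule, and the `MultiZoomGaze` name are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiZoomGaze(nn.Module):
    """Average gaze predictions over centre crops at several zoom scales,
    so no explicit eye-patch extraction is needed."""

    def __init__(self, backbone: nn.Module, scales=(1.0, 0.7, 0.5)):
        super().__init__()
        self.backbone = backbone
        self.scales = scales

    def forward(self, x):
        b, c, h, w = x.shape
        preds = []
        for s in self.scales:
            ch, cw = int(h * s), int(w * s)
            top, left = (h - ch) // 2, (w - cw) // 2
            crop = x[:, :, top:top + ch, left:left + cw]
            # Resize each zoomed crop back to the backbone's input size.
            crop = F.interpolate(crop, size=(h, w), mode="bilinear",
                                 align_corners=False)
            preds.append(self.backbone(crop))
        return torch.stack(preds).mean(dim=0)

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 2))
model = MultiZoomGaze(backbone)
print(model(torch.randn(4, 3, 64, 64)).shape)  # torch.Size([4, 2])
```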
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
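Prediction-time batch normalization, as evaluated in the entry above, amounts to letting BatchNorm layers normalise with the statistics of the test batch instead of the stored training statistics. A minimal PyTorch sketch: switch only the BN modules into train mode at inference, with updates to running statistics disabled. The helper name is hypothetical.

```python
import torch
import torch.nn as nn

def enable_prediction_time_bn(model: nn.Module):
    """Put every BatchNorm layer in train mode so it normalises with the
    current (test) batch statistics, while the rest of the model stays
    in eval mode. Stored running statistics are neither used nor updated."""
    model.eval()
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()
            m.track_running_stats = False  # ignore stored train-time stats

model = nn.Sequential(nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU(),
                      nn.Linear(16, 2))
enable_prediction_time_bn(model)
with torch.no_grad():
    out = model(torch.randn(32, 8))  # normalised with this batch's stats
print(out.shape)  # torch.Size([32, 2])
```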
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.