Adaptive Feature Fusion Network for Gaze Tracking in Mobile Tablets
- URL: http://arxiv.org/abs/2103.11119v1
- Date: Sat, 20 Mar 2021 07:16:10 GMT
- Title: Adaptive Feature Fusion Network for Gaze Tracking in Mobile Tablets
- Authors: Yiwei Bao, Yihua Cheng, Yunfei Liu and Feng Lu
- Abstract summary: We propose a novel Adaptive Feature Fusion Network (AFF-Net), which performs the gaze tracking task on mobile tablets.
We use Squeeze-and-Excitation layers to adaptively fuse two-eye features according to their appearance similarity.
Experiments on both GazeCapture and MPIIFaceGaze datasets demonstrate consistently superior performance of the proposed method.
- Score: 19.739595664816164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, many multi-stream gaze estimation methods have been proposed. They estimate gaze from eye and face appearance and achieve reasonable accuracy. However, most of these methods simply concatenate the features extracted from eye and face appearance; the feature fusion process itself has been largely ignored. In this paper, we propose a novel Adaptive Feature Fusion Network (AFF-Net), which performs the gaze tracking task on mobile tablets. We stack two-eye feature maps and utilize Squeeze-and-Excitation layers to adaptively fuse the two-eye features according to their appearance similarity. Meanwhile, we also propose Adaptive Group Normalization to recalibrate eye features under the guidance of facial features. Extensive experiments on both the GazeCapture and MPIIFaceGaze datasets demonstrate the consistently superior performance of the proposed method.
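A minimal PyTorch sketch of the two mechanisms named in the abstract: Squeeze-and-Excitation (SE) gating over stacked two-eye feature maps, and an Adaptive Group Normalization style recalibration of eye features guided by a facial feature vector. Module names, tensor shapes, and the exact face-conditioned scale/shift formulation are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch; shapes and the AdaGN formulation are assumptions for illustration.
import torch
import torch.nn as nn


class SqueezeExcite(nn.Module):
    """Standard SE block: global average pool, bottleneck MLP, sigmoid channel gate."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        gate = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * gate  # channel-wise recalibration of the stacked eye features


class AdaptiveGroupNorm(nn.Module):
    """Group-normalize eye features, then scale/shift them with parameters
    predicted from a facial feature vector (assumed formulation)."""
    def __init__(self, channels: int, face_dim: int, groups: int = 8):
        super().__init__()
        self.gn = nn.GroupNorm(groups, channels, affine=False)
        self.to_scale = nn.Linear(face_dim, channels)
        self.to_shift = nn.Linear(face_dim, channels)

    def forward(self, eye_feat: torch.Tensor, face_feat: torch.Tensor) -> torch.Tensor:
        gamma = self.to_scale(face_feat).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_shift(face_feat).unsqueeze(-1).unsqueeze(-1)
        return self.gn(eye_feat) * (1 + gamma) + beta


# Usage: stack left/right eye feature maps along channels, gate them with SE
# (so the weighting can depend on two-eye appearance), then recalibrate with
# the pooled facial feature.
left = torch.randn(4, 64, 14, 14)   # left-eye feature map (assumed shape)
right = torch.randn(4, 64, 14, 14)  # right-eye feature map (assumed shape)
face = torch.randn(4, 128)          # pooled facial feature vector (assumed dim)

stacked = torch.cat([left, right], dim=1)    # (4, 128, 14, 14)
fused = SqueezeExcite(128)(stacked)          # adaptive two-eye fusion
out = AdaptiveGroupNorm(128, face_dim=128)(fused, face)
print(out.shape)  # torch.Size([4, 128, 14, 14])
```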
Related papers
- Merging Multiple Datasets for Improved Appearance-Based Gaze Estimation [10.682719521609743]
The Two-stage Transformer-based Gaze-feature Fusion (TTGF) method uses transformers to merge information from each eye and the face separately, and then merge across the two eyes.
Our proposed Gaze Adaptation Module (GAM) handles annotation inconsistency by applying a Gaze Adaptation Module to each dataset to correct gaze estimates from a single shared estimator.
arXiv Detail & Related papers (2024-09-02T02:51:40Z) - Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in semantic segmentation task on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z) - Rotation-Constrained Cross-View Feature Fusion for Multi-View Appearance-based Gaze Estimation [16.43119580796718]
This work proposes a generalizable multi-view gaze estimation task and a cross-view feature fusion method to address this issue.
In addition to paired images, our method takes the relative rotation matrix between two cameras as additional input.
The proposed network learns to extract rotatable feature representation by using relative rotation as a constraint.
arXiv Detail & Related papers (2023-05-22T04:29:34Z) - Multimodal Adaptive Fusion of Face and Gait Features using Keyless Attention based Deep Neural Networks for Human Identification [67.64124512185087]
Soft biometrics such as gait are widely used with face in surveillance tasks like person recognition and re-identification.
We propose a novel adaptive multi-biometric fusion strategy for the dynamic incorporation of gait and face biometric cues by leveraging keyless attention deep neural networks.
arXiv Detail & Related papers (2023-03-24T05:28:35Z) - Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z) - L2CS-Net: Fine-Grained Gaze Estimation in Unconstrained Environments [2.5234156040689237]
We propose a robust CNN-based model for predicting gaze in unconstrained settings.
We use two identical losses, one for each angle, to improve network learning and increase its generalization.
Our proposed model achieves state-of-the-art accuracy of 3.92° and 10.41° on the MPIIGaze and Gaze360 datasets, respectively.
arXiv Detail & Related papers (2022-03-07T12:35:39Z) - Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z) - Stochastic Layers in Vision Transformers [85.38733795180497]
We introduce fully stochastic layers in vision transformers, without causing any severe drop in performance.
The added stochasticity boosts the robustness of visual features and strengthens privacy.
We use our features for three different applications, namely, adversarial robustness, network calibration, and feature privacy.
arXiv Detail & Related papers (2021-12-30T16:07:59Z) - Self-Learning Transformations for Improving Gaze and Head Redirection [49.61091281780071]
We propose a novel generative model for face images that is capable of producing high-quality images under fine-grained control over eye gaze and head orientation angles.
This requires disentangling many appearance-related factors, including not only gaze and head orientation but also lighting, hue, etc.
We show that explicitly disentangling task-irrelevant factors results in more accurate modelling of gaze and head orientation.
arXiv Detail & Related papers (2020-10-23T11:18:37Z) - A Coarse-to-Fine Adaptive Network for Appearance-Based Gaze Estimation [24.8796573846653]
We propose a coarse-to-fine strategy which estimates a basic gaze direction from face image and refines it with corresponding residual predicted from eye images.
We construct a coarse-to-fine adaptive network named CA-Net and achieve state-of-the-art performances on MPIIGaze and EyeDiap.
arXiv Detail & Related papers (2020-01-01T10:39:03Z)