DMAGaze: Gaze Estimation Based on Feature Disentanglement and Multi-Scale Attention
- URL: http://arxiv.org/abs/2504.11160v1
- Date: Tue, 15 Apr 2025 13:08:43 GMT
- Title: DMAGaze: Gaze Estimation Based on Feature Disentanglement and Multi-Scale Attention
- Authors: Haohan Chen, Hongjia Liu, Shiyong Lan, Wenwu Wang, Yixin Qiao, Yao Li, Guonan Deng
- Abstract summary: DMAGaze is a novel gaze estimation framework that exploits information from facial images in three aspects: gaze-relevant global features, local eye features, and head pose estimation features. Our proposed DMAGaze has been extensively validated on two mainstream public datasets, achieving state-of-the-art performance.
- Score: 9.138316514656958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaze estimation, which predicts gaze direction, commonly faces the challenge of interference from complex gaze-irrelevant information in face images. In this work, we propose DMAGaze, a novel gaze estimation framework that exploits information from facial images in three aspects: gaze-relevant global features (disentangled from the facial image), local eye features (extracted from the cropped eye patch), and head pose estimation features, to improve overall performance. First, we design a new continuous mask-based Disentangler that accurately separates gaze-relevant from gaze-irrelevant information in facial images, achieving the dual-branch disentanglement goal by separately reconstructing the eye and non-eye regions. Furthermore, we introduce a new cascaded attention module named Multi-Scale Global Local Attention Module (MS-GLAM). Through a customized cascaded attention structure, it effectively attends to global and local information at multiple scales, further enhancing the features from the Disentangler. Finally, the global gaze-relevant features disentangled by the upper face branch, combined with head pose and local eye features, are passed through the detection head for high-precision gaze estimation. Our proposed DMAGaze has been extensively validated on two mainstream public datasets, achieving state-of-the-art performance.
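The abstract outlines a three-stream architecture: a soft-mask Disentangler, the MS-GLAM attention module, and a fusion head over global, eye, and head-pose features. The PyTorch sketch below shows one plausible way those pieces could connect; the names Disentangler, MS-GLAM, and the three input streams come from the abstract, while all layer sizes, the soft-mask formulation, and the attention internals are illustrative assumptions rather than the paper's actual implementation (the dual-branch reconstruction losses are omitted).

```python
import torch
import torch.nn as nn

class Disentangler(nn.Module):
    """Continuous mask-based disentangler (sketch). A sigmoid head predicts a
    soft mask in [0, 1] that splits encoder features into gaze-relevant and
    gaze-irrelevant streams; in the paper, two decoders would reconstruct the
    eye and non-eye regions from these streams (omitted here)."""
    def __init__(self, channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Continuous (soft) mask rather than a hard binary one.
        self.mask_head = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, face):
        feat = self.encoder(face)
        mask = self.mask_head(feat)
        gaze_relevant = feat * mask          # routed to the gaze branch
        gaze_irrelevant = feat * (1 - mask)  # routed to non-eye reconstruction
        return gaze_relevant, gaze_irrelevant

class MSGLAM(nn.Module):
    """Multi-Scale Global Local Attention Module (sketch): cascaded global
    (channel-style) and local (spatial) attention, applied at several pooled
    scales and fused by summation."""
    def __init__(self, channels=64, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.global_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.local_att = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        out = 0
        for s in self.scales:
            xs = nn.functional.avg_pool2d(x, s) if s > 1 else x
            xs = xs * self.global_att(xs)   # global attention, then
            xs = xs * self.local_att(xs)    # local attention, in cascade
            out = out + nn.functional.interpolate(xs, size=x.shape[-2:])
        return out

class DMAGaze(nn.Module):
    """End-to-end sketch: disentangled global features + eye-patch features
    + head pose, concatenated and regressed to a 2D gaze direction."""
    def __init__(self, channels=64):
        super().__init__()
        self.disentangler = Disentangler(channels)
        self.msglam = MSGLAM(channels)
        self.eye_encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(channels * 2 + 3, 2)  # +3 for head pose angles

    def forward(self, face, eyes, head_pose):
        gaze_feat, _ = self.disentangler(face)
        gaze_feat = self.pool(self.msglam(gaze_feat))
        eye_feat = self.eye_encoder(eyes)
        return self.head(torch.cat([gaze_feat, eye_feat, head_pose], dim=1))

# Smoke test with dummy inputs.
model = DMAGaze()
gaze = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 64, 64), torch.randn(2, 3))
print(gaze.shape)  # torch.Size([2, 2])
```

One design point worth noting: because the mask is a sigmoid output rather than a hard binary mask, the split into gaze-relevant and gaze-irrelevant features stays differentiable, so the reconstruction objectives can shape it end-to-end.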
Related papers
- Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck [36.255590251433844]
This work proposes a novel unsupervised/self-supervised gaze pre-training framework.
It forces the full-face branch to learn a low-dimensional gaze embedding without gaze annotations, through collaborative feature contrast and squeeze modules.
At the heart of this framework is an alternating eye-attended/unattended masking training scheme, which squeezes gaze-related information from the full-face branch into an eye-masked auto-encoder.
arXiv Detail & Related papers (2024-06-29T04:35:08Z)
- Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes [51.20150148066458]
We propose a novel method to learn to fuse the multi-view and monocular cues encoded as volumes, without needing heuristically crafted masks.
Experiments on real-world datasets demonstrate the effectiveness and generalization ability of the proposed method.
arXiv Detail & Related papers (2023-04-18T13:55:24Z)
- 3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views [67.00931529296788]
We propose to train general gaze estimation models which can be directly employed in novel environments without adaptation.
We create a large-scale dataset of diverse faces with gaze pseudo-annotations, which we extract based on the 3D geometry of the scene.
We test our method in the task of gaze generalization, in which we demonstrate improvement of up to 30% compared to state-of-the-art when no ground truth data are available.
arXiv Detail & Related papers (2022-12-06T14:15:17Z)
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z)
- Weakly-Supervised Physically Unconstrained Gaze Estimation [80.66438763587904]
We tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.
We propose a training algorithm along with several novel loss functions especially designed for the task.
We show significant improvements in (a) the accuracy of semi-supervised gaze estimation and (b) cross-domain generalization on the state-of-the-art physically unconstrained in-the-wild Gaze360 gaze estimation benchmark.
arXiv Detail & Related papers (2021-05-20T14:58:52Z)
- Goal-Oriented Gaze Estimation for Zero-Shot Learning [62.52340838817908]
We introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization.
We aim to predict the actual human gaze location to get the visual attention regions for recognizing a novel object guided by attribute description.
This work suggests the promising benefits of collecting human gaze datasets and developing automatic gaze estimation algorithms for high-level computer vision tasks.
arXiv Detail & Related papers (2021-03-05T02:14:57Z)
- LNSMM: Eye Gaze Estimation With Local Network Share Multiview Multitask [7.065909514483728]
We propose a novel methodology to estimate eye gaze points and eye gaze directions simultaneously.
Experiments show that our method achieves state-of-the-art performance against current mainstream methods on both gaze point and gaze direction metrics.
arXiv Detail & Related papers (2021-01-18T15:14:24Z)
- Integrating Human Gaze into Attention for Egocentric Activity Recognition [40.517438760096056]
We introduce an effective probabilistic approach to integrate human gaze into spatiotemporal attention for egocentric activity recognition.
We represent the locations of gaze fixation points as structured discrete latent variables to model their uncertainties.
The predicted gaze locations are used to provide informative attentional cues to improve the recognition performance.
arXiv Detail & Related papers (2020-11-08T08:02:30Z)
- 360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales [26.36068336169795]
We develop a model that mimics humans' ability to estimate gaze by aggregating information from multiple focused looks.
The model avoids the need to extract clear eye patches.
We extend the model to handle the challenging task of 360-degree gaze estimation.
arXiv Detail & Related papers (2020-09-15T08:45:12Z)
- It's Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation [82.16380486281108]
We propose an appearance-based method that only takes the full face image as input.
Our method encodes the face image using a convolutional neural network with spatial weights applied on the feature maps (see the sketch after this list).
We show that our full-face method significantly outperforms the state of the art for both 2D and 3D gaze estimation.
arXiv Detail & Related papers (2016-11-27T15:00:10Z)
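To make the spatial-weights idea from the last entry concrete, here is a minimal, hypothetical PyTorch sketch: a small convolutional head predicts a single-channel weight map that rescales the backbone's feature maps before pooling, so informative face regions contribute more to the gaze feature. The layer sizes and the ReLU-based weight head are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SpatialWeights(nn.Module):
    """Predicts a single-channel spatial weight map and applies it to the
    feature maps of a CNN backbone (illustrative sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.weight_head = nn.Sequential(
            nn.Conv2d(channels, channels // 2, 1), nn.ReLU(),
            nn.Conv2d(channels // 2, 1, 1), nn.ReLU(),  # non-negative weights
        )

    def forward(self, feat):
        w = self.weight_head(feat)  # (B, 1, H, W) weight map
        return feat * w             # broadcast across channels

# Usage: weight the backbone's last feature maps, then pool and regress.
feat = torch.randn(2, 128, 14, 14)                    # dummy backbone features
weighted = SpatialWeights(128)(feat)
gaze = nn.Linear(128, 2)(weighted.mean(dim=(2, 3)))   # (pitch, yaw)
print(gaze.shape)  # torch.Size([2, 2])
```

Because the weight map is learned end-to-end from the gaze loss, the network can discover which face regions beyond the eyes carry gaze information, which is the motivation for taking the full face as input.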