Hybrid-Domain Adaptative Representation Learning for Gaze Estimation
- URL: http://arxiv.org/abs/2511.13222v1
- Date: Mon, 17 Nov 2025 10:38:50 GMT
- Title: Hybrid-Domain Adaptative Representation Learning for Gaze Estimation
- Authors: Qida Tan, Hongyu Yang, Wenchao Du
- Abstract summary: We present a novel Hybrid-domain Adaptative Representation Learning framework to learn robust gaze representation. We propose to disentangle gaze-relevant representation from low-quality facial images by aligning features extracted from high-quality near-eye images. Experiments on EyeDiap, MPIIFaceGaze, and Gaze360 datasets demonstrate that our approach achieves state-of-the-art accuracy.
- Score: 20.422491630669885
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Appearance-based gaze estimation, which aims to predict an accurate 3D gaze direction from a single facial image, has made promising progress in recent years. However, most methods suffer significant performance degradation in cross-domain evaluation due to interference from gaze-irrelevant factors such as expressions, wearables, and image quality. To alleviate this problem, we present a novel Hybrid-domain Adaptative Representation Learning (HARL) framework that exploits multi-source hybrid datasets to learn a robust gaze representation. More specifically, we disentangle the gaze-relevant representation from low-quality facial images by aligning it with features extracted from high-quality near-eye images in an unsupervised domain-adaptation manner, which adds almost no computational or inference cost. Additionally, we analyze the effect of head pose and design a simple yet efficient sparse graph fusion module that exploits the geometric constraint between gaze direction and head pose, leading to a dense and robust gaze representation. Extensive experiments on the EyeDiap, MPIIFaceGaze, and Gaze360 datasets demonstrate that our approach achieves state-of-the-art angular errors of $\textbf{5.02}^{\circ}$, $\textbf{3.36}^{\circ}$, and $\textbf{9.26}^{\circ}$, respectively, and performs competitively in cross-dataset evaluation. The code is available at https://github.com/da60266/HARL.
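The abstract's two components reduce to familiar building blocks, sketched below in PyTorch as a rough illustration only: a cosine-distance alignment between face-image features and near-eye features, and a tiny two-node graph fusion of gaze and head-pose embeddings standing in for the paper's sparse graph fusion module. All names and shapes here are assumptions, not HARL's released code; see the repository for the real implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseGraphFusion(nn.Module):
    """Hypothetical stand-in for HARL's sparse graph fusion module:
    treats the gaze and head-pose embeddings as two graph nodes and
    mixes them with a learned, softmax-normalized adjacency."""
    def __init__(self, dim: int):
        super().__init__()
        self.adj = nn.Parameter(torch.zeros(2, 2))  # learned edge weights
        self.proj = nn.Linear(dim, dim)

    def forward(self, gaze_feat, head_feat):
        nodes = torch.stack([gaze_feat, head_feat], dim=1)  # (B, 2, D)
        mixed = torch.softmax(self.adj, dim=-1) @ nodes     # one message-passing step
        return self.proj(mixed[:, 0])                       # fused gaze node

def alignment_loss(face_feat, eye_feat):
    """Assumed alignment objective: pull face-branch features toward
    features from high-quality near-eye images (the paper may use a
    different distance or an adversarial criterion)."""
    return 1.0 - F.cosine_similarity(face_feat, eye_feat.detach(), dim=-1).mean()
```

Because the near-eye branch would only supervise the alignment during training, inference runs the face branch alone, which is consistent with the abstract's claim of near-zero extra cost.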
Related papers
- OmniGaze: Reward-inspired Generalizable Gaze Estimation In The Wild [104.57404324262556]
Current 3D gaze estimation methods struggle to generalize across diverse data domains.
We present OmniGaze, a semi-supervised framework for 3D gaze estimation.
We show that OmniGaze achieves state-of-the-art performance on five datasets.
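The summary does not say how the rewards are used; purely as a hypothetical, a "reward-inspired" semi-supervised loop might weight pseudo-labels by a scalar reward, as in this FixMatch-style stand-in (every name below is an assumption, not OmniGaze's API):

```python
import torch

def reward_weighted_pseudo_label_loss(model, images, weak_aug, strong_aug, reward_fn):
    """Illustrative semi-supervised step: pseudo-label weakly augmented
    images, weight each pseudo-label by a reward score in [0, 1], and
    train the model to match those labels on strongly augmented views."""
    with torch.no_grad():
        pseudo = model(weak_aug(images))       # (B, 3) pseudo gaze vectors
        w = reward_fn(images, pseudo)          # (B,) reward scores
    pred = model(strong_aug(images))
    per_sample = 1.0 - torch.cosine_similarity(pred, pseudo, dim=-1)
    return (w * per_sample).mean()
```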
arXiv Detail & Related papers (2025-10-15T15:19:52Z)
- GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning [50.7702397913573]
The rapid advancement of photorealistic generators has reached a critical juncture where authentic and manipulated images are increasingly indistinguishable.
Although a number of face forgery datasets are publicly available, their forged faces are mostly generated with GAN-based synthesis technology.
We propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection.
arXiv Detail & Related papers (2024-02-03T03:13:50Z)
- BOURNE: Bootstrapped Self-supervised Learning Framework for Unified Graph Anomaly Detection [50.26074811655596]
We propose BOURNE, a novel unified graph anomaly detection framework based on bootstrapped self-supervised learning.
By swapping the context embeddings between nodes and edges, we enable the mutual detection of node and edge anomalies.
BOURNE can eliminate the need for negative sampling, thereby enhancing its efficiency in handling large graphs.
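As a loose illustration of the negative-free, context-swapping idea (not BOURNE's actual code), an anomaly score can be read off as the disagreement between an object's embedding and its swapped context embedding:

```python
import torch.nn.functional as F

def anomaly_score(node_emb, swapped_context_emb):
    """Illustrative negative-free score in the spirit of bootstrapped
    self-supervised detectors: a node (or edge) whose embedding disagrees
    with its swapped context embedding is ranked as anomalous. With no
    negative pairs, cost grows only with the number of nodes and edges."""
    return 1.0 - F.cosine_similarity(node_emb, swapped_context_emb, dim=-1)
```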
arXiv Detail & Related papers (2023-07-28T00:44:57Z)
- Domain-Adaptive Full-Face Gaze Estimation via Novel-View-Synthesis and Feature Disentanglement [12.857137513211866]
We propose an effective model training pipeline consisting of a training data synthesis and a gaze estimation model for unsupervised domain adaptation.
The proposed data synthesis leverages single-image 3D reconstruction to expand the range of head poses in the source domain without requiring a 3D facial shape dataset.
We propose a disentangling autoencoder network to separate gaze-related features and introduce background augmentation consistency loss to utilize the characteristics of the synthetic source domain.
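A minimal sketch of what a background augmentation consistency loss could look like, assuming a `swap_background` augmentation callable and a cosine-based comparison of gaze predictions (both assumptions, not the paper's exact formulation):

```python
import torch

def background_consistency_loss(model, images, swap_background):
    """Illustrative consistency loss: gaze predictions should be invariant
    when the synthetic background is replaced while the face region is
    left untouched."""
    pred = model(images)
    pred_aug = model(swap_background(images))
    return (1.0 - torch.cosine_similarity(pred, pred_aug, dim=-1)).mean()
```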
arXiv Detail & Related papers (2023-05-25T15:15:03Z)
- Explicit Correspondence Matching for Generalizable Neural Radiance Fields [66.99907718824782]
We present a new NeRF method that is able to generalize to new unseen scenarios and perform novel view synthesis with as few as two source views.
The explicit correspondence matching is quantified with the cosine similarity between image features sampled at the 2D projections of a 3D point on different views.
Our method achieves state-of-the-art results on different evaluation settings, with the experiments showing a strong correlation between our learned cosine feature similarity and volume density.
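The matching step described here is concrete enough to sketch: sample per-view feature maps at the 2D projections of each 3D point and compare them with cosine similarity. The projection itself and all tensor shapes below are assumptions; only the cosine-similarity quantification comes from the summary.

```python
import torch.nn.functional as F

def sample_features(feat_map, uv):
    """Bilinearly sample a feature map (1, C, H, W) at N normalized
    image coordinates uv in [-1, 1], shape (N, 2); returns (N, C)."""
    grid = uv.view(1, -1, 1, 2)
    out = F.grid_sample(feat_map, grid, align_corners=True)  # (1, C, N, 1)
    return out[0, :, :, 0].t()

def correspondence_similarity(feat_a, feat_b, uv_a, uv_b):
    """Cosine similarity between features of the same 3D points projected
    into two source views (the projection is assumed done elsewhere); per
    the summary, high similarity should correlate with high volume density."""
    return F.cosine_similarity(sample_features(feat_a, uv_a),
                               sample_features(feat_b, uv_b), dim=-1)
```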
arXiv Detail & Related papers (2023-04-24T17:46:01Z)
- 3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views [67.00931529296788]
We propose to train general gaze estimation models which can be directly employed in novel environments without adaptation.
We create a large-scale dataset of diverse faces with gaze pseudo-annotations, which we extract based on the 3D geometry of the scene.
We test our method in the task of gaze generalization, in which we demonstrate improvement of up to 30% compared to state-of-the-art when no ground truth data are available.
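The geometric core of such pseudo-annotation can be illustrated in a few lines, assuming a reconstructed 3D eye center and an estimated look-at point are already available (the paper's full extraction pipeline is more involved than this):

```python
import numpy as np

def pseudo_gaze_label(eye_center_3d, lookat_point_3d):
    """Illustrative geometry-only pseudo-annotation: the unit vector from
    the reconstructed 3D eye center toward an estimated 3D look-at point."""
    v = np.asarray(lookat_point_3d, float) - np.asarray(eye_center_3d, float)
    return v / np.linalg.norm(v)
```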
arXiv Detail & Related papers (2022-12-06T14:15:17Z)
- Jitter Does Matter: Adapting Gaze Estimation to New Domains [12.482427155726413]
We propose to utilize gaze jitter to analyze and optimize the gaze domain adaptation task.
We find that the high-frequency component (HFC) is an important factor that leads to jitter.
We employ contrastive learning to encourage the model to obtain similar representations between original and perturbed data.
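A hedged sketch of the perturbation ingredient named above: noise is injected only into the high-frequency component, here done naively in the Fourier domain with made-up defaults (the paper's exact perturbation may differ).

```python
import torch

def perturb_hfc(images, radius=0.25, strength=0.1):
    """Add noise only to the high-frequency component (HFC) of a batch
    of images (B, C, H, W), leaving frequencies inside a centered
    low-frequency disk untouched."""
    _, _, H, W = images.shape
    spec = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, H, device=images.device),
        torch.linspace(-1, 1, W, device=images.device), indexing="ij")
    high = ((xx ** 2 + yy ** 2).sqrt() > radius).to(spec.dtype)  # HFC mask
    noise_spec = torch.fft.fftshift(
        torch.fft.fft2(strength * torch.randn_like(images)), dim=(-2, -1))
    spec = spec + noise_spec * high
    return torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real
```

A contrastive objective would then encourage similar representations for `images` and `perturb_hfc(images)`.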
arXiv Detail & Related papers (2022-10-05T08:20:41Z)
- LatentGaze: Cross-Domain Gaze Estimation through Gaze-Aware Analytic Latent Code Manipulation [0.0]
We propose a gaze-aware analytic manipulation method based on a data-driven approach that exploits the disentanglement characteristics of generative adversarial network inversion.
By utilizing a GAN-based encoder-generator process, we shift the input image from the target domain to a source-domain-like image with which the gaze estimator is sufficiently familiar.
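The inference pipeline implied by this description is simple to sketch; every callable below is an assumed interface, not the authors' API:

```python
def latentgaze_inference(encoder, generator, gaze_estimator, target_image):
    """Hypothetical pipeline in the spirit of LatentGaze: invert the
    target-domain image into the GAN latent space, regenerate it as a
    source-domain-like image, then estimate gaze on that image."""
    latent = encoder(target_image)       # GAN inversion
    source_like = generator(latent)      # re-render toward the source domain
    return gaze_estimator(source_like)
```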
arXiv Detail & Related papers (2022-09-21T08:05:53Z)
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z)
- GazeOnce: Real-Time Multi-Person Gaze Estimation [18.16091280655655]
Appearance-based gaze estimation aims to predict the 3D eye gaze direction from a single image.
Recent deep learning-based approaches have demonstrated excellent performance, but cannot output multi-person gaze in real time.
We propose GazeOnce, which is capable of simultaneously predicting gaze directions for multiple faces in an image.
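How a single pass can serve multiple faces is not detailed in the summary; one plausible, purely hypothetical design is a dense detection-style head that emits a gaze direction alongside each face box:

```python
import torch.nn as nn

class MultiGazeHead(nn.Module):
    """Illustrative single-shot head (not GazeOnce's actual design): one
    pass over a shared backbone feature map emits, per spatial anchor, a
    face confidence, a bounding box, and a gaze direction, so multi-person
    gaze costs a single network evaluation regardless of face count."""
    def __init__(self, in_ch, num_anchors=1):
        super().__init__()
        self.conf = nn.Conv2d(in_ch, num_anchors, 1)      # face present?
        self.box = nn.Conv2d(in_ch, num_anchors * 4, 1)   # x, y, w, h
        self.gaze = nn.Conv2d(in_ch, num_anchors * 2, 1)  # pitch, yaw

    def forward(self, feat):
        return self.conf(feat), self.box(feat), self.gaze(feat)
```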
arXiv Detail & Related papers (2022-04-20T14:21:47Z)
- Boosting Image-based Mutual Gaze Detection using Pseudo 3D Gaze [19.10872208787867]
Mutual gaze detection plays an important role in understanding human interactions.
We propose a simple and effective approach to boost the performance by using an auxiliary 3D gaze estimation task during the training phase.
We achieve the performance boost without additional labeling cost by training the 3D gaze estimation branch using pseudo 3D gaze labels deduced from mutual gaze labels.
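The deduction step reads directly as geometry: under a positive mutual-gaze label, each person's pseudo 3D gaze points at the other's head. A minimal sketch, assuming head positions come from a detector:

```python
import numpy as np

def pseudo_3d_gaze(head_pos_a, head_pos_b):
    """If persons A and B carry a positive mutual-gaze label, a pseudo 3D
    gaze label for A is the unit vector from A's head position toward B's
    (and vice versa). Only the geometric deduction step is shown."""
    v = np.asarray(head_pos_b, float) - np.asarray(head_pos_a, float)
    return v / np.linalg.norm(v)
```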
arXiv Detail & Related papers (2020-10-15T15:01:41Z)
- 360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales [26.36068336169795]
We develop a model that mimics humans' ability to estimate gaze by aggregating information from focused looks.
The model avoids the need to extract clear eye patches.
We extend the model to handle the challenging task of 360-degree gaze estimation.
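A rough sketch of the aggregation idea, assuming the model outputs 3D unit gaze vectors and using a plain average where the paper presumably learns the fusion:

```python
import torch

def multi_zoom_gaze(model, crops):
    """Aggregate gaze over crops of the same face at several zoom scales,
    mimicking repeated 'focused looks'. The fusion here is a simple
    average of re-normalized unit vectors, a stand-in only."""
    preds = torch.stack([model(c) for c in crops])   # (S, B, 3)
    preds = preds / preds.norm(dim=-1, keepdim=True)
    mean = preds.mean(dim=0)
    return mean / mean.norm(dim=-1, keepdim=True)
```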
arXiv Detail & Related papers (2020-09-15T08:45:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.