Weakly-Supervised Physically Unconstrained Gaze Estimation
- URL: http://arxiv.org/abs/2105.09803v1
- Date: Thu, 20 May 2021 14:58:52 GMT
- Title: Weakly-Supervised Physically Unconstrained Gaze Estimation
- Authors: Rakshit Kothari, Shalini De Mello, Umar Iqbal, Wonmin Byeon, Seonwook Park, Jan Kautz
- Abstract summary: We tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.
We propose a training algorithm along with several novel loss functions especially designed for the task.
We show significant improvements in (a) the accuracy of semi-supervised gaze estimation and (b) cross-domain generalization on the state-of-the-art physically unconstrained in-the-wild Gaze360 gaze estimation benchmark.
- Score: 80.66438763587904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major challenge for physically unconstrained gaze estimation is acquiring
training data with 3D gaze annotations for in-the-wild and outdoor scenarios.
In contrast, videos of human interactions in unconstrained environments are
abundantly available and can be much more easily annotated with frame-level
activity labels. In this work, we tackle the previously unexplored problem of
weakly-supervised gaze estimation from videos of human interactions. We
leverage the insight that strong gaze-related geometric constraints exist when
people perform the activity of "looking at each other" (LAEO). To acquire
viable 3D gaze supervision from LAEO labels, we propose a training algorithm
along with several novel loss functions especially designed for the task. With
weak supervision from two large-scale activity datasets, CMU-Panoptic and
AVA-LAEO, we show significant improvements in (a) the accuracy of
semi-supervised gaze estimation and (b) cross-domain generalization on the
state-of-the-art physically unconstrained in-the-wild Gaze360 gaze estimation
benchmark. We open source our code at
https://github.com/NVlabs/weakly-supervised-gaze.
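The LAEO insight above can be illustrated with a small geometric check. The sketch below is a simplified illustration, not the paper's method: it assumes that the 3D head positions of both people are known in a shared camera frame. In that case, if A and B are looking at each other, A's gaze should point from A's head toward B's head, and B's gaze should point the opposite way. The helper names `angular_error_deg` and `laeo_loss` are hypothetical. Angular error in degrees is the usual way to report 3D gaze accuracy on benchmarks such as Gaze360. The paper's own loss functions are designed for monocular video and are more involved; see the released code for those.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if n == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def angular_error_deg(u: Vec3, v: Vec3) -> float:
    """Angle in degrees between two 3D directions (hypothetical helper)."""
    u, v = _normalize(u), _normalize(v)
    cos = u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
    # Clamp to guard against floating-point drift outside acos's domain.
    cos = max(-1.0, min(1.0, cos))
    return math.degrees(math.acos(cos))


def laeo_loss(head_a: Vec3, head_b: Vec3, gaze_a: Vec3, gaze_b: Vec3) -> float:
    """Mean angular deviation (degrees) from an idealized LAEO constraint.

    Hypothetical illustration, assuming known 3D head positions: when A and B
    look at each other, gaze_a should align with (head_b - head_a) and gaze_b
    with (head_a - head_b).
    """
    err_a = angular_error_deg(gaze_a, _sub(head_b, head_a))
    err_b = angular_error_deg(gaze_b, _sub(head_a, head_b))
    return 0.5 * (err_a + err_b)


if __name__ == "__main__":
    # Two people facing each other along the x-axis, 2 units apart.
    print(laeo_loss((0, 0, 0), (2, 0, 0), (1, 0, 0), (-1, 0, 0)))
    # A looks 90 degrees away, so the mean error is 45 degrees.
    print(laeo_loss((0, 0, 0), (2, 0, 0), (0, 1, 0), (-1, 0, 0)))
```

Because the penalty depends only on gaze directions and head positions, a constraint of this kind can supervise gaze on frames that carry only an activity label ("looking at each other") and no 3D gaze annotation. That is the kind of weak supervision the abstract describes.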
Related papers
- Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck [36.255590251433844]
This work proposes a novel unsupervised/self-supervised gaze pre-training framework.
It forces the full-face branch to learn a low dimensional gaze embedding without gaze annotations, through collaborative feature contrast and squeeze modules.
At the heart of this framework is an alternating eye-attended/unattended masking training scheme, which squeezes gaze-related information from the full-face branch into an eye-masked auto-encoder.
(2024-06-29T04:35:08Z)
- 3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views [67.00931529296788]
We propose to train general gaze estimation models which can be directly employed in novel environments without adaptation.
We create a large-scale dataset of diverse faces with gaze pseudo-annotations, which we extract based on the 3D geometry of the scene.
We test our method in the task of gaze generalization, in which we demonstrate improvement of up to 30% compared to state-of-the-art when no ground truth data are available.
(2022-12-06T14:15:17Z)
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
(2022-08-24T14:59:28Z)
- On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi- and/or weakly supervised learning.
We propose to impose multi-view geometrical constraints by means of a differentiable triangulation and to use it as a form of self-supervision during training when no labels are available.
(2022-03-29T19:11:54Z)
- MTGLS: Multi-Task Gaze Estimation with Limited Supervision [27.57636769596276]
We propose MTGLS: a Multi-Task Gaze estimation framework with Limited Supervision.
Our proposed framework outperforms the unsupervised state-of-the-art on CAVE (by 6.43%) and even supervised state-of-the-art methods on Gaze360 (by 6.59%).
(2021-10-23T00:20:23Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work, including state-of-the-art methods designed specifically for either trajectory or pose forecasting.
(2021-04-08T20:01:00Z)
- Integrating Human Gaze into Attention for Egocentric Activity Recognition [40.517438760096056]
We introduce an effective probabilistic approach to integrate human gaze into temporal attention for egocentric activity recognition.
We represent the locations of gaze fixation points as structured discrete latent variables to model their uncertainties.
The predicted gaze locations are used to provide informative attentional cues to improve the recognition performance.
(2020-11-08T08:02:30Z)
- 360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales [26.36068336169795]
We develop a model that mimics humans' ability to estimate gaze by aggregating information from focused looks.
The model avoids the need to extract clear eye patches.
We extend the model to handle the challenging task of 360-degree gaze estimation.
(2020-09-15T08:45:12Z)
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework that requires no paired or unpaired weak supervision.
Our proposed model employs three consecutive differentiable transformations: forward kinematics, camera projection, and spatial-map transformation.
(2020-06-24T23:56:33Z)