MTGLS: Multi-Task Gaze Estimation with Limited Supervision
- URL: http://arxiv.org/abs/2110.12100v1
- Date: Sat, 23 Oct 2021 00:20:23 GMT
- Title: MTGLS: Multi-Task Gaze Estimation with Limited Supervision
- Authors: Shreya Ghosh, Munawar Hayat, Abhinav Dhall, Jarrod Knibbe
- Abstract summary: We propose MTGLS, a Multi-Task Gaze estimation framework with Limited Supervision.
Our proposed framework outperforms the unsupervised state of the art on the CAVE dataset (by 6.43%) and even supervised state-of-the-art methods on the Gaze360 dataset (by 6.59%).
- Score: 27.57636769596276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust gaze estimation is a challenging task, even for deep CNNs, due to the unavailability of large-scale labeled data. Moreover, gaze annotation is a time-consuming process and requires specialized hardware setups. We propose MTGLS: a Multi-Task Gaze estimation framework with Limited Supervision, which leverages abundantly available non-annotated facial image data. MTGLS distills knowledge from off-the-shelf facial image analysis models and learns strong feature representations of human eyes, guided by three complementary auxiliary signals: (a) the line of sight of the pupil (i.e., pseudo-gaze) defined by the localized facial landmarks, (b) the head pose given by Euler angles, and (c) the orientation of the eye patch (left/right eye). To overcome inherent noise in the supervisory signals, MTGLS further incorporates a noise distribution modelling approach. Our experimental results show that MTGLS learns highly generalized representations which consistently perform well on a range of datasets. Our proposed framework outperforms the unsupervised state of the art on the CAVE dataset (by 6.43%) and even supervised state-of-the-art methods on the Gaze360 dataset (by 6.59%).
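The abstract outlines a concrete training recipe: a shared eye-patch encoder supervised by three noisy auxiliary signals, with a noise model to absorb label error. Below is a minimal sketch of how such a setup could look in PyTorch; the names (MTGLSSketch, multitask_loss) and the backbone are illustrative, not from the authors' code, and the heteroscedastic Gaussian loss is only one plausible reading of the paper's "noise distribution modelling".

```python
# A minimal sketch, assuming a PyTorch implementation; all names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTGLSSketch(nn.Module):
    """Shared eye-patch encoder with three auxiliary heads, as the abstract describes."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Stand-in backbone over eye patches; the paper's actual encoder may differ.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # (a) pseudo-gaze (yaw, pitch) derived from facial landmarks, plus a
        # learned log-variance: one plausible reading of "noise distribution modelling".
        self.gaze_head = nn.Linear(feat_dim, 2)
        self.gaze_logvar = nn.Linear(feat_dim, 2)
        # (b) head pose as Euler angles (yaw, pitch, roll).
        self.pose_head = nn.Linear(feat_dim, 3)
        # (c) eye-patch orientation: left vs. right eye (2-way classification).
        self.side_head = nn.Linear(feat_dim, 2)

    def forward(self, eye_patch: torch.Tensor):
        z = self.encoder(eye_patch)
        return self.gaze_head(z), self.gaze_logvar(z), self.pose_head(z), self.side_head(z)

def multitask_loss(model, eye_patch, pseudo_gaze, head_pose, side_label):
    """Sum of the three auxiliary losses; the weighting is a free design choice."""
    gaze, logvar, pose, side = model(eye_patch)
    # Heteroscedastic Gaussian NLL: samples with noisy pseudo-gaze labels can be
    # downweighted by predicting a larger variance for them.
    gaze_nll = (0.5 * torch.exp(-logvar) * (gaze - pseudo_gaze) ** 2 + 0.5 * logvar).mean()
    pose_l1 = F.l1_loss(pose, head_pose)
    side_ce = F.cross_entropy(side, side_label)
    return gaze_nll + pose_l1 + side_ce
```

Under these assumptions, training on non-annotated faces amounts to producing the three signals with off-the-shelf landmark and head-pose models and minimizing the combined loss; the learned encoder can later be evaluated or fine-tuned on the small labeled gaze datasets.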
Related papers
- Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation [26.630702699374194]
We propose a unified framework that leverages mask as supervision for unsupervised 3D pose estimation.
We organize the human skeleton in a fully unsupervised way which enables the processing of annotation-free data.
Experiments demonstrate our state-of-the-art pose estimation performance on Human3.6M and MPI-INF-3DHP datasets.
arXiv Detail & Related papers (2023-12-12T08:08:34Z)
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
- Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges of the vision modality from a multimodal point of view.
We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue.
To learn a more compact latent space for the vision anomaly detector, CMLE, a component of CMG, learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z)
- NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation [37.977032771941715]
We propose a novel Head-Eye redirection parametric model based on Neural Radiance Field.
Our model can decouple the face and eyes for separate neural rendering.
This enables separately controlling face attributes, identity, illumination, and eye gaze direction.
arXiv Detail & Related papers (2022-12-30T13:52:28Z)
- 3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views [67.00931529296788]
We propose to train general gaze estimation models which can be directly employed in novel environments without adaptation.
We create a large-scale dataset of diverse faces with gaze pseudo-annotations, which we extract based on the 3D geometry of the scene.
We test our method on the task of gaze generalization, where we demonstrate an improvement of up to 30% over the state of the art when no ground-truth data are available.
arXiv Detail & Related papers (2022-12-06T14:15:17Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- LatentGaze: Cross-Domain Gaze Estimation through Gaze-Aware Analytic Latent Code Manipulation [0.0]
We propose a gaze-aware analytic manipulation method, based on a data-driven approach that exploits the disentanglement characteristics of generative adversarial network (GAN) inversion.
By utilizing a GAN-based encoder-generator process, we shift the input image from the target domain to the source domain, which the gaze estimator is sufficiently aware of.
arXiv Detail & Related papers (2022-09-21T08:05:53Z)
- A distribution-dependent Mumford-Shah model for unsupervised hyperspectral image segmentation [3.2116198597240846]
We present a novel unsupervised hyperspectral segmentation framework.
It starts with a denoising and dimensionality reduction step using the well-established Minimum Noise Fraction (MNF) transform.
We equip the Mumford-Shah (MS) functional with a novel robust distribution-dependent indicator function designed to handle the challenges of hyperspectral data.
arXiv Detail & Related papers (2022-03-28T19:57:14Z)
- A Synthesis-Based Approach for Thermal-to-Visible Face Verification [105.63410428506536]
This paper presents an algorithm that achieves state-of-the-art performance on the ARL-VTF and TUFTS multi-spectral face datasets.
We also present MILAB-VTF(B), a challenging multi-spectral face dataset composed of paired thermal and visible videos.
arXiv Detail & Related papers (2021-08-21T17:59:56Z)
- Weakly-Supervised Physically Unconstrained Gaze Estimation [80.66438763587904]
We tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.
We propose a training algorithm along with several novel loss functions especially designed for the task.
We show significant improvements in (a) the accuracy of semi-supervised gaze estimation and (b) cross-domain generalization on the state-of-the-art physically unconstrained in-the-wild Gaze360 gaze estimation benchmark.
arXiv Detail & Related papers (2021-05-20T14:58:52Z)
- 360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales [26.36068336169795]
We develop a model that mimics humans' ability to estimate gaze by aggregating information from focused looks.
The model avoids the need to extract clear eye patches.
We extend the model to handle the challenging task of 360-degree gaze estimation.
arXiv Detail & Related papers (2020-09-15T08:45:12Z)