Boosting Image-based Mutual Gaze Detection using Pseudo 3D Gaze
- URL: http://arxiv.org/abs/2010.07811v2
- Date: Tue, 22 Dec 2020 17:20:59 GMT
- Title: Boosting Image-based Mutual Gaze Detection using Pseudo 3D Gaze
- Authors: Bardia Doosti, Ching-Hui Chen, Raviteja Vemulapalli, Xuhui Jia, Yukun Zhu, Bradley Green
- Abstract summary: Mutual gaze detection plays an important role in understanding human interactions.
We propose a simple and effective approach to boost the performance by using an auxiliary 3D gaze estimation task during the training phase.
We achieve the performance boost without additional labeling cost by training the 3D gaze estimation branch using pseudo 3D gaze labels deduced from mutual gaze labels.
- Score: 19.10872208787867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mutual gaze detection, i.e., predicting whether or not two people are looking
at each other, plays an important role in understanding human interactions. In
this work, we focus on the task of image-based mutual gaze detection, and
propose a simple and effective approach to boost the performance by using an
auxiliary 3D gaze estimation task during the training phase. We achieve the
performance boost without additional labeling cost by training the 3D gaze
estimation branch using pseudo 3D gaze labels deduced from mutual gaze labels.
By sharing the head image encoder between the 3D gaze estimation and the mutual
gaze detection branches, we achieve better head features than those learned by
training the mutual gaze detection branch alone. Experimental results on three
image datasets show that the proposed approach improves the detection
performance significantly without additional annotations. This work also
introduces a new image dataset that consists of 33.1K pairs of humans annotated
with mutual gaze labels in 29.2K images.
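To make the training setup concrete, below is a minimal PyTorch sketch of the multi-task design the abstract describes: a shared head-image encoder feeding a mutual gaze detection branch and an auxiliary 3D gaze estimation branch, with pseudo 3D gaze labels derived from the mutual gaze annotations. A natural derivation, assumed here, is that for a positive pair each person's gaze direction is the unit vector from their own head position toward the other person's head. All module names, layer sizes, the loss weighting, and the exact pseudo-label rule are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (assumptions, not the authors' code) of the multi-task
# setup described in the abstract: one shared head-image encoder, two branches.
import torch
import torch.nn as nn
import torch.nn.functional as F


def pseudo_3d_gaze(head_a: torch.Tensor, head_b: torch.Tensor) -> torch.Tensor:
    """Hypothetical pseudo label: for a mutually gazing pair, assume person A's
    3D gaze direction is the unit vector from A's head position to B's head."""
    return F.normalize(head_b - head_a, dim=-1)


class MutualGazeNet(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Shared head-image encoder (placeholder CNN; the backbone is an assumption).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Auxiliary branch: per-head 3D gaze direction, used only during training.
        self.gaze_head = nn.Linear(feat_dim, 3)
        # Main branch: mutual gaze logit for the pair of head crops.
        self.mutual_head = nn.Linear(2 * feat_dim, 1)

    def forward(self, crop_a, crop_b):
        fa, fb = self.encoder(crop_a), self.encoder(crop_b)
        gaze_a = F.normalize(self.gaze_head(fa), dim=-1)
        gaze_b = F.normalize(self.gaze_head(fb), dim=-1)
        mutual_logit = self.mutual_head(torch.cat([fa, fb], dim=-1)).squeeze(-1)
        return mutual_logit, gaze_a, gaze_b


def training_loss(model, crop_a, crop_b, head_pos_a, head_pos_b, label,
                  aux_weight=0.1):
    """label: (B,) binary mutual gaze labels; head_pos_*: (B, 3) head positions."""
    mutual_logit, gaze_a, gaze_b = model(crop_a, crop_b)
    loss = F.binary_cross_entropy_with_logits(mutual_logit, label.float())
    # Pseudo 3D gaze supervision applies only to positive (mutually gazing) pairs.
    pos = label.bool()
    if pos.any():
        tgt_a = pseudo_3d_gaze(head_pos_a[pos], head_pos_b[pos])
        tgt_b = pseudo_3d_gaze(head_pos_b[pos], head_pos_a[pos])
        cos = F.cosine_similarity(gaze_a[pos], tgt_a, dim=-1) \
            + F.cosine_similarity(gaze_b[pos], tgt_b, dim=-1)
        loss = loss + aux_weight * (2.0 - cos).mean()
    return loss
```

Note that only the mutual gaze branch is needed at inference; the auxiliary branch exists solely to shape the shared encoder's features during training, which is why the approach adds no labeling or test-time cost.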
Related papers
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z)
- An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection [72.30883544352918]
We investigate whether pseudo-labels can provide effective supervision for the baseline models under varying settings.
We achieve 20.23 AP for the moderate level on the KITTI-3D testing set without bells and whistles, improving the baseline model by 6.03 AP.
We hope this work can provide insights for the image-based 3D detection community under a semi-supervised setting.
arXiv Detail & Related papers (2022-08-15T12:17:46Z)
- GazeOnce: Real-Time Multi-Person Gaze Estimation [18.16091280655655]
Appearance-based gaze estimation aims to predict the 3D eye gaze direction from a single image.
Recent deep learning-based approaches have demonstrated excellent performance, but cannot output multi-person gaze in real time.
We propose GazeOnce, which is capable of simultaneously predicting gaze directions for multiple faces in an image.
arXiv Detail & Related papers (2022-04-20T14:21:47Z)
- Learning Hierarchical Graph Representation for Image Manipulation Detection [50.04902159383709]
The objective of image manipulation detection is to identify and locate the manipulated regions in the images.
Recent approaches mostly adopt sophisticated Convolutional Neural Networks (CNNs) to capture the tampering artifacts left in the images.
We propose a hierarchical Graph Convolutional Network (HGCN-Net), which consists of two parallel branches.
arXiv Detail & Related papers (2022-01-15T01:54:25Z)
- Unsupervised View-Invariant Human Posture Representation [28.840986167408037]
We present a novel unsupervised approach that learns to extract view-invariant 3D human pose representation from a 2D image.
Our model is trained by exploiting the intrinsic view-invariant properties of human pose between simultaneous frames.
We show improvements over the state-of-the-art unsupervised cross-view action classification accuracy on RGB and depth images.
arXiv Detail & Related papers (2021-09-17T19:23:31Z)
- Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation [84.28064034301445]
Self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands, is a major cause of the final 3D pose error.
We propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image.
We experimentally show that the proposed approach achieves new state-of-the-art performance on the InterHand2.6M dataset.
arXiv Detail & Related papers (2021-07-01T13:28:02Z)
- Weakly-Supervised Physically Unconstrained Gaze Estimation [80.66438763587904]
We tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.
We propose a training algorithm along with several novel loss functions especially designed for the task.
We show significant improvements in (a) the accuracy of semi-supervised gaze estimation and (b) cross-domain generalization on the state-of-the-art physically unconstrained in-the-wild Gaze360 gaze estimation benchmark.
arXiv Detail & Related papers (2021-05-20T14:58:52Z)
- Controllable Continuous Gaze Redirection [47.15883248953411]
We present interpGaze, a novel framework for controllable gaze redirection.
Our goal is to redirect the eye gaze of one person into any gaze direction depicted in the reference image.
The proposed interpGaze outperforms state-of-the-art methods in terms of image quality and redirection precision.
arXiv Detail & Related papers (2020-10-09T11:50:06Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Efficiently Guiding Imitation Learning Agents with Human Gaze [28.7222865388462]
We use gaze cues from human demonstrators to enhance the performance of agents trained via three popular imitation learning methods.
Based on similarities between the attention of reinforcement learning agents and human gaze, we propose a novel approach for utilizing gaze data in a computationally efficient manner.
Our proposed approach improves the performance by 95% for BC, 343% for BCO, and 390% for T-REX, averaged over 20 different Atari games.
arXiv Detail & Related papers (2020-02-28T00:55:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.