MGTR: End-to-End Mutual Gaze Detection with Transformer
- URL: http://arxiv.org/abs/2209.10930v1
- Date: Thu, 22 Sep 2022 11:26:22 GMT
- Title: MGTR: End-to-End Mutual Gaze Detection with Transformer
- Authors: Hang Guo, Zhengxi Hu, Jingtai Liu
- Abstract summary: We propose a novel one-stage mutual gaze detection framework called Mutual Gaze TRansformer or MGTR.
By designing mutual gaze instance triples, MGTR can detect each human head bounding box and simultaneously infer mutual gaze relationship based on global image information.
Experimental results on two mutual gaze datasets show that our method accelerates the mutual gaze detection process without losing performance.
- Score: 1.0312968200748118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: People looking at each other, or mutual gaze, is ubiquitous in our daily
interactions, and detecting mutual gaze is of great significance for
understanding human social scenes. Current mutual gaze detection methods rely
on two-stage pipelines, whose inference speed is limited by the two-stage
design and whose second-stage performance depends on the quality of the first
stage. In this paper, we propose a novel one-stage mutual gaze detection
framework called Mutual Gaze TRansformer, or MGTR, that performs mutual gaze
detection in an end-to-end manner. By designing mutual gaze instance triples,
MGTR detects each human head bounding box and simultaneously infers mutual
gaze relationships based on global image information, which streamlines and
simplifies the whole process. Experimental results on two mutual gaze datasets
show that our method accelerates the mutual gaze detection process without
losing performance. An ablation study shows that different components of MGTR
capture different levels of semantic information in images. Code is available at
https://github.com/Gmbition/MGTR
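
To make the triple formulation concrete, below is a minimal PyTorch sketch of a DETR-style decoder head in the spirit of MGTR: each learned query is decoded into a mutual gaze instance triple consisting of two head bounding boxes and a mutual-gaze score. All layer sizes, the query count, and the module names here are illustrative assumptions, not the authors' implementation; the actual architecture is in the repository linked above.

```python
import torch
import torch.nn as nn

class MutualGazeTriplesHead(nn.Module):
    """Sketch of a one-stage mutual-gaze decoder head.

    Each learned query is decoded into a "mutual gaze instance triple":
    two head bounding boxes plus a binary mutual-gaze score, predicted
    jointly from global image features. Hypothetical dimensions and
    names; not MGTR's exact design.
    """

    def __init__(self, d_model=256, num_queries=100):
        super().__init__()
        self.queries = nn.Embedding(num_queries, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6,
        )
        # Two box regressors (cx, cy, w, h in [0, 1]) and one relation classifier.
        self.box_a = nn.Linear(d_model, 4)
        self.box_b = nn.Linear(d_model, 4)
        self.gaze_cls = nn.Linear(d_model, 2)  # mutual gaze: yes / no

    def forward(self, memory):
        # memory: (B, HW, d_model) flattened backbone/encoder features
        q = self.queries.weight.unsqueeze(0).expand(memory.size(0), -1, -1)
        h = self.decoder(q, memory)
        return {
            "boxes_a": self.box_a(h).sigmoid(),
            "boxes_b": self.box_b(h).sigmoid(),
            "gaze_logits": self.gaze_cls(h),
        }

# Usage: feed encoder features of shape (batch, tokens, d_model).
feats = torch.randn(2, 49, 256)
out = MutualGazeTriplesHead()(feats)
print(out["boxes_a"].shape)  # torch.Size([2, 100, 4])
```

As in DETR-style detectors, such a head would be trained with bipartite matching between predicted triples and ground-truth head pairs, so no separate head detector or pairing stage is needed at inference time.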
Related papers
- Merging Multiple Datasets for Improved Appearance-Based Gaze Estimation [10.682719521609743]
The Two-stage Transformer-based Gaze-feature Fusion (TTGF) method uses transformers to merge information from each eye and the face separately, and then merges across the two eyes.
The proposed Gaze Adaptation Module (GAM) handles annotation inconsistency by applying a per-dataset adaptation module to correct gaze estimates from a single shared estimator.
arXiv Detail & Related papers (2024-09-02T02:51:40Z) - Disentangled Interaction Representation for One-Stage Human-Object
Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z) - Joint Gaze-Location and Gaze-Object Detection [62.69261709635086]
Current approaches frame gaze location detection (GL-D) and gaze object detection (GO-D) as two separate tasks.
We propose GTR, short for Gaze following detection TRansformer, to streamline the gaze following detection pipeline.
GTR achieves a 12.1 mAP gain on GazeFollowing and an 18.2 mAP gain on VideoAttentionTarget for GL-D, as well as a 19 mAP improvement on GOO-Real for GO-D.
arXiv Detail & Related papers (2023-08-26T12:12:24Z) - Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z) - End-to-End Human-Gaze-Target Detection with Transformers [57.00864538284686]
We propose an effective and efficient method for Human-Gaze-Target (HGT) detection, i.e., gaze following.
Our method, named Human-Gaze-Target detection TRansformer or HGTTR, streamlines the HGT detection pipeline by eliminating all other components.
The effectiveness and robustness of our proposed method are verified with extensive experiments on the two standard benchmark datasets, GazeFollowing and VideoAttentionTarget.
arXiv Detail & Related papers (2022-03-20T02:37:06Z) - Glance and Gaze: Inferring Action-aware Points for One-Stage
Human-Object Interaction Detection [81.32280287658486]
We propose a novel one-stage method, namely the Glance and Gaze Network (GGNet).
GGNet adaptively models a set of action-aware points (ActPoints) via glance and gaze steps.
We design an action-aware approach that effectively matches each detected interaction with its associated human-object pair.
arXiv Detail & Related papers (2021-04-12T08:01:04Z) - Boosting Image-based Mutual Gaze Detection using Pseudo 3D Gaze [19.10872208787867]
Mutual gaze detection plays an important role in understanding human interactions.
We propose a simple and effective approach to boost the performance by using an auxiliary 3D gaze estimation task during the training phase.
We achieve the performance boost without additional labeling cost by training the 3D gaze estimation branch using pseudo 3D gaze labels deduced from mutual gaze labels (see the sketch after this list).
arXiv Detail & Related papers (2020-10-15T15:01:41Z) - GID-Net: Detecting Human-Object Interaction with Global and Instance
Dependency [67.95192190179975]
We introduce a two-stage trainable reasoning mechanism, referred to as GID block.
GID-Net is a human-object interaction detection framework consisting of a human branch, an object branch and an interaction branch.
We have compared our proposed GID-Net with existing state-of-the-art methods on two public benchmarks, including V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-11T11:58:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.