End-to-End Human-Gaze-Target Detection with Transformers
- URL: http://arxiv.org/abs/2203.10433v2
- Date: Thu, 24 Mar 2022 02:10:11 GMT
- Title: End-to-End Human-Gaze-Target Detection with Transformers
- Authors: Danyang Tu and Xiongkuo Min and Huiyu Duan and Guodong Guo and
Guangtao Zhai and Wei Shen
- Abstract summary: We propose an effective and efficient method for Human-Gaze-Target (HGT) detection, i.e., gaze following.
Our method, named Human-Gaze-Target detection TRansformer or HGTTR, streamlines the HGT detection pipeline by eliminating all additional components.
The effectiveness and robustness of our proposed method are verified with extensive experiments on the two standard benchmark datasets, GazeFollowing and VideoAttentionTarget.
- Score: 57.00864538284686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose an effective and efficient method for
Human-Gaze-Target (HGT) detection, i.e., gaze following. Current approaches
decouple the HGT detection task into separate branches of salient object
detection and human gaze prediction, employing a two-stage framework where
human head locations must first be detected and then be fed into the next gaze
target prediction sub-network. In contrast, we redefine the HGT detection task
as detecting human head locations and their gaze targets simultaneously. In
this way, our method, named Human-Gaze-Target detection TRansformer or HGTTR,
streamlines the HGT detection pipeline by eliminating all additional
components. HGTTR reasons about the relations of salient objects and human gaze
from the global image context. Moreover, unlike existing two-stage methods that
require human head locations as input and can predict only one human's gaze
target at a time, HGTTR can directly predict the locations of all people and
their gaze targets at one time in an end-to-end manner. The effectiveness and
robustness of our proposed method are verified with extensive experiments on
the two standard benchmark datasets, GazeFollowing and VideoAttentionTarget.
Without bells and whistles, HGTTR outperforms existing state-of-the-art methods
by large margins (6.4 mAP gain on GazeFollowing and 10.3 mAP gain on
VideoAttentionTarget) with a much simpler architecture.
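The abstract's formulation, in which the model jointly predicts all head locations and their gaze targets in one end-to-end pass, implies a DETR-style set prediction: each query outputs a (head box, gaze target) instance, and training requires a bipartite matching between predictions and ground truth. The paper's reference implementation is not reproduced here; the sketch below is an illustrative reconstruction of such a matching step only, under assumed L1 box and L2 gaze-point costs, and the function name and weights are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_instances(pred_heads, pred_gazes, gt_heads, gt_gazes,
                    w_head=1.0, w_gaze=1.0):
    """Hungarian matching of predicted (head box, gaze point) instances
    to ground-truth instances, in the style of set-prediction detectors.

    pred_heads, gt_heads: (N, 4) / (M, 4) boxes (cx, cy, w, h), normalized.
    pred_gazes, gt_gazes: (N, 2) / (M, 2) gaze points (x, y), normalized.
    Returns a list of (pred_idx, gt_idx) pairs minimizing total cost.
    """
    # Pairwise L1 cost between predicted and ground-truth head boxes.
    head_cost = np.abs(pred_heads[:, None, :] - gt_heads[None, :, :]).sum(-1)
    # Pairwise L2 cost between predicted and ground-truth gaze points.
    gaze_cost = np.linalg.norm(
        pred_gazes[:, None, :] - gt_gazes[None, :, :], axis=-1)
    cost = w_head * head_cost + w_gaze * gaze_cost
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))


# Two queries, two ground-truth people; query 0 should match person 1.
pred_heads = np.array([[0.1, 0.1, 0.1, 0.1],
                       [0.8, 0.8, 0.1, 0.1]])
pred_gazes = np.array([[0.5, 0.5], [0.2, 0.2]])
gt_heads = np.array([[0.8, 0.8, 0.1, 0.1],
                     [0.1, 0.1, 0.1, 0.1]])
gt_gazes = np.array([[0.2, 0.2], [0.5, 0.5]])
print(match_instances(pred_heads, pred_gazes, gt_heads, gt_gazes))
# [(0, 1), (1, 0)]
```

In the actual training loop, the matched pairs would then be supervised with box-regression and gaze losses, while unmatched queries are pushed toward a "no person" class, which is what lets the model emit all people and their gaze targets in a single forward pass.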
Related papers
- GazeHTA: End-to-end Gaze Target Detection with Head-Target Association [12.38704128536528]
We propose an end-to-end approach for gaze target detection: predicting a head-target connection between individuals and the target image regions they are looking at.
Most existing methods rely on independent components such as off-the-shelf head detectors, or struggle to establish associations between heads and gaze targets.
We investigate an end-to-end multi-person gaze target detection framework with Heads and Targets Association (GazeHTA), which predicts multiple head-target instances based solely on the input scene image.
arXiv Detail & Related papers (2024-04-16T16:51:27Z)
- Exploring Hyperspectral Anomaly Detection with Human Vision: A Small Target Aware Detector [20.845503528474328]
Hyperspectral anomaly detection (HAD) aims to localize pixel points whose spectral features differ from the background.
Existing HAD methods aim to objectively detect and distinguish background and anomalous spectra.
In this paper, we analyze hyperspectral image (HSI) features under human visual perception.
We propose a small target aware detector (STAD), which introduces saliency maps to capture HSI features closer to human visual perception.
arXiv Detail & Related papers (2024-01-02T08:28:38Z)
- Joint Gaze-Location and Gaze-Object Detection [62.69261709635086]
Current approaches frame gaze location detection (GL-D) and gaze object detection (GO-D) as two separate tasks.
We propose GTR, short for Gaze following detection TRansformer, to streamline the gaze following detection pipeline.
GTR achieves a 12.1 mAP gain on GazeFollowing and an 18.2 mAP gain on VideoAttentionTarget for GL-D, as well as a 19 mAP improvement on GOO-Real for GO-D.
arXiv Detail & Related papers (2023-08-26T12:12:24Z)
- Object-aware Gaze Target Detection [14.587595325977583]
This paper proposes a Transformer-based architecture that automatically detects objects in the scene to build associations between every head and the gazed-head/object.
Our method achieves state-of-the-art results on all metrics for gaze target detection and 11-13% improvement in average precision for the classification and the localization of the gazed-objects.
arXiv Detail & Related papers (2023-07-18T22:04:41Z)
- MGTR: End-to-End Mutual Gaze Detection with Transformer [1.0312968200748118]
We propose a novel one-stage mutual gaze detection framework called Mutual Gaze TRansformer or MGTR.
By designing mutual gaze instance triples, MGTR can detect each human head bounding box and simultaneously infer mutual gaze relationships based on global image information.
Experimental results on two mutual gaze datasets show that our method accelerates the mutual gaze detection process without losing performance.
arXiv Detail & Related papers (2022-09-22T11:26:22Z)
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z)
- Glance and Gaze: Inferring Action-aware Points for One-Stage Human-Object Interaction Detection [81.32280287658486]
We propose a novel one-stage method, namely Glance and Gaze Network (GGNet).
GGNet adaptively models a set of action-aware points (ActPoints) via glance and gaze steps.
We design an action-aware approach that effectively matches each detected interaction with its associated human-object pair.
arXiv Detail & Related papers (2021-04-12T08:01:04Z)
- An Adversarial Human Pose Estimation Network Injected with Graph Structure [75.08618278188209]
In this paper, we design a novel generative adversarial network (GAN) to improve the localization accuracy of visible joints when some joints are invisible.
The network consists of two simple but efficient modules, Cascade Feature Network (CFN) and Graph Structure Network (GSN).
arXiv Detail & Related papers (2021-03-29T12:07:08Z)
- GID-Net: Detecting Human-Object Interaction with Global and Instance Dependency [67.95192190179975]
We introduce a two-stage trainable reasoning mechanism, referred to as GID block.
GID-Net is a human-object interaction detection framework consisting of a human branch, an object branch and an interaction branch.
We have compared our proposed GID-Net with existing state-of-the-art methods on two public benchmarks, including V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-11T11:58:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.