Object-aware Gaze Target Detection
- URL: http://arxiv.org/abs/2307.09662v2
- Date: Wed, 27 Sep 2023 13:08:51 GMT
- Title: Object-aware Gaze Target Detection
- Authors: Francesco Tonini and Nicola Dall'Asen and Cigdem Beyan and Elisa Ricci
- Abstract summary: This paper proposes a Transformer-based architecture that automatically detects objects in the scene to build associations between every head and the gazed-head/object.
Our method achieves state-of-the-art results on all metrics for gaze target detection and an 11-13% improvement in average precision for the classification and localization of the gazed objects.
- Score: 14.587595325977583
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Gaze target detection aims to predict the image location where the person is looking and the probability that a gaze is out of the scene. Several works have tackled this task by regressing a gaze heatmap centered on the gaze location; however, they overlooked decoding the relationship between the people and the gazed objects. This paper proposes a Transformer-based architecture that automatically detects objects (including heads) in the scene to build associations between every head and the gazed-head/object, resulting in a comprehensive, explainable gaze analysis composed of: gaze target area, gaze pixel point, and the class and image location of the gazed-object. Upon evaluation on the in-the-wild benchmarks, our method achieves state-of-the-art results on all metrics (up to 2.91% gain in AUC, 50% reduction in gaze distance, and 9% gain in out-of-frame average precision) for gaze target detection, and an 11-13% improvement in average precision for the classification and the localization of the gazed-objects. The code of the proposed method is publicly available.
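As a rough illustration of the head-object association idea described in the abstract, here is a minimal, self-contained PyTorch sketch: each detected head is scored against every detection (objects and other heads) to pick its most likely gaze target. All layer names, dimensions, and tensors are illustrative assumptions, not the authors' implementation.
```python
import torch
from torch import nn

# Illustrative assumption: a detector has produced D-dim embeddings for N
# detections (heads included) plus a mask marking which detections are heads.
D, N = 256, 12
obj_emb = torch.randn(N, D)            # embeddings of all detected objects
is_head = torch.zeros(N, dtype=torch.bool)
is_head[:3] = True                     # suppose the first 3 detections are heads

q_proj = nn.Linear(D, D)               # hypothetical query projection for heads
k_proj = nn.Linear(D, D)               # hypothetical key projection for objects

heads = obj_emb[is_head]                              # (H, D)
scores = q_proj(heads) @ k_proj(obj_emb).T / D**0.5   # (H, N) association scores
assoc = scores.softmax(dim=-1)         # per-head distribution over detections

gazed = assoc.argmax(dim=-1)           # for each head, the most likely gazed object/head
print(gazed)
```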
Related papers
- Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model [19.800353299691277]
This paper presents a more challenging gaze object segmentation (GOS) task, which involves inferring the pixel-level mask corresponding to the object captured by human gaze behavior.
We propose to automatically obtain head features from scene features to ensure the model's inference efficiency and flexibility in the real world.
arXiv Detail & Related papers (2024-08-02T06:32:45Z)
- Semi-Synthetic Dataset Augmentation for Application-Specific Gaze Estimation [0.3683202928838613]
We show how to generate a three-dimensional mesh of the face and render the training images from a virtual camera at a specific position and orientation related to the application.
This leads to an average 47% decrease in gaze estimation angular error.
arXiv Detail & Related papers (2023-10-27T20:27:22Z)
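For the semi-synthetic augmentation entry above, the render-from-a-virtual-camera step boils down to projecting mesh vertices through a pinhole camera at a chosen pose. A toy NumPy sketch, with all intrinsics, poses, and vertex data invented for illustration:
```python
import numpy as np

# Hypothetical pinhole-camera projection of 3D face-mesh vertices into a
# training image, for a virtual camera at a chosen position/orientation.
K = np.array([[800.0, 0.0, 320.0],    # intrinsics: focal lengths, principal point
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # camera rotation (identity for simplicity)
t = np.array([0.0, 0.0, 0.5])          # camera 50 cm in front of the face

verts = np.random.rand(468, 3) * 0.15  # toy "face mesh" vertices, in meters
cam = verts @ R.T + t                  # world -> camera coordinates
uv = cam @ K.T                         # apply intrinsics
uv = uv[:, :2] / uv[:, 2:3]            # perspective divide -> pixel coordinates
print(uv.min(axis=0), uv.max(axis=0))
```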
- LatentGaze: Cross-Domain Gaze Estimation through Gaze-Aware Analytic Latent Code Manipulation [0.0]
We propose a gaze-aware analytic manipulation method, based on a data-driven approach with generative adversarial network inversion's disentanglement characteristics.
By utilizing a GAN-based encoder-generator process, we shift the input image from the target domain to the source domain, of which a gaze estimator is sufficiently aware.
arXiv Detail & Related papers (2022-09-21T08:05:53Z)
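For the LatentGaze entry above, a schematic of the GAN-inversion domain shift: encode a target-domain image into a latent code, keep the (assumed) gaze-relevant dimensions, and decode back into the source domain the gaze estimator was trained on. The encoder/generator stand-ins and the dimension split are assumptions, not the paper's models.
```python
import torch
from torch import nn

latent_dim = 512
# Stand-in encoder/generator; a real system would use a pretrained GAN.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, latent_dim))
generator = nn.Sequential(nn.Linear(latent_dim, 3 * 64 * 64),
                          nn.Unflatten(1, (3, 64, 64)))

x_target = torch.randn(1, 3, 64, 64)   # image from the (unseen) target domain
w = encoder(x_target)                  # GAN-inversion latent code

# Hypothetical disentanglement: assume the first 32 dims carry gaze info and
# the rest carry domain/appearance info, replaced by a source-domain mean code.
w_source_mean = torch.zeros_like(w)
w_shifted = torch.cat([w[:, :32], w_source_mean[:, 32:]], dim=1)

x_source_like = generator(w_shifted)   # image shifted toward the source domain
print(x_source_like.shape)             # feed this to the pretrained gaze estimator
```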
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z)
- End-to-End Human-Gaze-Target Detection with Transformers [57.00864538284686]
We propose an effective and efficient method for Human-Gaze-Target (HGT) detection, i.e., gaze following.
Our method, named Human-Gaze-Target detection TRansformer or HGTTR, streamlines the HGT detection pipeline by eliminating all other components.
The effectiveness and robustness of our proposed method are verified with extensive experiments on the two standard benchmark datasets, GazeFollowing and VideoAttentionTarget.
arXiv Detail & Related papers (2022-03-20T02:37:06Z)
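For the HGTTR entry above, a minimal set-prediction sketch in the DETR spirit: learned queries are decoded against image features, and each query directly regresses a head box, a gaze point, and an in-frame score, with no separate head-crop branch. Sizes, layer counts, and output heads are illustrative assumptions.
```python
import torch
from torch import nn

d_model, n_queries = 256, 20
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
    num_layers=2)

memory = torch.randn(1, 196, d_model)          # e.g. 14x14 backbone feature tokens
queries = torch.randn(1, n_queries, d_model)   # learned object queries

h = decoder(queries, memory)                   # (1, Q, d_model)
head_box = nn.Linear(d_model, 4)(h).sigmoid()  # normalized head bounding box
gaze_pt = nn.Linear(d_model, 2)(h).sigmoid()   # normalized gaze point
in_out = nn.Linear(d_model, 1)(h).sigmoid()    # probability gaze is in-frame
print(head_box.shape, gaze_pt.shape, in_out.shape)
```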
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- Slender Object Detection: Diagnoses and Improvements [74.40792217534]
In this paper, we are concerned with the detection of a particular type of objects with extreme aspect ratios, namely slender objects.
For a classical object detection method, a drastic drop of 18.9% mAP on COCO is observed if solely evaluated on slender objects.
arXiv Detail & Related papers (2020-11-17T09:39:42Z)
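For the slender-object entry above, evaluating only on slender objects amounts to filtering COCO-style annotations by bounding-box aspect ratio before computing mAP. The threshold below is an illustrative choice, not the paper's definition:
```python
# Isolate "slender" instances (extreme aspect ratios) in COCO-style
# annotations for a separate evaluation pass.
def is_slender(bbox, ratio_threshold=5.0):
    """bbox is COCO-style [x, y, w, h]; slender if one side dominates."""
    _, _, w, h = bbox
    if min(w, h) == 0:
        return False
    return max(w / h, h / w) >= ratio_threshold

annotations = [
    {"id": 1, "bbox": [10, 10, 200, 12]},   # e.g. a ski: very wide, very short
    {"id": 2, "bbox": [50, 40, 64, 80]},    # a roughly square object
]
slender_ids = [a["id"] for a in annotations if is_slender(a["bbox"])]
print(slender_ids)  # evaluate mAP on this subset vs. the full set
```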
- Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera.
We exploit a self-supervised loss function to train a proposal-based segmentation network.
We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)
- A Self-Training Approach for Point-Supervised Object Detection and Counting in Crowds [54.73161039445703]
We propose a novel self-training approach that enables a typical object detector to be trained using only point-level annotations.
During training, we utilize the available point annotations to supervise the estimation of the center points of objects.
Experimental results show that our approach significantly outperforms state-of-the-art point-supervised methods under both detection and counting tasks.
arXiv Detail & Related papers (2020-07-25T02:14:42Z)
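For the point-supervised entry above, a toy version of supervising predicted object centers with point annotations (not the paper's full self-training loop): match each annotated point to the nearest predicted center and penalize their distance, so a detector can learn from points alone.
```python
import torch

pred_centers = torch.rand(8, 2, requires_grad=True)  # predicted (x, y) centers
gt_points = torch.rand(5, 2)                         # point annotations

dists = torch.cdist(gt_points, pred_centers)  # (5, 8) pairwise distances
min_dists, _ = dists.min(dim=1)               # nearest prediction per point
loss = min_dists.mean()
loss.backward()                               # gradients flow to the centers
print(float(loss))
```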