Upper-Body Pose-based Gaze Estimation for Privacy-Preserving 3D Gaze Target Detection
- URL: http://arxiv.org/abs/2409.17886v1
- Date: Thu, 26 Sep 2024 14:35:06 GMT
- Title: Upper-Body Pose-based Gaze Estimation for Privacy-Preserving 3D Gaze Target Detection
- Authors: Andrea Toaiari, Vittorio Murino, Marco Cristani, Cigdem Beyan
- Abstract summary: Existing approaches heavily rely on analyzing the person's appearance, primarily focusing on their face to predict the gaze target.
This paper presents a novel approach by utilizing the person's upper-body pose and available depth maps to extract a 3D gaze direction.
We demonstrate state-of-the-art results on the most comprehensive publicly accessible 3D gaze target detection dataset.
- Score: 19.478147736434394
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Gaze Target Detection (GTD), i.e., determining where a person is looking within a scene from an external viewpoint, is a challenging task, particularly in 3D space. Existing approaches heavily rely on analyzing the person's appearance, primarily focusing on their face to predict the gaze target. This paper presents a novel approach to tackle this problem by utilizing the person's upper-body pose and available depth maps to extract a 3D gaze direction and employing a multi-stage or an end-to-end pipeline to predict the gazed target. When predicted accurately, the human body pose can provide valuable information about the head pose, which is a good approximation of the gaze direction, as well as the position of the arms and hands, which are linked to the activity the person is performing and the objects they are likely focusing on. Consequently, in addition to performing gaze estimation in 3D, we are also able to perform GTD simultaneously. We demonstrate state-of-the-art results on the most comprehensive publicly accessible 3D gaze target detection dataset without requiring images of the person's face, thus promoting privacy preservation in various application contexts. The code is available at https://github.com/intelligolabs/privacy-gtd-3D.
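The abstract describes the pipeline only at a high level: estimate a 3D gaze direction from the upper-body pose, then use it to pick the gazed target in the scene. The geometric core of such a pipeline can be sketched as below; the function names, the neck-to-head direction proxy, and the cosine-similarity target selection are illustrative assumptions, not the authors' actual (learned, multi-stage or end-to-end) method.

```python
import numpy as np

def estimate_gaze_direction(neck_3d, head_3d):
    """Crude gaze proxy: the unit vector from the neck joint to the head
    joint stands in for head orientation. (Assumption for illustration;
    the paper learns the direction from the full upper-body pose.)"""
    v = head_3d - neck_3d
    return v / np.linalg.norm(v)

def detect_gaze_target(head_3d, gaze_dir, scene_points):
    """Pick the candidate scene point whose direction from the head best
    aligns with the gaze ray (maximum cosine similarity)."""
    rays = scene_points - head_3d                    # (N, 3): head -> point
    rays = rays / np.linalg.norm(rays, axis=1, keepdims=True)
    scores = rays @ gaze_dir                         # cosine per candidate
    return int(np.argmax(scores))

# Toy example: a person at the origin looking along +z.
neck = np.array([0.0, 0.0, 0.0])
head = np.array([0.0, 0.0, 1.0])
gaze = estimate_gaze_direction(neck, head)           # unit vector along +z
scene = np.array([[2.0, 0.0, 1.0],
                  [0.0, 0.2, 3.0],                   # roughly along the gaze
                  [-1.0, 1.0, 0.5]])
target = detect_gaze_target(head, gaze, scene)       # -> index 1
```

Note the privacy angle: only skeletal keypoints and depth enter this computation, so no face imagery is required at any step.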
Related papers
- Leveraging Multi-Modal Saliency and Fusion for Gaze Target Detection [0.0]
We propose a novel method for GTD that fuses multiple pieces of information extracted from an image.
First, we project the 2D image into a 3D representation using monocular depth estimation.
We also extract face and depth modalities from the image, and finally fuse all the extracted modalities to identify the gaze target.
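The first step mentioned above, lifting the 2D image into 3D via monocular depth, is standard pinhole back-projection. A minimal sketch (the intrinsics `fx`, `fy`, `cx`, `cy` are assumed known; the paper's actual projection and fusion details may differ):

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map of shape (H, W), in metres, to an
    (H*W, 3) point cloud with the pinhole camera model:
        X = (u - cx) * Z / fx,   Y = (v - cy) * Z / fy,   Z = depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy 2x2 depth map, 2 m everywhere, with hypothetical intrinsics.
depth = np.full((2, 2), 2.0)
pts = backproject_depth(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```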
arXiv Detail & Related papers (2025-04-27T14:59:13Z)
- Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention [86.39271731460927]
3D intention grounding is a new 3D object detection task on RGB-D data driven by human intention, such as "I want something to support my back".
We introduce the new Intent3D dataset, consisting of 44,990 intention texts associated with 209 fine-grained classes from 1,042 scenes of the ScanNet dataset.
We also propose IntentNet, our unique approach, designed to tackle this intention-based detection problem.
arXiv Detail & Related papers (2024-05-28T15:48:39Z)
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
- A Modular Multimodal Architecture for Gaze Target Prediction: Application to Privacy-Sensitive Settings [18.885623017619988]
We propose a modular multimodal architecture that combines multimodal cues using an attention mechanism.
The architecture can naturally be exploited in privacy-sensitive situations such as surveillance and health, where personally identifiable information cannot be released.
arXiv Detail & Related papers (2023-07-11T10:30:33Z)
- 3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views [67.00931529296788]
We propose to train general gaze estimation models which can be directly employed in novel environments without adaptation.
We create a large-scale dataset of diverse faces with gaze pseudo-annotations, which we extract based on the 3D geometry of the scene.
We test our method in the task of gaze generalization, in which we demonstrate improvement of up to 30% compared to state-of-the-art when no ground truth data are available.
arXiv Detail & Related papers (2022-12-06T14:15:17Z)
- Unsupervised 3D Keypoint Discovery with Multi-View Geometry [104.76006413355485]
We propose an algorithm that learns to discover 3D keypoints on human bodies from multi-view images without supervision or labels.
Our approach discovers more interpretable and accurate 3D keypoints compared to other state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2022-11-23T10:25:12Z)
- PedRecNet: Multi-task deep neural network for full 3D human pose and orientation estimation [0.0]
The multi-task network supports various deep-neural-network-based pedestrian detection functions.
The network architecture is relatively simple yet powerful, and easily adaptable for further research and applications.
arXiv Detail & Related papers (2022-04-25T10:47:01Z)
- PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of body limbs from local image evidence and uses these orientations to recover the 3D pose.
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
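The idea of recovering a pose from limb orientations alone can be sketched as simple forward kinematics: given a unit direction and a length for each bone, joints are placed by walking the kinematic tree from the root. This is only an illustration of the geometric principle; PONet itself learns the orientations from image evidence, and all names below are assumptions.

```python
import numpy as np

def pose_from_orientations(root, parents, bone_dirs, bone_lengths):
    """Assemble 3D joint positions from per-bone unit directions and
    bone lengths by walking the kinematic tree from the root.
    Assumes parents[j] < j (topological order); bone_dirs[0] is unused
    because the root joint has no incoming bone."""
    n = len(parents)
    joints = np.zeros((n, 3))
    joints[0] = root
    for j in range(1, n):
        d = bone_dirs[j] / np.linalg.norm(bone_dirs[j])  # normalise direction
        joints[j] = joints[parents[j]] + bone_lengths[j] * d
    return joints

# Toy 3-joint chain: pelvis -> spine -> head, both bones pointing up (+y).
parents = [-1, 0, 1]
dirs = np.array([[0.0, 0.0, 0.0],    # unused (root)
                 [0.0, 1.0, 0.0],
                 [0.0, 1.0, 0.0]])
lengths = np.array([0.0, 0.5, 0.3])
joints = pose_from_orientations(np.zeros(3), parents, dirs, lengths)
```

Working with orientations rather than joint coordinates is what makes this family of methods robust to truncation: a limb's direction can often be inferred even when its endpoint is occluded.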
arXiv Detail & Related papers (2021-12-21T12:48:48Z)
- 3D Object Detection for Autonomous Driving: A Survey [14.772968858398043]
3D object detection serves as the core of such a perception system.
Despite existing efforts, 3D object detection on point clouds is still in its infancy.
Recent state-of-the-art detection methods with their pros and cons are presented.
arXiv Detail & Related papers (2021-06-21T03:17:20Z)
- Perceiving Humans: from Monocular 3D Localization to Social Distancing [93.03056743850141]
We present a new cost-effective vision-based method that perceives humans' locations in 3D and their body orientation from a single image.
We show that it is possible to rethink the concept of "social distancing" as a form of social interaction in contrast to a simple location-based rule.
arXiv Detail & Related papers (2020-09-01T10:12:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.