2D Image head pose estimation via latent space regression under
occlusion settings
- URL: http://arxiv.org/abs/2311.06038v1
- Date: Fri, 10 Nov 2023 12:53:02 GMT
- Title: 2D Image head pose estimation via latent space regression under
occlusion settings
- Authors: José Celestino, Manuel Marques, Jacinto C. Nascimento and João
Paulo Costeira
- Abstract summary: The strategy is based on latent space regression as a fundamental key to better structure the problem for occluded scenarios.
We demonstrate the usefulness of the proposed approach with: (i) two synthetically occluded versions of the BIWI and AFLW2000 datasets, (ii) real-life occlusions of the Pandora dataset, and (iii) a real-life application to human-robot interaction scenarios.
- Score: 7.620379605206596
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Head orientation estimation is a challenging computer vision problem that has
been extensively researched and has a wide variety of applications. However, current
state-of-the-art systems still underperform in the presence of occlusions and
are unreliable for many task applications in such scenarios. This work proposes
a novel deep learning approach for the problem of head pose estimation under
occlusions. The strategy is based on latent space regression as a fundamental
key to better structure the problem for occluded scenarios. Our model surpasses
several state-of-the-art methodologies for occluded HPE, and achieves similar
accuracy for non-occluded scenarios. We demonstrate the usefulness of the
proposed approach with: (i) two synthetically occluded versions of the BIWI and
AFLW2000 datasets, (ii) real-life occlusions of the Pandora dataset, and (iii)
a real-life application to human-robot interaction scenarios where face
occlusions often occur, specifically autonomous feeding by a robotic
arm.
Related papers
- C$^{2}$INet: Realizing Incremental Trajectory Prediction with Prior-Aware Continual Causal Intervention [10.189508227447401]
Trajectory prediction for multi-agents in complex scenarios is crucial for applications like autonomous driving.
Existing methods often overlook environmental biases, which leads to poor generalization.
We propose the Continual Causal Intervention (C$^{2}$INet) method for generalizable multi-agent trajectory prediction.
arXiv Detail & Related papers (2024-11-19T08:01:20Z)
- Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation [9.569646683579899]
Self-Supervised Surround Depth Estimation from consecutive images offers an economical alternative.
Previous SSSDE methods have proposed different mechanisms to fuse information across images, but few of them explicitly consider the cross-view constraints.
This paper proposes an efficient and consistent pose estimation design and two loss functions to enhance cross-view consistency for SSSDE.
arXiv Detail & Related papers (2024-07-04T16:29:05Z)
- Viewpoint Generation using Feature-Based Constrained Spaces for Robot
Vision Systems [63.942632088208505]
This publication outlines the generation of viewpoints as a geometrical problem and introduces a generalized theoretical framework for solving it.
A $\mathcal{C}$-space can be understood as the topological space that a viewpoint constraint spans, where the sensor can be positioned for acquiring a feature while fulfilling the regarded constraint.
The introduced $\mathcal{C}$-spaces are characterized based on generic domain and viewpoint constraint models to ease the transferability of the present framework to different applications and robot vision systems.
arXiv Detail & Related papers (2023-06-12T08:57:15Z)
- A Survey on Deep Learning-Based Monocular Spacecraft Pose Estimation:
Current State, Limitations and Prospects [7.08026800833095]
Estimating the pose of an uncooperative spacecraft is an important computer vision problem for enabling vision-based systems in orbit.
Following the general trend in computer vision, more and more works have been focusing on leveraging Deep Learning (DL) methods to address this problem.
Despite promising research-stage results, major challenges preventing the use of such methods in real-life missions still stand in the way.
arXiv Detail & Related papers (2023-05-12T09:52:53Z)
- Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose
Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- Occlusion-Robust Object Pose Estimation with Holistic Representation [42.27081423489484]
State-of-the-art (SOTA) object pose estimators take a two-stage approach.
We develop a novel occlude-and-blackout batch augmentation technique.
We also develop a multi-precision supervision architecture to encourage holistic pose representation learning.
arXiv Detail & Related papers (2021-10-22T08:00:26Z)
- Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose
Estimation [74.76155168705975]
Deep Bingham Networks (DBN) can handle pose-related uncertainties and ambiguities arising in almost all real life applications concerning 3D data.
DBN extends the state of the art direct pose regression networks by (i) a multi-hypotheses prediction head which can yield different distribution modes.
We propose new training strategies so as to avoid mode or posterior collapse during training and to improve numerical stability.
arXiv Detail & Related papers (2020-12-20T19:20:26Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View
Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
- Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal
Clustering and Large-Scale Heterogeneous Environment Synthesis [76.46004354572956]
We introduce an unsupervised domain adaptation approach for person re-identification.
Experimental results show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance.
arXiv Detail & Related papers (2020-01-14T17:43:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.