Occlusion Resilient 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2402.11036v1
- Date: Fri, 16 Feb 2024 19:29:43 GMT
- Title: Occlusion Resilient 3D Human Pose Estimation
- Authors: Soumava Kumar Roy, Ilia Badanin, Sina Honari and Pascal Fua
- Abstract summary: Occlusions remain one of the key challenges in 3D body pose estimation from single-camera video sequences.
We demonstrate the effectiveness of this approach compared to state-of-the-art techniques that infer poses from single-camera sequences.
- Score: 52.49366182230432
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Occlusions remain one of the key challenges in 3D body pose estimation from
single-camera video sequences. Temporal consistency has been extensively used
to mitigate their impact but the existing algorithms in the literature do not
explicitly model them.
Here, we do so by representing the deforming body as a spatio-temporal
graph. We then introduce a refinement network that performs graph convolutions
over this graph to output 3D poses. To ensure robustness to occlusions, we
train this network with a set of binary masks that we use to disable some of
the edges as in drop-out techniques.
In effect, we simulate the fact that some joints can be hidden for periods of
time and train the network to be immune to that. We demonstrate the
effectiveness of this approach compared to state-of-the-art techniques that
infer poses from single-camera sequences.
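The abstract names three ingredients: a spatio-temporal graph over body joints, graph convolutions over that graph, and binary masks that disable edges during training to simulate joints hidden for a while. Below is a minimal sketch of how these could fit together. It rests on assumptions not stated in the abstract: a hypothetical 5-joint skeleton (`BONES`), temporal edges that link each joint to itself in the next frame, occlusions sampled as one joint hidden over a contiguous run of frames, and a single mean-aggregation graph convolution (a standard GCN variant, not necessarily the authors' layer). All function names are hypothetical.

```python
import numpy as np

# Hypothetical 5-joint skeleton: 0 pelvis, 1 spine, 2 head, 3 left hand, 4 right hand.
# The real joint set is not specified in the abstract.
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def spatio_temporal_adjacency(num_joints, num_frames, bones):
    """Adjacency over num_frames * num_joints nodes (node index = t * num_joints + j).

    Spatial edges follow the bones within each frame, temporal edges link the
    same joint in consecutive frames, and every node has a self-loop.
    """
    n = num_frames * num_joints
    adj = np.eye(n)
    for t in range(num_frames):
        base = t * num_joints
        for a, b in bones:
            adj[base + a, base + b] = adj[base + b, base + a] = 1.0
        if t + 1 < num_frames:
            for j in range(num_joints):
                u, v = base + j, base + num_joints + j
                adj[u, v] = adj[v, u] = 1.0
    return adj


def sample_occlusion(num_joints, num_frames, max_len, rng):
    """Hide one random joint for a contiguous run of frames; True marks a hidden node."""
    hidden = np.zeros((num_frames, num_joints), dtype=bool)
    joint = rng.integers(num_joints)
    length = rng.integers(1, min(max_len, num_frames) + 1)
    start = rng.integers(0, num_frames - length + 1)
    hidden[start:start + length, joint] = True
    return hidden.reshape(-1)  # same node order as the adjacency matrix


def masked_graph_conv(features, adj, hidden, weight):
    """Mean-aggregation graph convolution in which hidden nodes send no messages.

    Zeroing the adjacency columns of hidden nodes disables every edge leaving
    them (including their self-loops), so a hidden joint is rebuilt only from
    its visible spatial and temporal neighbours.
    """
    masked = adj * (~hidden)[None, :]
    deg = masked.sum(axis=1, keepdims=True)
    norm = masked / np.maximum(deg, 1.0)  # row-normalise; rows with no visible neighbour stay zero
    return np.maximum(norm @ features @ weight, 0.0)  # ReLU


rng = np.random.default_rng(0)
J, T = 5, 4
adj = spatio_temporal_adjacency(J, T, BONES)
hidden = sample_occlusion(J, T, max_len=2, rng=rng)
x = rng.normal(size=(T * J, 3))  # per-node input features, e.g. initial 3D estimates
w = rng.normal(size=(3, 3))
out = masked_graph_conv(x, adj, hidden, w)
print(out.shape)  # (20, 3)
```

Masking only the edges that leave hidden nodes, rather than every incident edge, lets a hidden joint still gather information from its visible neighbours; whether the authors mask one or both directions is not stated in the abstract.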
Related papers
- Occlusion Robust 3D Human Pose Estimation with StridedPoseGraphFormer and Data Augmentation [69.49430149980789]
We show that our proposed method compares favorably with the state of the art (SoA).
Our experimental results also reveal that in the absence of any occlusion handling mechanism, the performance of SoA 3D HPE methods degrades significantly when they encounter occlusion.
(arXiv, 2023-04-24T13:05:13Z)
- On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi- and/or weakly supervised learning.
We propose to impose multi-view geometrical constraints by means of a differentiable triangulation and to use it as a form of self-supervision during training when no labels are available.
(arXiv, 2022-03-29T19:11:54Z)
- NeuralReshaper: Single-image Human-body Retouching with Deep Neural Networks [50.40798258968408]
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks.
Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image.
To address the lack of paired training data, we introduce a novel self-supervised strategy to train our network.
(arXiv, 2022-03-20T09:02:13Z)
- Generating Band-Limited Adversarial Surfaces Using Neural Networks [0.9208007322096533]
Generating adversarial examples is the art of creating noise that is added to the input signal of a classification neural network.
In this technical report, we suggest a neural network that generates the attacks.
(arXiv, 2021-11-14T19:16:05Z)
- Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
(arXiv, 2021-08-30T19:45:07Z)
- 3D Pose Detection in Videos: Focusing on Occlusion [0.4588028371034406]
We build upon existing methods for occlusion-aware 3D pose detection in videos.
We implement a two-stage architecture that uses a stacked hourglass network to produce 2D pose predictions.
To facilitate prediction on poses with occluded joints, we introduce an intuitive generalization of the cylinder man model.
(arXiv, 2020-06-24T07:01:17Z)
- Coherent Reconstruction of Multiple Humans from a Single Image [68.3319089392548]
In this work, we address the problem of multi-person 3D pose estimation from a single image.
A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently.
Our goal is to train a single network that learns to avoid the problems of reconstructing each person independently and generates a coherent 3D reconstruction of all the humans in the scene.
(arXiv, 2020-06-15T17:51:45Z)
- 3D Human Pose Estimation using Spatio-Temporal Networks with Explicit Occlusion Training [40.933783830017035]
Estimating 3D poses from a monocular video is still a challenging task, despite the significant progress that has been made in recent years.
We introduce a spatio-temporal video network for robust 3D human pose estimation.
We apply multi-scale spatial features for 2D joint or keypoint prediction in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate 3D joints or keypoints.
(arXiv, 2020-04-07T09:12:12Z)
- A Graph Attention Spatio-temporal Convolutional Network for 3D Human Pose Estimation in Video [7.647599484103065]
We improve the learning of constraints in the human skeleton by modeling local and global spatial information via attention mechanisms.
Our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to half-upper-body estimation, and achieves competitive performance on 2D-to-3D video pose estimation.
(arXiv, 2020-03-11T14:54:40Z)
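The "On Triangulation as a Form of Self-Supervision" entry relies on recovering a 3D point from its 2D projections in several calibrated views. Below is a minimal sketch of standard linear (DLT) triangulation, assuming known 3x4 projection matrices and noise-free 2D points; the two cameras and the helper names are hypothetical. That work triangulates differentiably inside a training loop, so this NumPy version only illustrates the linear algebra, not its implementation.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from V >= 2 views.

    projections: (V, 3, 4) camera matrices; points_2d: (V, 2) pixel coordinates.
    Each view adds the rows u * P[2] - P[0] and v * P[2] - P[1] to a homogeneous
    system A X = 0, solved by the right singular vector of the smallest singular value.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenise


def project(P, X):
    """Project a 3D point with a 3x4 camera matrix to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Two hypothetical cameras: one at the origin, one shifted by 1 unit along x.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 4.0])
uv = np.array([project(P1, X_true), project(P2, X_true)])
X_hat = triangulate_dlt(np.stack([P1, P2]), uv)
print(X_hat)  # approximately [0.2, -0.1, 4.0]
```

With noisy 2D detections the same code returns the algebraic least-squares solution rather than the exact point.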