Occluded Human Mesh Recovery
- URL: http://arxiv.org/abs/2203.13349v1
- Date: Thu, 24 Mar 2022 21:39:20 GMT
- Title: Occluded Human Mesh Recovery
- Authors: Rawal Khirodkar, Shashank Tripathi, Kris Kitani
- Abstract summary: We present Occluded Human Mesh Recovery (OCHMR) - a novel top-down mesh recovery approach that incorporates image spatial context.
OCHMR achieves superior performance on challenging multi-person benchmarks like 3DPW, CrowdPose and OCHuman.
- Score: 23.63235079216075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Top-down methods for monocular human mesh recovery have two stages: (1)
detect human bounding boxes; (2) treat each bounding box as an independent
single-human mesh recovery task. Unfortunately, the single-human assumption
does not hold in images with multi-human occlusion and crowding. Consequently,
top-down methods have difficulties in recovering accurate 3D human meshes under
severe person-person occlusion. To address this, we present Occluded Human Mesh
Recovery (OCHMR) - a novel top-down mesh recovery approach that incorporates
image spatial context to overcome the limitations of the single-human
assumption. The approach is conceptually simple and can be applied to any
existing top-down architecture. Along with the input image, we condition the
top-down model on spatial context from the image in the form of body-center
heatmaps. To reason from the predicted body centermaps, we introduce Contextual
Normalization (CoNorm) blocks to adaptively modulate intermediate features of
the top-down model. The contextual conditioning helps our model disambiguate
between two severely overlapping human bounding-boxes, making it robust to
multi-person occlusion. Compared with state-of-the-art methods, OCHMR achieves
superior performance on challenging multi-person benchmarks like 3DPW,
CrowdPose and OCHuman. Specifically, our proposed contextual reasoning
architecture applied to the SPIN model with ResNet-50 backbone results in 75.2
PMPJPE on 3DPW-PC, 23.6 AP on CrowdPose and 37.7 AP on OCHuman datasets, a
significant improvement of 6.9 mm, 6.4 AP and 20.8 AP respectively over the
baseline. Code and models will be released.
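The conditioning mechanism described above can be illustrated with a minimal NumPy sketch of a FiLM-style conditional normalization in the spirit of the CoNorm block: the body-center heatmap is pooled into a context vector, which predicts per-channel scale and shift values that modulate the instance-normalized features. The function name `conorm` and all shapes and weight matrices here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def conorm(features, center_heatmap, W_gamma, b_gamma, W_beta, b_beta, eps=1e-5):
    """Illustrative CoNorm-style block (simplified sketch, not the paper's code).

    Normalizes each feature channel, then modulates it with scale/shift
    parameters predicted from the body-center heatmap (spatial context).
    Shapes: features (C, H, W), center_heatmap (K, H, W),
    W_gamma/W_beta (C, K), b_gamma/b_beta (C,).
    """
    # Instance-normalize each channel of the backbone features.
    mu = features.mean(axis=(1, 2), keepdims=True)
    var = features.var(axis=(1, 2), keepdims=True)
    normed = (features - mu) / np.sqrt(var + eps)
    # Pool the center heatmap into a context vector, then predict
    # per-channel modulation parameters from it.
    ctx = center_heatmap.mean(axis=(1, 2))          # (K,)
    gamma = W_gamma @ ctx + b_gamma                 # (C,)
    beta = W_beta @ ctx + b_beta                    # (C,)
    # Broadcast the predicted scale/shift over the spatial dimensions.
    return gamma[:, None, None] * normed + beta[:, None, None]
```

Because the modulation depends on which person's center heatmap is supplied, two overlapping crops of the same image region yield differently conditioned features, which is the intuition behind disambiguating overlapping bounding boxes.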
Related papers
- ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos [18.685856290041283]
ARTS surpasses existing state-of-the-art video-based methods in both per-frame accuracy and temporal consistency on popular benchmarks.
A skeleton estimation and disentanglement module is proposed to estimate the 3D skeletons from a video.
The regressor consists of three modules: Temporal Inverse Kinematics (TIK), Bone-guided Shape Fitting (BSF), and Motion-Centric Refinement (MCR).
arXiv Detail & Related papers (2024-10-21T02:06:43Z) - AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe the human joint in the image and encode a fine-grained local feature.
arXiv Detail & Related papers (2024-03-26T17:59:23Z) - Score-Guided Diffusion for 3D Human Recovery [10.562998991986102]
We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems for 3D human pose and shape reconstruction.
ScoreHMR mimics model fitting approaches, but alignment with the image observation is achieved through score guidance in the latent space of a diffusion model.
We evaluate our approach on three settings/applications: (i) single-frame model fitting; (ii) reconstruction from multiple uncalibrated views; (iii) reconstructing humans in video sequences.
arXiv Detail & Related papers (2024-03-14T17:56:14Z) - ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding [35.49066795648395]
In 3D human shape and pose estimation from a monocular video, models trained with limited labeled data cannot generalize well to videos with occlusion.
We introduce ORTexME, an occlusion-robust temporal method that utilizes temporal information from the input video to better regularize the occluded body parts.
Our method achieves a significant improvement on the challenging multi-person 3DPW dataset, with a 1.8 mm reduction in P-MPJPE.
arXiv Detail & Related papers (2023-09-21T15:50:04Z) - Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale Persons [75.86463396561744]
In multi-person 2D pose estimation, the bottom-up methods simultaneously predict poses for all persons.
Our method achieves a 38.4% improvement on bounding box precision and a 39.1% improvement on bounding box recall over the state of the art (SOTA).
For the human pose AP evaluation, we achieve a new SOTA (71.0 AP) on the COCO test-dev set with the single-scale testing.
arXiv Detail & Related papers (2022-08-25T10:09:10Z) - 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [77.57798334776353]
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views.
We suggest that ambiguities can be modelled more effectively by parametrizing the possible body shapes and poses.
We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans.
arXiv Detail & Related papers (2020-11-02T13:55:31Z) - Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
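The epipolar constraint referred to above can be sketched in a few lines: given the fundamental matrix F between two views, a candidate correspondence is scored by the distance of a point in view 2 to the epipolar line induced by its partner in view 1. This is a standard textbook computation, not code from the paper; the function name and any matching threshold are assumptions.

```python
import numpy as np

def epipolar_distance(F, x1, x2):
    """Distance from point x2 (view 2) to the epipolar line of x1 (view 1).

    F is the 3x3 fundamental matrix mapping view-1 points to view-2 lines;
    x1, x2 are homogeneous 2D points of shape (3,). A small distance means
    x2 lies near the epipolar line of x1, so the detections may correspond.
    In dense crowds many people fall near the same line, which is why this
    test becomes unreliable.
    """
    line = F @ x1                                   # line a*x + b*y + c = 0 in view 2
    return abs(line @ x2) / np.hypot(line[0], line[1])
```

In crowded scenes, matching by thresholding this distance alone produces ambiguous assignments, motivating the reformulation as crowd pose estimation.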
arXiv Detail & Related papers (2020-07-21T17:59:36Z) - Coherent Reconstruction of Multiple Humans from a Single Image [68.3319089392548]
In this work, we address the problem of multi-person 3D pose estimation from a single image.
A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently.
Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene.
arXiv Detail & Related papers (2020-06-15T17:51:45Z) - HEMlets PoSh: Learning Part-Centric Heatmap Triplets for 3D Human Pose and Shape Estimation [60.35776484235304]
This work attempts to address the uncertainty of lifting the detected 2D joints to the 3D space by introducing an intermediate state-Part-Centric Heatmap Triplets (HEMlets)
The HEMlets utilize three joint-heatmaps to represent the relative depth information of the end-joints for each skeletal body part.
A Convolutional Network (ConvNet) is first trained to predict HEMlets from the input image, followed by a volumetric joint-heatmap regression.
arXiv Detail & Related papers (2020-03-10T04:03:45Z)
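As a simplified sketch of the heatmap-triplet idea above (not the paper's exact formulation), one can place a 2D Gaussian at a part's end-joint location in one of three channels selected by the sign of its depth relative to the parent joint. The function name, the polarity threshold `tau`, and the heatmap size are all illustrative assumptions.

```python
import numpy as np

def hemlet_triplet(child_xy, rel_depth, size=64, sigma=2.0, tau=0.1):
    """Simplified HEMlets-style heatmap triplet for one skeletal part.

    Channels encode the relative depth polarity of the part's end-joint:
    channel 0 = child joint in front of the parent, 1 = roughly equal
    depth, 2 = child joint behind the parent. A 2D Gaussian is rendered
    at the child joint's pixel location in the selected channel.
    """
    heatmaps = np.zeros((3, size, size))
    # Pick the channel from the sign of the relative depth.
    if abs(rel_depth) < tau:
        ch = 1              # roughly equal depth
    elif rel_depth < 0:
        ch = 0              # child closer to the camera
    else:
        ch = 2              # child farther from the camera
    # Render a Gaussian peak at the child joint's 2D location.
    ys, xs = np.mgrid[0:size, 0:size]
    cx, cy = child_xy
    heatmaps[ch] = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return heatmaps
```

A network trained to predict such triplets learns local depth ordering per body part, which the subsequent volumetric regression can exploit.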
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.