Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond
Algorithms
- URL: http://arxiv.org/abs/2209.10529v1
- Date: Wed, 21 Sep 2022 17:39:53 GMT
- Title: Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond
Algorithms
- Authors: Hui En Pang, Zhongang Cai, Lei Yang, Tianwei Zhang and Ziwei Liu
- Abstract summary: This work presents the first comprehensive benchmarking study from three under-explored perspectives beyond algorithms.
An analysis of 31 datasets reveals the distinct impacts of data samples.
We achieve a PA-MPJPE of 47.3 mm on the 3DPW test set with a relatively simple model.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D human pose and shape estimation (a.k.a. "human mesh recovery") has
achieved substantial progress. Researchers mainly focus on the development of
novel algorithms, while less attention has been paid to other critical factors
involved. This could lead to suboptimal baselines, hindering the fair and
faithful evaluation of newly designed methodologies. To address this problem,
this work presents the first comprehensive benchmarking study from three
under-explored perspectives beyond algorithms. 1) Datasets. An analysis of 31
datasets reveals the distinct impacts of data samples: datasets featuring
critical attributes (i.e., diverse poses, shapes, camera characteristics,
backbone features) are more effective. Strategic selection and combination of
high-quality datasets can yield a significant boost in model performance.
2) Backbones. Experiments with 10 backbones, ranging from CNNs to transformers,
show the knowledge learnt from a proximity task is readily transferable to
human mesh recovery. 3) Training strategies. Proper augmentation techniques and
loss designs are crucial. With the above findings, we achieve a PA-MPJPE of
47.3 mm on the 3DPW test set with a relatively simple model. More importantly,
we provide strong baselines for fair comparisons of algorithms, and
recommendations for building effective training configurations in the future.
The codebase is available at http://github.com/smplbody/hmr-benchmarks
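The PA-MPJPE figure quoted above is the mean per-joint position error measured after the predicted joints have been aligned to the ground truth with a similarity transform (scale, rotation, translation). The following is a minimal NumPy sketch of that metric, not the authors' evaluation code; the function names `procrustes_align`, `mpjpe` and `pa_mpjpe`, and the (J, 3) joint layout, are assumptions made for illustration.

```python
import numpy as np


def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the best similarity transform."""
    mu_p = pred.mean(axis=0)
    mu_g = gt.mean(axis=0)
    p = pred - mu_p
    g = gt - mu_g

    # Cross-covariance between the centered point sets.
    H = p.T @ g
    U, S, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Optimal isotropic scale for the centered prediction.
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()

    return scale * p @ R.T + mu_g


def mpjpe(pred, gt):
    """Mean Euclidean distance per joint, without any alignment."""
    return np.linalg.norm(pred - gt, axis=1).mean()


def pa_mpjpe(pred, gt):
    """MPJPE computed after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

The result is in the same unit as the inputs, so joints given in millimetres yield an error in millimetres. Because the alignment removes global scale, rotation and translation, PA-MPJPE reflects the articulated pose error only.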
Related papers
- The Trifecta: Three simple techniques for training deeper Forward-Forward networks [0.0]
We propose a collection of three techniques that synergize exceptionally well and drastically improve the Forward-Forward algorithm on deeper networks.
Our experiments demonstrate that our models are on par with similarly structured, backpropagation-based models in both training speed and test accuracy on simple datasets.
arXiv Detail & Related papers (2023-11-29T22:44:32Z)
- Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats [80.12253291709673]
We propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks.
Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model.
arXiv Detail & Related papers (2022-12-29T22:22:49Z)
- Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z)
- Recovering 3D Human Mesh from Monocular Images: A Survey [49.00136388529404]
Estimating human pose and shape from monocular images is a long-standing problem in computer vision.
This survey focuses on the task of monocular 3D human mesh recovery.
arXiv Detail & Related papers (2022-03-03T18:56:08Z)
- Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
However, they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z)
- Deep Optimized Priors for 3D Shape Modeling and Reconstruction [38.79018852887249]
We introduce a new learning framework for 3D modeling and reconstruction.
We show that the proposed strategy effectively breaks the barriers constrained by the pre-trained priors.
arXiv Detail & Related papers (2020-12-14T03:56:31Z)
- 2nd Place Scheme on Action Recognition Track of ECCV 2020 VIPriors Challenges: An Efficient Optical Flow Stream Guided Framework [57.847010327319964]
We propose a data-efficient framework that can train the model from scratch on small datasets.
Specifically, by introducing a 3D central difference convolution operation, we propose a novel C3D neural network-based two-stream framework.
We show that our method can achieve promising results even without a model pre-trained on large-scale datasets.
arXiv Detail & Related papers (2020-08-10T09:50:28Z)
- Towards High Performance Human Keypoint Detection [87.1034745775229]
We find that context information plays an important role in reasoning about human body configuration and invisible keypoints.
Inspired by this, we propose a cascaded context mixer (CCM), which efficiently integrates spatial and channel context information.
To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy.
We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
arXiv Detail & Related papers (2020-02-03T02:24:51Z)