Progressive Multi-view Human Mesh Recovery with Self-Supervision
- URL: http://arxiv.org/abs/2212.05223v1
- Date: Sat, 10 Dec 2022 06:28:29 GMT
- Authors: Xuan Gong, Liangchen Song, Meng Zheng, Benjamin Planche, Terrence
Chen, Junsong Yuan, David Doermann, Ziyan Wu
- Abstract summary: Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
- Score: 68.60019434498703
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To date, little attention has been given to multi-view 3D human mesh
estimation, despite real-life applicability (e.g., motion capture, sport
analysis) and robustness to single-view ambiguities. Existing solutions
typically suffer from poor generalization performance to new settings, largely
due to the limited diversity of image-mesh pairs in multi-view training data.
To address this shortcoming, prior work has explored the use of synthetic images.
But besides the usual visual gap between rendered and target data,
synthetic-data-driven multi-view estimators also suffer from overfitting to the
camera viewpoint distribution sampled during training, which usually differs
from real-world distributions. Tackling both challenges, we propose a novel
simulation-based training pipeline for multi-view human mesh recovery, which
(a) relies on intermediate 2D representations which are more robust to
synthetic-to-real domain gap; (b) leverages learnable calibration and
triangulation to adapt to more diversified camera setups; and (c) progressively
aggregates multi-view information in a canonical 3D space to remove ambiguities
in 2D representations. Through extensive benchmarking, we demonstrate the
superiority of the proposed solution especially for unseen in-the-wild
scenarios.
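As an illustration of step (b), the geometric core of multi-view triangulation can be sketched with the classical direct linear transform (DLT). The minimal numpy example below is not the paper's learnable calibration/triangulation module; it only shows how 2D joint detections from several calibrated views combine into one 3D estimate, which the proposed pipeline would then refine by progressive aggregation in a canonical 3D space.
```python
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """Triangulate one 3D point observed in N >= 2 calibrated views
    via the direct linear transform (DLT).

    proj_mats : (N, 3, 4) projection matrices P = K [R | t]
    points_2d : (N, 2) pixel coordinates of the same joint per view
    returns   : (3,) 3D point in the shared world frame
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the
        # homogeneous point X = (x, y, z, 1).
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)           # (2N, 4) system A @ X = 0
    _, _, vt = np.linalg.svd(A)  # least-squares null vector of A
    X = vt[-1]
    return X[:3] / X[3]          # dehomogenize
```
In a learnable variant, the per-view constraints would be confidence-weighted and the camera parameters themselves estimated by the network rather than assumed known.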
Related papers
- DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images.
Our method coherently integrates two key priors, the prior from generalizable feed-forward models and the 2D diffusion prior, and requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z)
- MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement [23.707586182294932]
Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data or from 3D inconsistencies stemming from a lack of comprehensive multi-view knowledge.
We introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image.
arXiv Detail & Related papers (2024-08-26T12:10:52Z)
- Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times (see the schematic loop below).
arXiv Detail & Related papers (2023-05-24T14:00:32Z)
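A schematic of this progressive densification loop, assuming caller-supplied stand-ins for training, rendering, and diffusion-based cleanup (train_fn, render_fn, and denoise_fn are hypothetical placeholders, not the authors' API):
```python
def progressively_densify(sparse_views, train_fn, render_fn, denoise_fn,
                          rounds=3, views_per_round=20):
    """Schematic outline of the progressive loop described above.
    train_fn, render_fn, and denoise_fn are hypothetical stand-ins:
    fit a NeRF/3DGS model, render novel views from it, and clean a
    noisy render with the "deceptive" diffusion model, respectively.
    """
    train_set = list(sparse_views)
    for _ in range(rounds):
        model = train_fn(train_set)                    # few-view reconstruction
        renders = render_fn(model, views_per_round)    # noisy novel-view renders
        pseudo = [denoise_fn(img) for img in renders]  # high-quality pseudo-observations
        train_set.extend(pseudo)                       # densify the input set
    return train_fn(train_set)  # final model on the densified set
```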
- Self-supervised Human Mesh Recovery with Cross-Representation Alignment [20.69546341109787]
Self-supervised human mesh recovery methods have poor generalizability due to the limited availability and diversity of 3D-annotated benchmark datasets.
We propose cross-representation alignment, utilizing the complementary information from the robust but sparse representation (2D keypoints) and the rich but noise-sensitive dense representation.
This adaptive cross-representation alignment explicitly learns from the deviations and captures complementary information: richness from the dense representation and robustness from the sparse one (an illustrative sketch of such an objective follows this entry).
arXiv Detail & Related papers (2022-09-10T04:47:20Z)
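A minimal sketch of what such a deviation-weighted alignment objective might look like; the choice of dense map, the exponential down-weighting, and the 0.5 balance weight are illustrative assumptions, not the paper's exact formulation:
```python
import torch

def cross_representation_loss(pred_kpts, det_kpts, kpt_conf,
                              pred_dense, det_dense, w_dense=0.5):
    """Illustrative alignment objective (not the paper's exact loss).

    pred_kpts  : (J, 2) keypoints reprojected from the predicted mesh
    det_kpts   : (J, 2) detected 2D keypoints (sparse but robust)
    kpt_conf   : (J,)   detector confidences
    pred_dense : (H, W, C) dense map rendered from the mesh (e.g. IUV)
    det_dense  : (H, W, C) dense map from an off-the-shelf predictor (rich but noisy)
    """
    # Sparse term: confidence-weighted keypoint reprojection error.
    sparse = (kpt_conf * (pred_kpts - det_kpts).norm(dim=-1)).mean()
    # Dense term: down-weight pixels where the two representations deviate
    # strongly, so noisy dense evidence cannot override the robust sparse one.
    dev = (pred_dense - det_dense).abs()
    dense = (torch.exp(-dev.detach()) * dev).mean()
    return sparse + w_dense * dense
```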
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net, which consists of a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both the pose and joint levels (a generic two-head sketch follows this entry).
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
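A generic two-head sketch of this idea in PyTorch; the layer sizes, the 2D-keypoint input, and the disagreement-based uncertainty measure are assumptions for illustration, not MRP-Net's actual design:
```python
import torch
import torch.nn as nn

class TwoHeadPoseNet(nn.Module):
    """Generic sketch of a shared backbone with two diversely configured
    output heads; prediction uncertainty is read off head disagreement.
    All architectural details here are illustrative."""

    def __init__(self, num_joints=17, feat_dim=256):
        super().__init__()
        self.num_joints = num_joints
        self.backbone = nn.Sequential(
            nn.Linear(num_joints * 2, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        self.head_a = nn.Linear(feat_dim, num_joints * 3)  # configuration A
        self.head_b = nn.Linear(feat_dim, num_joints * 3)  # configuration B

    def forward(self, kpts2d):                       # kpts2d: (B, J, 2)
        f = self.backbone(kpts2d.flatten(1))
        pa = self.head_a(f).view(-1, self.num_joints, 3)
        pb = self.head_b(f).view(-1, self.num_joints, 3)
        joint_unc = (pa - pb).norm(dim=-1)           # joint-level uncertainty
        pose_unc = joint_unc.mean(dim=-1)            # pose-level uncertainty
        return (pa + pb) / 2, joint_unc, pose_unc
```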
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
The generalizability of human pose estimation models trained with supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations, named forward-kinematics, camera-projection, and spatial-map transformation (a minimal sketch of the first two follows this entry).
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
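A minimal sketch of the first two transformations, assuming a bone-direction parameterization with fixed canonical bone lengths (an illustrative assumption; the paper's exact parameterization may differ):
```python
import torch

def forward_kinematics(root, bone_dirs, bone_lens, parents):
    """Differentiable forward kinematics: each joint is placed at its
    parent plus a predicted unit bone direction scaled by a canonical
    bone length, preserving the kinematic structure by construction.

    root      : (B, 3) root joint position
    bone_dirs : (B, J-1, 3) unit directions of the non-root bones
    bone_lens : (J-1,) canonical bone lengths
    parents   : parents[i] is the joint index of bone i's parent
    """
    joints = [root]
    for i, p in enumerate(parents):
        joints.append(joints[p] + bone_lens[i] * bone_dirs[:, i])
    return torch.stack(joints, dim=1)  # (B, J, 3)

def camera_projection(joints3d, focal=1.0):
    """Simple differentiable perspective projection (camera at the origin)."""
    return focal * joints3d[..., :2] / joints3d[..., 2:3].clamp(min=1e-6)
```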
- Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data (an illustrative consistency term follows this entry).
We evaluate our proposed approach on two large-scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)
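One common way to turn unlabeled multi-view data into a training signal is a cross-view consistency term; the sketch below is an illustrative assumption, not necessarily the paper's objective:
```python
import torch

def multiview_consistency_loss(poses_cam, rots_to_world):
    """Weak supervision from unlabeled multi-view data (illustrative):
    root-relative 3D poses predicted independently per view should agree
    once rotated into a shared world frame.

    poses_cam     : (V, J, 3) per-view, root-relative 3D pose predictions
    rots_to_world : (V, 3, 3) camera-to-world rotations
    """
    # Rotate every per-view prediction into the common world frame.
    world = torch.einsum('vij,vkj->vki', rots_to_world, poses_cam)
    # Penalize per-joint deviation from the cross-view mean pose.
    mean = world.mean(dim=0, keepdim=True)
    return (world - mean).norm(dim=-1).mean()
```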
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.