Progressive Multi-view Human Mesh Recovery with Self-Supervision
- URL: http://arxiv.org/abs/2212.05223v1
- Date: Sat, 10 Dec 2022 06:28:29 GMT
- Title: Progressive Multi-view Human Mesh Recovery with Self-Supervision
- Authors: Xuan Gong, Liangchen Song, Meng Zheng, Benjamin Planche, Terrence
Chen, Junsong Yuan, David Doermann, Ziyan Wu
- Abstract summary: Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
- Score: 68.60019434498703
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To date, little attention has been given to multi-view 3D human mesh
estimation, despite real-life applicability (e.g., motion capture, sport
analysis) and robustness to single-view ambiguities. Existing solutions
typically suffer from poor generalization performance to new settings, largely
due to the limited diversity of image-mesh pairs in multi-view training data.
To address this shortcoming, people have explored the use of synthetic images.
But besides the usual impact of visual gap between rendered and target data,
synthetic-data-driven multi-view estimators also suffer from overfitting to the
camera viewpoint distribution sampled during training which usually differs
from real-world distributions. Tackling both challenges, we propose a novel
simulation-based training pipeline for multi-view human mesh recovery, which
(a) relies on intermediate 2D representations which are more robust to
synthetic-to-real domain gap; (b) leverages learnable calibration and
triangulation to adapt to more diversified camera setups; and (c) progressively
aggregates multi-view information in a canonical 3D space to remove ambiguities
in 2D representations. Through extensive benchmarking, we demonstrate the
superiority of the proposed solution especially for unseen in-the-wild
scenarios.
Related papers
- Geometry-Biased Transformer for Robust Multi-View 3D Human Pose
Reconstruction [3.069335774032178]
We propose a novel encoder-decoder Transformer architecture to estimate 3D poses from multi-view 2D pose sequences.
We conduct experiments on three benchmark public datasets, Human3.6M, CMU Panoptic and Occlusion-Persons.
arXiv Detail & Related papers (2023-12-28T16:30:05Z) - Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z) - Self-supervised Human Mesh Recovery with Cross-Representation Alignment [20.69546341109787]
Self-supervised human mesh recovery methods have poor generalizability due to limited availability and diversity of 3D-annotated benchmark datasets.
We propose cross-representation alignment utilizing the complementary information from the robust but sparse representation (2D keypoints)
This adaptive cross-representation alignment explicitly learns from the deviations and captures complementary information: richness from sparse representation and robustness from dense representation.
arXiv Detail & Related papers (2022-09-10T04:47:20Z) - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose
Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z) - Multi-person Implicit Reconstruction from a Single Image [37.6877421030774]
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image.
Existing multi-person methods suffer from two main drawbacks: they are often model-based and cannot capture accurate 3D models of people with loose clothing and hair.
arXiv Detail & Related papers (2021-04-19T13:21:55Z) - Self-supervised Human Detection and Segmentation via Multi-view
Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z) - Kinematic-Structure-Preserved Representation for Unsupervised 3D Human
Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z) - Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the
Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.