Self-supervised Human Mesh Recovery with Cross-Representation Alignment
- URL: http://arxiv.org/abs/2209.04596v1
- Date: Sat, 10 Sep 2022 04:47:20 GMT
- Title: Self-supervised Human Mesh Recovery with Cross-Representation Alignment
- Authors: Xuan Gong, Meng Zheng, Benjamin Planche, Srikrishna Karanam, Terrence
Chen, David Doermann, and Ziyan Wu
- Abstract summary: Fully supervised human mesh recovery methods generalize poorly due to the limited availability and diversity of 3D-annotated benchmark datasets.
We propose cross-representation alignment, which exploits complementary information from the robust but sparse representation (2D keypoints) to alleviate the domain gap on the dense representation (IUV).
This adaptive cross-representation alignment explicitly learns from the deviations and captures complementary information: robustness from the sparse representation and richness from the dense representation.
- Score: 20.69546341109787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully supervised human mesh recovery methods are data-hungry and have poor
generalizability due to the limited availability and diversity of 3D-annotated
benchmark datasets. Recent progress in self-supervised human mesh recovery has
been made using synthetic-data-driven training paradigms in which the model is
trained on synthetic pairs of 2D representations (e.g., 2D keypoints and
segmentation masks) and 3D meshes. However, synthetic dense correspondence
maps (i.e., IUV) have rarely been explored, since the domain gap between
synthetic training data and real testing data is hard to address for 2D dense
representations. To alleviate this domain gap on IUV, we propose
cross-representation alignment utilizing the complementary information from the
robust but sparse representation (2D keypoints). Specifically, the alignment
errors between the initial mesh estimate and both 2D representations are
forwarded into the regressor and dynamically corrected in the subsequent mesh
regression. This adaptive cross-representation alignment explicitly learns from
the deviations and captures complementary information: robustness from sparse
representation and richness from dense representation. We conduct extensive
experiments on multiple standard benchmark datasets and demonstrate competitive
results, helping take a step towards reducing the annotation effort needed to
produce state-of-the-art models in human mesh estimation.
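Illustrative sketch (not the authors' code): the loop below shows, under assumed tensor shapes and placeholder projection/rendering functions, how alignment errors against both 2D representations could be fed back into an iterative mesh regressor, as the abstract describes. All identifiers (CrossRepRegressor, project_keypoints, render_iuv) and dimensions are hypothetical assumptions.
```python
# Hypothetical sketch of cross-representation alignment (not the paper's code).
# Assumption: the mesh is summarized by a parameter vector "theta" (SMPL-style
# pose + shape + camera); the projection/rendering functions are placeholders.
import torch
import torch.nn as nn

N_THETA = 85            # assumed size of pose/shape/camera parameters
N_KPTS = 24             # assumed number of 2D keypoints
IUV_DIM = 56 * 56 * 3   # assumed flattened IUV map resolution

def project_keypoints(theta: torch.Tensor) -> torch.Tensor:
    """Placeholder for a differentiable mesh-to-2D-keypoint projection."""
    return theta[:, :N_KPTS * 2].reshape(-1, N_KPTS, 2)

def render_iuv(theta: torch.Tensor) -> torch.Tensor:
    """Placeholder for a differentiable mesh-to-IUV rendering.
    (Uses a fresh random matrix each call; a stand-in only.)"""
    return torch.tanh(theta @ torch.randn(N_THETA, IUV_DIM))

class CrossRepRegressor(nn.Module):
    """Iterative regressor consuming alignment errors from both the sparse
    (keypoint) and dense (IUV) representations."""
    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        in_dim = feat_dim + N_THETA + N_KPTS * 2 + IUV_DIM
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, N_THETA),
        )

    def forward(self, img_feat, kpts_2d, iuv_2d, theta, n_iters: int = 3):
        for _ in range(n_iters):
            # Alignment errors: deviation of the current mesh estimate
            # from each observed 2D representation.
            kpt_err = (kpts_2d - project_keypoints(theta)).flatten(1)
            iuv_err = iuv_2d.flatten(1) - render_iuv(theta)
            # Both errors are fed back so the next step can correct them.
            x = torch.cat([img_feat, theta, kpt_err, iuv_err], dim=1)
            theta = theta + self.mlp(x)  # residual parameter update
        return theta

# Toy forward pass with random inputs
B = 2
reg = CrossRepRegressor()
theta0 = torch.zeros(B, N_THETA)
out = reg(torch.randn(B, 2048), torch.randn(B, N_KPTS, 2),
          torch.randn(B, 56, 56, 3), theta0)
print(out.shape)  # torch.Size([2, 85])
```
The residual update mirrors the iterative error-feedback style common in mesh regression; the point of the sketch is only that both the sparse keypoint error and the dense IUV error enter every regression step.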
Related papers
- DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images.
Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- 3D Adversarial Augmentations for Robust Out-of-Domain Predictions [115.74319739738571]
We focus on improving the generalization to out-of-domain data.
We learn a set of vectors that deform the objects in an adversarial fashion.
We perform adversarial augmentation by applying the learned sample-independent vectors to the available objects when training a model.
arXiv Detail & Related papers (2023-08-29T17:58:55Z)
- Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- A Lightweight Graph Transformer Network for Human Mesh Reconstruction from 2D Human Pose [8.816462200869445]
We present GTRS, a pose-based method that can reconstruct human mesh from 2D human pose.
We demonstrate the efficiency and generalization of GTRS by extensive evaluations on the Human3.6M and 3DPW datasets.
arXiv Detail & Related papers (2021-11-24T18:48:03Z)
- 3D Dense Geometry-Guided Facial Expression Synthesis by Adversarial Learning [54.24887282693925]
We propose a novel framework to exploit 3D dense (depth and surface normals) information for expression manipulation.
We use an off-the-shelf state-of-the-art 3D reconstruction model to estimate the depth and create a large-scale RGB-Depth dataset.
Our experiments demonstrate that the proposed method outperforms the competitive baseline and existing arts by a large margin.
arXiv Detail & Related papers (2020-09-30T17:12:35Z)