Multi-view Human Body Mesh Translator
- URL: http://arxiv.org/abs/2210.01886v1
- Date: Tue, 4 Oct 2022 20:10:59 GMT
- Title: Multi-view Human Body Mesh Translator
- Authors: Xiangjian Jiang, Xuecheng Nie, Zitian Wang, Luoqi Liu, Si Liu
- Abstract summary: We present a novel Multi-view human body Mesh Translator (MMT) model for estimating human body mesh.
MMT fuses features of different views in both encoding and decoding phases, leading to representations embedded with global information.
- Score: 20.471741894219228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing methods for human mesh recovery mainly focus on single-view
frameworks, but they often fail to produce accurate results due to the
ill-posed setup. Considering the maturity of multi-view motion capture
systems, in this paper we propose to resolve this ill-posed problem by
leveraging multiple images from different views, thus significantly enhancing
the quality of the recovered meshes. In particular, we present a novel
Multi-view human body Mesh Translator (MMT) model for estimating human body
mesh with the help of a vision transformer. Specifically, MMT takes
multi-view images as input and translates them to the targeted meshes in a
single forward pass. MMT fuses features of different views in both the
encoding and decoding phases, leading to representations embedded with global
information. Additionally, to ensure that the tokens focus intensively on
human pose and shape, MMT conducts cross-view alignment at the feature level
by projecting 3D keypoint positions to each view and enforcing their
consistency through geometric constraints. Comprehensive experiments
demonstrate that MMT outperforms existing single- and multi-view models by a
large margin on the human mesh recovery task, notably achieving a 28.8%
improvement in MPVE over the current state-of-the-art method on the
challenging HUMBI dataset. Qualitative evaluation also verifies the
effectiveness of MMT in reconstructing high-quality human meshes. Code will
be made available upon acceptance.
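As a concrete illustration of the encode-then-fuse design described above, here is a minimal PyTorch sketch of multi-view token fusion, assuming a ViT-style backbone has already produced per-view patch tokens. The module name, token shapes, and the view cap are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of multi-view token fusion, assuming per-view patch tokens
# from a ViT-style backbone. Module and parameter names are illustrative.
import torch
import torch.nn as nn

class MultiViewFusionEncoder(nn.Module):
    """Adds a learned view embedding to each view's tokens, concatenates all
    views into one sequence, and runs self-attention over the joint set so
    every token can attend to every view (the "global information" above)."""

    def __init__(self, dim=256, num_heads=8, num_layers=4, max_views=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.fuse = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.view_embed = nn.Parameter(torch.zeros(max_views, dim))

    def forward(self, view_tokens):
        # view_tokens: (B, V, N, C) = batch, views, patch tokens per view, channels
        B, V, N, C = view_tokens.shape
        tokens = view_tokens + self.view_embed[:V, None, :]  # mark view identity
        tokens = tokens.reshape(B, V * N, C)                 # concatenate views
        return self.fuse(tokens)                             # (B, V*N, C), fused

# Example: 2 samples, 4 views, 14x14 patches, 256-dim tokens -> (2, 784, 256).
fused = MultiViewFusionEncoder()(torch.randn(2, 4, 196, 256))
```

Concatenating tokens across views before self-attention is the simplest way to let every view see global context; the actual MMT fusion may differ in detail.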
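The cross-view alignment can likewise be sketched as a geometric constraint: predicted 3D keypoints are projected into every view with known camera parameters and penalized for disagreement with per-view 2D estimates. The pinhole model, the L1 penalty, and the `cams` structure below are assumptions; the paper only states that consistency is enforced as a geometry constraint at the feature level.

```python
# Hedged sketch of cross-view keypoint alignment: project 3D joints into each
# view and penalize geometric inconsistency. Camera format and loss are assumed.
import torch

def project(points3d, K, R, t):
    """Pinhole projection of (B, J, 3) world-space joints into one view."""
    cam = points3d @ R.transpose(-1, -2) + t          # world -> camera frame
    uv = cam @ K.transpose(-1, -2)                    # camera -> image plane
    return uv[..., :2] / uv[..., 2:].clamp(min=1e-6)  # perspective divide

def cross_view_alignment_loss(points3d, views2d, cams):
    """points3d: (B, J, 3); views2d: list of per-view (B, J, 2) 2D estimates;
    cams: list of (K, R, t) camera tuples, one per view."""
    loss = points3d.new_zeros(())
    for kp2d, (K, R, t) in zip(views2d, cams):
        loss = loss + (project(points3d, K, R, t) - kp2d).abs().mean()
    return loss / len(cams)
```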
Related papers
- Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images [57.479339658504685]
"Divide and Fuse" strategy reconstructs human body parts independently before fusing them.
Human Part Parametric Models (HPPM) independently reconstruct the mesh from a few shape and global-location parameters.
A specially designed fusion module seamlessly integrates the reconstructed parts, even when only a few are visible.
arXiv Detail & Related papers (2024-07-12T21:29:11Z)
- Human Mesh Recovery from Arbitrary Multi-view Images [57.969696744428475]
We propose a divide and conquer framework for Unified Human Mesh Recovery (U-HMR) from arbitrary multi-view images.
In particular, U-HMR adopts a decoupled structure with three main components: camera and body decoupling (CBD), camera pose estimation (CPE), and arbitrary view fusion (AVF).
We conduct extensive experiments on three public datasets: Human3.6M, MPI-INF-3DHP, and TotalCapture.
arXiv Detail & Related papers (2024-03-19T04:47:56Z)
- SiMA-Hand: Boosting 3D Hand-Mesh Reconstruction by Single-to-Multi-View Adaptation [90.59734612754222]
Estimating a 3D hand mesh from RGB images remains highly challenging.
Existing attempts at this task often fail when occlusion dominates the image space.
We propose SiMA-Hand, aiming to boost the mesh reconstruction performance by Single-to-Multi-view Adaptation.
arXiv Detail & Related papers (2024-02-02T13:14:20Z)
- HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation [5.888156950854715]
We propose a novel self-supervised pre-training strategy for regressing 3D hand mesh parameters.
Our proposed approach, named HandMIM, achieves strong performance on various hand mesh estimation tasks.
arXiv Detail & Related papers (2023-07-29T19:46:06Z)
- HybridMIM: A Hybrid Masked Image Modeling Framework for 3D Medical Image Segmentation [29.15746532186427]
HybridMIM is a novel hybrid self-supervised learning method based on masked image modeling for 3D medical image segmentation.
We learn the semantic information of medical images at three levels, including 1) partial region prediction to reconstruct key contents of the 3D image, which largely reduces the pre-training time burden (a hedged sketch of this masking scheme appears after this list).
The proposed framework is versatile enough to support both CNN and transformer encoder backbones, and also enables pre-training decoders for image segmentation.
arXiv Detail & Related papers (2023-03-18T04:43:12Z)
- Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically generalize poorly to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
- Direct Multi-view Multi-person 3D Pose Estimation [138.48139701871213]
We present the Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images.
MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks (a hedged sketch of this direct-regression idea appears after this list).
We show experimentally that our MvP model outperforms state-of-the-art methods on several benchmarks while being much more efficient.
arXiv Detail & Related papers (2021-11-07T13:09:20Z)
- Multi-View Matching (MVM): Facilitating Multi-Person 3D Pose Estimation Learning with Action-Frozen People Video [38.63662549684785]
The MVM method generates reliable 3D human poses from a large-scale video dataset.
We then train a neural network that takes a single image as input for multi-person 3D pose estimation.
arXiv Detail & Related papers (2020-04-11T01:09:50Z)
- HEMlets PoSh: Learning Part-Centric Heatmap Triplets for 3D Human Pose and Shape Estimation [60.35776484235304]
This work addresses the uncertainty of lifting detected 2D joints to 3D space by introducing an intermediate state, Part-Centric Heatmap Triplets (HEMlets).
The HEMlets use three joint-heatmaps to represent the relative depth information of the end-joints of each skeletal body part (a hedged sketch of this encoding appears after this list).
A Convolutional Network (ConvNet) is first trained to predict HEMlets from the input image, followed by a volumetric joint-heatmap regression.
arXiv Detail & Related papers (2020-03-10T04:03:45Z)
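For the HybridMIM entry above, the partial region prediction step can be sketched as masking random sub-blocks of a 3D volume and reconstructing only the hidden voxels. The block size, mask ratio, and loss below are assumptions, not the paper's exact scheme.

```python
# Hedged sketch of masked-region prediction for 3D volumes, in the spirit of
# the HybridMIM entry: mask random blocks, reconstruct the hidden voxels.
import torch
import torch.nn.functional as F

def mask_regions(volume, region=16, ratio=0.5):
    """volume: (B, C, D, H, W) with D, H, W divisible by `region`.
    Zeroes a random subset of region^3 blocks; returns masked volume + mask."""
    B, C, D, H, W = volume.shape
    gd, gh, gw = D // region, H // region, W // region
    keep = torch.rand(B, 1, gd, gh, gw, device=volume.device) > ratio
    mask = keep.float().repeat_interleave(region, 2) \
                       .repeat_interleave(region, 3) \
                       .repeat_interleave(region, 4)
    return volume * mask, mask

def partial_region_loss(model, volume):
    masked, mask = mask_regions(volume)
    recon = model(masked)  # network predicts the full volume
    # Only the masked (hidden) voxels contribute to the reconstruction loss.
    return F.mse_loss(recon * (1 - mask), volume * (1 - mask))
```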
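For the MvP entry above, the direct-regression idea can be sketched as learnable joint queries decoded against fused image features, with no intermediate heatmaps. Every module name and dimension below is an illustrative assumption rather than the paper's architecture.

```python
# Hedged sketch of direct query-based 3D pose regression (MvP-style).
import torch
import torch.nn as nn

class DirectPoseDecoder(nn.Module):
    def __init__(self, dim=256, num_people=10, num_joints=15, num_heads=8):
        super().__init__()
        # One learnable query per (person, joint) pair.
        self.queries = nn.Parameter(torch.randn(num_people * num_joints, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.head = nn.Linear(dim, 3)  # each query regresses one 3D joint
        self.num_people, self.num_joints = num_people, num_joints

    def forward(self, fused_tokens):
        # fused_tokens: (B, L, C) multi-view image features
        B = fused_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        out = self.decoder(q, fused_tokens)  # queries cross-attend to features
        return self.head(out).view(B, self.num_people, self.num_joints, 3)
```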
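For the HEMlets entry above, one loose reading of the heatmap-triplet encoding is that each skeletal part gets three polarity channels (child joint closer than, roughly co-planar with, or farther than its parent), with the child's 2D Gaussian written into the matching channel. The threshold `tau`, the Gaussian sigma, and the target layout are assumptions for illustration only.

```python
# Hedged sketch of a HEMlets-style training target: one of three relative-depth
# polarity channels per skeletal part receives the child joint's 2D Gaussian.
import numpy as np

def gaussian2d(h, w, cx, cy, sigma=2.0):
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def hemlet_target(kp2d, depth, parts, h=64, w=64, tau=0.1):
    """kp2d: (J, 2) pixel coords; depth: (J,) camera-space depths;
    parts: list of (parent, child) joint index pairs."""
    target = np.zeros((len(parts), 3, h, w), dtype=np.float32)
    for i, (p, c) in enumerate(parts):
        dz = depth[c] - depth[p]
        # Channel 0: child in front of parent; 1: co-planar; 2: behind.
        polarity = 0 if dz < -tau else (2 if dz > tau else 1)
        cx, cy = kp2d[c]
        target[i, polarity] = gaussian2d(h, w, cx, cy)
    return target
```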