THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers
- URL: http://arxiv.org/abs/2106.09336v1
- Date: Thu, 17 Jun 2021 09:09:24 GMT
- Title: THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers
- Authors: Mihai Zanfir, Andrei Zanfir, Eduard Gabriel Bazavan, William T.
Freeman, Rahul Sukthankar and Cristian Sminchisescu
- Abstract summary: THUNDR is a transformer-based deep neural network methodology to reconstruct the 3d pose and shape of people.
We show state-of-the-art results on Human3.6M and 3DPW, for both the fully-supervised and the self-supervised models.
We observe very solid 3d reconstruction performance for difficult human poses collected in the wild.
- Score: 67.8628917474705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present THUNDR, a transformer-based deep neural network methodology to
reconstruct the 3d pose and shape of people, given monocular RGB images. Key to
our methodology is an intermediate 3d marker representation, where we aim to
combine the predictive power of model-free output architectures and the
regularizing, anthropometrically-preserving properties of a statistical human
surface model like GHUM -- a recently introduced, expressive full body
statistical 3d human model, trained end-to-end. Our novel transformer-based
prediction pipeline can focus on image regions relevant to the task, supports
self-supervised regimes, and ensures that solutions are consistent with human
anthropometry. We show state-of-the-art results on Human3.6M and 3DPW, for both
the fully-supervised and the self-supervised models, for the task of inferring
3d human shape, joint positions, and global translation. Moreover, we observe
very solid 3d reconstruction performance for difficult human poses collected in
the wild.
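The interplay the abstract describes — free-form marker predictions regularized by a statistical body model — can be illustrated with a minimal numpy sketch. Everything below is a stand-in, not THUNDR's actual components: a random linear blend-shape basis plays the role of GHUM, noisy markers play the role of the transformer's output, and a least-squares projection plays the role of the anthropometry-preserving step.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MARKERS, N_SHAPE = 50, 10          # illustrative sizes, not GHUM's
mean_markers = rng.normal(size=(N_MARKERS, 3))          # template marker layout
blend_shapes = rng.normal(size=(N_SHAPE, N_MARKERS, 3)) # linear shape basis

def model_markers(beta):
    """Markers produced by the statistical model for shape coefficients beta."""
    return mean_markers + np.tensordot(beta, blend_shapes, axes=1)

def fit_shape(predicted_markers):
    """Least-squares projection of free-form marker predictions onto the
    span of the statistical model (the regularizing step)."""
    A = blend_shapes.reshape(N_SHAPE, -1).T          # (3 * N_MARKERS, N_SHAPE)
    b = (predicted_markers - mean_markers).ravel()
    beta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return beta

# A network-free stand-in for the transformer's per-image marker output:
true_beta = rng.normal(size=N_SHAPE)
noisy_markers = model_markers(true_beta) + 0.01 * rng.normal(size=(N_MARKERS, 3))

beta_hat = fit_shape(noisy_markers)
print("recovered shape close to true:", np.allclose(beta_hat, true_beta, atol=0.05))
```

The point of the sketch is the division of labor: the markers are unconstrained 3d predictions, while the projection guarantees the final reconstruction lies on the statistical model's shape space.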
Related papers
- Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation [32.30055363306321]
We propose a paradigm for seamlessly unifying different human pose and shape-related tasks and datasets.
Our formulation is centered on the ability to query any arbitrary point of the human volume, and obtain its estimated location in 3D.
arXiv Detail & Related papers (2024-07-10T10:44:18Z)
- ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention [0.0]
ConvFormer is a novel convolutional transformer for the 3D human pose estimation task.
We have validated our method on three common benchmark datasets: Human3.6M, MPI-INF-3DHP, and HumanEva.
arXiv Detail & Related papers (2023-04-04T22:23:50Z)
- A Modular Multi-stage Lightweight Graph Transformer Network for Human Pose and Shape Estimation from 2D Human Pose [4.598337780022892]
We introduce a pose-based human mesh reconstruction approach that prioritizes computational efficiency without sacrificing reconstruction accuracy.
Our method consists of a 2D-to-3D lifter module that utilizes graph transformers to analyze structured and implicit joint correlations in 2D human poses, and a mesh regression module that combines the extracted pose features with a mesh template to produce the final human mesh parameters.
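The 2D-to-3D lifting step described above can be sketched, under simplified assumptions, as a single graph-convolution layer over a skeleton followed by a per-joint linear head to 3D. The 5-joint skeleton and the random, untrained weights below are hypothetical stand-ins for the paper's graph transformer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 5-joint skeleton: a pelvis-spine chain plus two limbs.
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
N_JOINTS = 5

# Row-normalized adjacency with self-loops, as in standard graph convolutions.
A = np.eye(N_JOINTS)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A = A / A.sum(axis=1, keepdims=True)

W_gc = rng.normal(scale=0.1, size=(2, 16))   # graph-conv weights (untrained)
W_out = rng.normal(scale=0.1, size=(16, 3))  # per-joint regression head to 3D

def lift_2d_to_3d(pose_2d):
    """One graph-conv step mixing each joint with its neighbours,
    then a per-joint linear head producing a 3D estimate."""
    h = np.maximum(A @ pose_2d @ W_gc, 0.0)  # aggregate neighbours + ReLU
    return h @ W_out                         # (N_JOINTS, 3)

pose_2d = rng.normal(size=(N_JOINTS, 2))
pose_3d = lift_2d_to_3d(pose_2d)
print(pose_3d.shape)
```

The structured joint correlations the entry mentions live in the adjacency matrix; a trained model would stack many such layers with learned attention rather than a fixed `A`.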
arXiv Detail & Related papers (2023-01-31T04:42:47Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- 3D Human Pose Estimation with Spatial and Temporal Transformers [59.433208652418976]
We present PoseFormer, a purely transformer-based approach for 3D human pose estimation in videos.
Inspired by recent developments in vision transformers, we design a spatial-temporal transformer structure.
We quantitatively and qualitatively evaluate our method on two popular and standard benchmark datasets.
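PoseFormer's spatial-temporal design can be sketched, under simplified assumptions, as scaled dot-product self-attention applied first across joints within a frame and then across frames for each joint. Multi-head projections and learned weights are omitted, and all sizes are illustrative; this is not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

def attention(x):
    """Scaled dot-product self-attention (single head, no projections)."""
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)             # softmax over keys
    return w @ x

T, J, D = 8, 17, 32          # frames, joints, embedding dim (illustrative)
tokens = rng.normal(size=(T, J, D))

spatial = attention(tokens)                       # joints attend within each frame
temporal = attention(np.swapaxes(spatial, 0, 1))  # each joint attends across frames
out = np.swapaxes(temporal, 0, 1)                 # back to (T, J, D)
print(out.shape)
```

Factoring attention this way keeps the cost at O(J^2) per frame plus O(T^2) per joint, instead of O((T * J)^2) for full joint-time attention.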
arXiv Detail & Related papers (2021-03-18T18:14:37Z)
- 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [77.57798334776353]
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views.
We suggest that ambiguities can be modelled more effectively by parametrizing the possible body shapes and poses.
We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans.
arXiv Detail & Related papers (2020-11-02T13:55:31Z)
- Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present deep neural network methodology to reconstruct the 3d pose and shape of people, given an input RGB image.
We rely on GHUM, a recently introduced, expressive full-body statistical 3d human model, trained end-to-end.
Central to our methodology is a learning-to-learn-and-optimize approach, referred to as HUman Neural Descent (HUND), which avoids second-order differentiation.
arXiv Detail & Related papers (2020-08-16T13:38:41Z)
- PaMIR: Parametric Model-Conditioned Implicit Representation for Image-based Human Reconstruction [67.08350202974434]
We propose Parametric Model-Conditioned Implicit Representation (PaMIR), which combines the parametric body model with the free-form deep implicit function.
We show that our method achieves state-of-the-art performance for image-based 3D human reconstruction in the cases of challenging poses and clothing types.
arXiv Detail & Related papers (2020-07-08T02:26:19Z)
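The combination PaMIR describes — a parametric body prior conditioning a free-form implicit function — can be sketched as an occupancy query that adds a residual term to a coarse analytic prior. Here a unit sphere stands in for the parametric body and an untrained MLP stands in for the image-conditioned network; both are illustrative assumptions, not the paper's components.

```python
import numpy as np

rng = np.random.default_rng(3)

W1 = rng.normal(scale=0.1, size=(3, 32))   # untrained MLP weights (stand-in)
W2 = rng.normal(scale=0.1, size=(32, 1))

def parametric_occupancy(p):
    """Coarse prior: inside/outside a unit sphere standing in for the body model."""
    return (np.linalg.norm(p, axis=-1, keepdims=True) < 1.0).astype(float)

def residual(p):
    """Free-form correction (in PaMIR this is an image-conditioned network)."""
    return np.tanh(np.maximum(p @ W1, 0.0) @ W2)

def occupancy(p):
    """Model-conditioned implicit function: parametric prior plus free-form
    residual, clipped back to a valid occupancy range."""
    return np.clip(parametric_occupancy(p) + 0.1 * residual(p), 0.0, 1.0)

points = rng.normal(size=(100, 3))
occ = occupancy(points)
print(occ.shape)
```

The design point is that the parametric term keeps queries anatomically plausible while the residual captures geometry the body model cannot, such as clothing.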
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.