Leveraging MoCap Data for Human Mesh Recovery
- URL: http://arxiv.org/abs/2110.09243v1
- Date: Mon, 18 Oct 2021 12:43:00 GMT
- Title: Leveraging MoCap Data for Human Mesh Recovery
- Authors: Fabien Baradel, Thibault Groueix, Philippe Weinzaepfel, Romain
Brégier, Yannis Kalantidis, Grégory Rogez
- Abstract summary: We study whether poses from 3D Motion Capture (MoCap) data can be used to improve image-based and video-based human mesh recovery methods.
We find that fine-tuning image-based models with synthetic renderings from MoCap data can increase their performance.
We introduce PoseBERT, a transformer module that directly regresses the pose parameters and is trained via masked modeling.
- Score: 27.76352018682937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training state-of-the-art models for human body pose and shape recovery from
images or videos requires datasets with corresponding annotations that are
difficult and expensive to obtain. Our goal in this paper is to study whether
poses from 3D Motion Capture (MoCap) data can be used to improve image-based
and video-based human mesh recovery methods. We find that fine-tuning
image-based models with synthetic renderings from MoCap data can increase
their performance by providing them with a wider variety of poses, textures and
backgrounds. In fact, we show that simply fine-tuning the batch normalization
layers of the model is enough to achieve large gains. We further study the use
of MoCap data for video, and introduce PoseBERT, a transformer module that
directly regresses the pose parameters and is trained via masked modeling. It
is simple, generic and can be plugged on top of any state-of-the-art
image-based model in order to transform it into a video-based model leveraging
temporal information. Our experimental results show that the proposed
approaches reach state-of-the-art performance on various datasets including
3DPW, MPI-INF-3DHP, MuPoTS-3D, MCB and AIST. Test code and models will be
available soon.
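To make the batch-norm-only fine-tuning idea concrete, below is a minimal PyTorch sketch; the function name, layer types and learning rate are illustrative assumptions, not the paper's released code. It freezes every parameter except the affine weights and biases of the BatchNorm layers, which are then updated on synthetic MoCap renderings.

```python
import torch
import torch.nn as nn

def batchnorm_parameters(model: nn.Module) -> list:
    """Freeze all parameters except BatchNorm affine weights/biases."""
    for p in model.parameters():
        p.requires_grad = False  # freeze the whole network
    bn_params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            for p in m.parameters():
                p.requires_grad = True  # unfreeze only BN gamma/beta
                bn_params.append(p)
    return bn_params

# Hypothetical usage: fine-tune a pretrained mesh-recovery backbone on
# synthetic MoCap renderings by updating only its BatchNorm layers.
# Note that BN running means/variances also adapt while in train() mode.
# optimizer = torch.optim.Adam(batchnorm_parameters(model), lr=1e-5)
```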
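Similarly, a minimal, hypothetical PyTorch sketch of a PoseBERT-style module is shown below; the layer sizes, the 144-dim pose vector (24 SMPL joints in a 6D rotation representation) and all names are assumptions, and the paper's actual architecture may differ. A transformer encoder receives a sequence of per-frame pose vectors, some replaced by a learned mask token, and regresses the full pose sequence, so the temporal prior is learned from MoCap data alone.

```python
import torch
import torch.nn as nn

class PoseBERTSketch(nn.Module):
    """Masked modeling over a sequence of per-frame pose parameters."""

    def __init__(self, pose_dim: int = 144, d_model: int = 256,
                 nhead: int = 4, num_layers: int = 4, max_len: int = 64):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        self.pos_embed = nn.Parameter(torch.zeros(max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, poses: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # poses: (B, T, pose_dim) per-frame pose parameters, e.g. from an
        # image-based predictor; mask: (B, T) bool, True = frame is masked.
        x = self.embed(poses)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        x = x + self.pos_embed[: x.size(1)]
        return self.head(self.encoder(x))  # regressed pose sequence

# Hypothetical training step on a MoCap pose sequence `gt` of shape (B, T, 144):
# mask = torch.rand(gt.shape[:2]) < 0.15        # mask ~15% of the frames
# pred = PoseBERTSketch()(gt, mask)
# loss = ((pred - gt)[mask] ** 2).mean()        # L2 loss on masked frames
```

At test time, such a module can be stacked on an image-based predictor's per-frame outputs to smooth and complete them with temporal context.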
Related papers
- DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception [66.88792390480343]
We propose DEEM, a simple and effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder.
DEEM exhibits enhanced robustness and a superior capacity to alleviate hallucinations while utilizing fewer trainable parameters, less pre-training data, and a smaller base model size.
arXiv Detail & Related papers (2024-05-24T05:46:04Z)
- 3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models [52.96248836582542]
We propose an effective approach based on recent diffusion models, termed HumanWild, which can effortlessly generate human images and corresponding 3D mesh annotations.
By exclusively employing generative models, we generate large-scale in-the-wild human images and high-quality annotations, eliminating the need for real-world data collection.
arXiv Detail & Related papers (2024-03-17T06:31:16Z) - Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large
Datasets [36.95521842177614]
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.
We identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
arXiv Detail & Related papers (2023-11-25T22:28:38Z) - LRM: Large Reconstruction Model for Single Image to 3D [61.47357798633123]
We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds.
LRM adopts a highly scalable transformer-based architecture with 500 million learnable parameters to directly predict a neural radiance field (NeRF) from the input image.
We train our model in an end-to-end manner on massive multi-view data containing around 1 million objects.
arXiv Detail & Related papers (2023-11-08T00:03:52Z)
- PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling [23.420076136028687]
PoseBERT is a transformer module that is fully trained on 3D Motion Capture data via masked modeling.
It is simple, generic and versatile, as it can be plugged on top of any image-based model to transform it into a video-based model.
Our experimental results validate that adding PoseBERT on top of various state-of-the-art pose estimation methods consistently improves their performance.
arXiv Detail & Related papers (2022-08-22T11:30:14Z)
- Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on the in-the-wild human video dataset 3DPW.
arXiv Detail & Related papers (2021-11-29T16:32:41Z)
- ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
arXiv Detail & Related papers (2021-03-29T15:27:17Z)
- Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data [77.34069717612493]
We present a novel method for monocular hand shape and pose estimation at unprecedented runtime performance of 100fps.
This is enabled by a new learning based architecture designed such that it can make use of all the sources of available hand training data.
It features a 3D hand joint detection module and an inverse kinematics module which regresses not only 3D joint positions but also maps them to joint rotations in a single feed-forward pass.
arXiv Detail & Related papers (2020-03-21T03:51:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.