Learning Local Recurrent Models for Human Mesh Recovery
- URL: http://arxiv.org/abs/2107.12847v1
- Date: Tue, 27 Jul 2021 14:30:33 GMT
- Title: Learning Local Recurrent Models for Human Mesh Recovery
- Authors: Runze Li and Srikrishna Karanam and Ren Li and Terrence Chen and Bir Bhanu and Ziyan Wu
- Abstract summary: We present a new method for video mesh recovery that divides the human mesh into several local parts following the standard skeletal model.
We then model the dynamics of each local part with separate recurrent models, with each model conditioned appropriately based on the known kinematic structure of the human body.
This results in a structure-informed local recurrent learning architecture that can be trained in an end-to-end fashion with available annotations.
- Score: 50.85467243778406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of estimating frame-level full human body meshes
given a video of a person with natural motion dynamics. While much progress in
this field has been in single image-based mesh estimation, there has been a
recent uptick in efforts to infer mesh dynamics from video given its role in
alleviating issues such as depth ambiguity and occlusions. However, a key
limitation of existing work is the assumption that all the observed motion
dynamics can be modeled using one dynamical/recurrent model. While this may
work well in cases with relatively simplistic dynamics, inference with
in-the-wild videos presents many challenges. In particular, it is typically the
case that different body parts of a person undergo different dynamics in the
video, e.g., legs may move in a way that may be dynamically different from
hands (e.g., a person dancing). To address these issues, we present a new
method for video mesh recovery that divides the human mesh into several local
parts following the standard skeletal model. We then model the dynamics of each
local part with separate recurrent models, with each model conditioned
appropriately based on the known kinematic structure of the human body. This
results in a structure-informed local recurrent learning architecture that can
be trained in an end-to-end fashion with available annotations. We conduct a
variety of experiments on standard video mesh recovery benchmark datasets such
as Human3.6M, MPI-INF-3DHP, and 3DPW, demonstrating the efficacy of our design
of modeling local dynamics as well as establishing state-of-the-art results
based on standard evaluation metrics.
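
The following is a minimal PyTorch sketch (not the authors' released code) of the structure-informed local recurrent idea described above: the body is split into a few local parts, each part gets its own recurrent model, and a child part's model is conditioned on the hidden state of its kinematic parent. The part grouping, the GRU choice, and all dimensions (feature, hidden, and pose-head sizes) are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Hypothetical grouping of joints into local parts and a part-level kinematic
# tree (child -> parent); the paper's actual split follows the standard
# skeletal model and may differ.
PARTS = ["torso", "head", "l_arm", "r_arm", "l_leg", "r_leg"]
PARENT = {"torso": None, "head": "torso", "l_arm": "torso",
          "r_arm": "torso", "l_leg": "torso", "r_leg": "torso"}

class LocalRecurrentSketch(nn.Module):
    """One recurrent model per body part, each conditioned on its kinematic parent."""

    def __init__(self, feat_dim=256, hidden_dim=256, pose_dim=24):
        super().__init__()
        # A child part's GRU also receives its parent's hidden state as input.
        self.grus = nn.ModuleDict({
            p: nn.GRU(feat_dim + (hidden_dim if PARENT[p] else 0),
                      hidden_dim, batch_first=True)
            for p in PARTS
        })
        # Per-part head regressing that part's pose parameters for every frame.
        self.heads = nn.ModuleDict({p: nn.Linear(hidden_dim, pose_dim)
                                    for p in PARTS})

    def forward(self, feats):
        # feats: (B, T, feat_dim) per-frame features from an image backbone.
        hidden, poses = {}, {}
        for p in PARTS:  # parents are listed before their children
            x = feats
            if PARENT[p] is not None:
                x = torch.cat([feats, hidden[PARENT[p]]], dim=-1)
            h, _ = self.grus[p](x)       # (B, T, hidden_dim)
            hidden[p] = h
            poses[p] = self.heads[p](h)  # (B, T, pose_dim)
        return poses

# Example: per-frame features for a 16-frame clip with batch size 2.
model = LocalRecurrentSketch()
out = model(torch.randn(2, 16, 256))
print({part: tuple(p.shape) for part, p in out.items()})
```

In this sketch the per-part pose predictions would feed a body model (e.g., SMPL) to produce the frame-level meshes; that assembly step and the training losses are omitted here.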
Related papers
- Trajectory Optimization for Physics-Based Reconstruction of 3d Human Pose from Monocular Video [31.96672354594643]
We focus on the task of estimating a physically plausible articulated human motion from monocular video.
Existing approaches that do not consider physics often produce temporally inconsistent output with motion artifacts.
We show that our approach achieves competitive results with respect to existing physics-based methods on the Human3.6M benchmark.
arXiv Detail & Related papers (2022-05-24T18:02:49Z)
- Differentiable Dynamics for Articulated 3d Human Motion Reconstruction [29.683633237503116]
We introduce DiffPhy, a differentiable physics-based model for articulated 3d human motion reconstruction from video.
We validate the model by demonstrating that it can accurately reconstruct physically plausible 3d human motion from monocular video.
arXiv Detail & Related papers (2022-05-24T17:58:37Z)
- Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input.
Our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
arXiv Detail & Related papers (2022-04-04T03:09:20Z)
- Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera [49.357174195542854]
A key challenge of learning the dynamics of the appearance lies in the requirement of a prohibitively large amount of observations.
We show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single view video.
arXiv Detail & Related papers (2022-03-24T00:22:03Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on an in-the-wild human video dataset 3DPW.
arXiv Detail & Related papers (2021-11-29T16:32:41Z)
- Dynamics-Regulated Kinematic Policy for Egocentric Pose Estimation [23.603254270514224]
We propose a method for object-aware 3D egocentric pose estimation that tightly integrates kinematics modeling, dynamics modeling, and scene object information.
We demonstrate, for the first time, the ability to estimate physically plausible 3D human-object interactions using a single wearable camera.
arXiv Detail & Related papers (2021-06-10T17:59:50Z)
- MoCo-Flow: Neural Motion Consensus Flow for Dynamic Humans in Stationary Monocular Cameras [98.40768911788854]
We introduce MoCo-Flow, a representation that models the dynamic scene using a 4D continuous time-variant function.
At the heart of our work lies a novel optimization formulation, which is constrained by a motion consensus regularization on the motion flow.
We extensively evaluate MoCo-Flow on several datasets that contain human motions of varying complexity.
arXiv Detail & Related papers (2021-06-08T16:03:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.