GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras
- URL: http://arxiv.org/abs/2112.01524v1
- Date: Thu, 2 Dec 2021 18:59:54 GMT
- Title: GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras
- Authors: Ye Yuan, Umar Iqbal, Pavlo Molchanov, Kris Kitani, Jan Kautz
- Abstract summary: We present an approach for 3D global human mesh recovery from monocular videos recorded with dynamic cameras.
We first propose a deep generative motion infiller, which autoregressively infills the body motions of occluded humans based on visible motions.
In contrast to prior work, our approach reconstructs human meshes in consistent global coordinates even with dynamic cameras.
- Score: 99.07219478953982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an approach for 3D global human mesh recovery from monocular
videos recorded with dynamic cameras. Our approach is robust to severe and
long-term occlusions and tracks human bodies even when they go outside the
camera's field of view. To achieve this, we first propose a deep generative
motion infiller, which autoregressively infills the body motions of occluded
humans based on visible motions. Additionally, in contrast to prior work, our
approach reconstructs human meshes in consistent global coordinates even with
dynamic cameras. Since the joint reconstruction of human motions and camera
poses is underconstrained, we propose a global trajectory predictor that
generates global human trajectories based on local body movements. Using the
predicted trajectories as anchors, we present a global optimization framework
that refines the predicted trajectories and optimizes the camera poses to match
the video evidence such as 2D keypoints. Experiments on challenging indoor and
in-the-wild datasets with dynamic cameras demonstrate that the proposed
approach outperforms prior methods significantly in terms of motion infilling
and global mesh recovery.
Related papers
- World-Grounded Human Motion Recovery via Gravity-View Coordinates [60.618543026949226]
We propose estimating human poses in a novel Gravity-View coordinate system.
The proposed GV system is naturally gravity-aligned and uniquely defined for each video frame.
Our method recovers more realistic motion in both the camera space and world-grounded settings, outperforming state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-09-10T17:25:47Z) - COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation [98.05046790227561]
COIN is a control-inpainting motion diffusion prior that enables fine-grained control to disentangle human and camera motions.
COIN outperforms the state-of-the-art methods in terms of global human motion estimation and camera motion estimation.
arXiv Detail & Related papers (2024-08-29T10:36:29Z) - WHAC: World-grounded Humans and Cameras [37.877565981937586]
We aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly.
We introduce a novel framework, referred to as WHAC, to facilitate world-grounded expressive human pose and shape estimation.
We present a new synthetic dataset, WHAC-A-Mole, which includes accurately annotated humans and cameras.
arXiv Detail & Related papers (2024-03-19T17:58:02Z) - PACE: Human and Camera Motion Estimation from in-the-wild Videos [113.76041632912577]
We present a method to estimate human motion in a global scene from moving cameras.
This is a highly challenging task due to the coupling of human and camera motions in the video.
We propose a joint optimization framework that disentangles human and camera motions using both foreground human motion priors and background scene features.
arXiv Detail & Related papers (2023-10-20T19:04:14Z) - Decoupling Human and Camera Motion from Videos in the Wild [67.39432972193929]
We propose a method to reconstruct global human trajectories from videos in the wild.
Our method decouples the camera and human motion, which allows us to place people in the same world coordinate frame.
arXiv Detail & Related papers (2023-02-24T18:59:15Z) - Camera Motion Agnostic 3D Human Pose Estimation [8.090223360924004]
This paper presents a camera motion agnostic approach for predicting 3D human pose and mesh defined in the world coordinate system.
We propose a network based on bidirectional gated recurrent units (GRUs) that predicts the global motion sequence from the local pose sequence.
We use 3DPW and synthetic datasets, which are constructed in a moving-camera environment, for evaluation.
arXiv Detail & Related papers (2021-12-01T08:22:50Z) - Task-Generic Hierarchical Human Motion Prior using VAEs [44.356707509079044]
A deep generative model that describes human motions can benefit a wide range of fundamental computer vision and graphics tasks.
We present a method for learning complex human motions independent of specific tasks using a combined global and local latent space.
We demonstrate the effectiveness of our hierarchical motion variational autoencoder in a variety of tasks including video-based human pose estimation.
arXiv Detail & Related papers (2021-06-07T23:11:42Z) - Estimating Egocentric 3D Human Pose in Global Space [70.7272154474722]
We present a new method for egocentric global 3D body pose estimation using a single-mounted fisheye camera.
Our approach outperforms state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-04-27T20:01:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.