Decoupling Human and Camera Motion from Videos in the Wild
- URL: http://arxiv.org/abs/2302.12827v2
- Date: Mon, 20 Mar 2023 22:11:45 GMT
- Title: Decoupling Human and Camera Motion from Videos in the Wild
- Authors: Vickie Ye, Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa
- Abstract summary: We propose a method to reconstruct global human trajectories from videos in the wild.
Our method decouples the camera and human motion, which allows us to place people in the same world coordinate frame.
- Score: 67.39432972193929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a method to reconstruct global human trajectories from videos in
the wild. Our optimization method decouples the camera and human motion, which
allows us to place people in the same world coordinate frame. Most existing
methods do not model the camera motion; methods that rely on the background
pixels to infer 3D human motion usually require a full scene reconstruction,
which is often not possible for in-the-wild videos. However, even when existing
SLAM systems cannot recover accurate scene reconstructions, the background
pixel motion still provides enough signal to constrain the camera motion. We
show that relative camera estimates along with data-driven human motion priors
can resolve the scene scale ambiguity and recover global human trajectories.
Our method robustly recovers the global 3D trajectories of people in
challenging in-the-wild videos, such as PoseTrack. We quantify our improvement
over existing methods on the 3D human dataset Egobody. We further demonstrate that
our recovered camera scale allows us to reason about motion of multiple people
in a shared coordinate frame, which improves performance of downstream tracking
in PoseTrack. Code and video results can be found at
https://vye16.github.io/slahmr.
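The scale ambiguity mentioned in the abstract can be illustrated with a toy sketch. It is not the authors' optimization, which relies on a learned human motion prior. It assumes camera-to-world poses `R_c2w`, `t_c2w` known only up to an unknown scale `alpha`, metric root positions `p_cam` in the camera frame, and a hypothetical per-frame world displacement `v_prior` standing in for the motion prior. All function and variable names are illustrative. Under those assumptions, `alpha` has a closed-form least-squares solution.

```python
import numpy as np

def to_world(R_c2w, t_c2w, p_cam, alpha):
    """Place camera-frame points in the world: p_w[t] = R[t] @ p_cam[t] + alpha * t[t]."""
    return np.einsum("tij,tj->ti", R_c2w, p_cam) + alpha * t_c2w

def solve_scale(R_c2w, t_c2w, p_cam, v_prior):
    """Least-squares alpha so frame-to-frame world displacements match v_prior.

    Minimizes sum_t || d_body[t] + alpha * d_cam[t] - v_prior[t] ||^2 over alpha.
    """
    rotated = np.einsum("tij,tj->ti", R_c2w, p_cam)
    d_body = np.diff(rotated, axis=0)   # displacement explained by the body estimate
    d_cam = np.diff(t_c2w, axis=0)      # camera displacement in unscaled SLAM units
    residual = v_prior - d_body         # what the scaled camera motion must explain
    return float(np.sum(d_cam * residual) / np.sum(d_cam * d_cam))

# Synthetic check: a person walks along x while the camera also translates along x.
T = 20
frames = np.arange(T, dtype=float)
p_world_true = np.stack([0.10 * frames, np.zeros(T), np.full(T, 3.0)], axis=1)
cam_metric = np.stack([0.05 * frames, np.zeros(T), np.zeros(T)], axis=1)
true_alpha = 2.5                               # assumed ground-truth scale
R_c2w = np.tile(np.eye(3), (T, 1, 1))          # identity rotations for simplicity
t_c2w = cam_metric / true_alpha                # SLAM translation, wrong units
p_cam = p_world_true - cam_metric              # camera-frame root positions
v_prior = np.diff(p_world_true, axis=0)        # stand-in for a motion prior

alpha = solve_scale(R_c2w, t_c2w, p_cam, v_prior)
p_world = to_world(R_c2w, t_c2w, p_cam, alpha)
```

With a static camera, `d_cam` is zero and the scale is unobservable, so the division is only meaningful when the camera actually moves.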
Related papers
- Motion Diffusion-Guided 3D Global HMR from a Dynamic Camera [3.6948631725065355]
We present DiffOpt, a novel 3D global HMR method using Diffusion Optimization.
Our key insight is that recent advances in human motion generation, such as the motion diffusion model (MDM), contain a strong prior of coherent human motion.
We validate DiffOpt with video sequences from the Electromagnetic Database of Global 3D Human Pose and Shape in the Wild.
arXiv Detail & Related papers (2024-11-15T21:09:40Z)
- World-Grounded Human Motion Recovery via Gravity-View Coordinates [60.618543026949226]
We propose estimating human poses in a novel Gravity-View coordinate system.
The proposed GV system is naturally gravity-aligned and uniquely defined for each video frame.
Our method recovers more realistic motion in both the camera space and world-grounded settings, outperforming state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-09-10T17:25:47Z)
- HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation [64.37874983401221]
We present HumanVid, the first large-scale high-quality dataset tailored for human image animation.
For the real-world data, we compile a vast collection of real-world videos from the internet.
For the synthetic data, we collected 10K 3D avatar assets and leveraged existing assets of body shapes, skin textures and clothings.
arXiv Detail & Related papers (2024-07-24T17:15:58Z)
- TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos [46.11545135199594]
TRAM robustifies SLAM to recover the camera motion in the presence of dynamic humans.
We introduce a video transformer model to regress the kinematic body motion of a human.
arXiv Detail & Related papers (2024-03-26T03:10:45Z)
- WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion [43.95997922499137]
WHAM (World-grounded Humans with Accurate Motion) reconstructs 3D human motion in a global coordinate system from video.
WHAM uses camera angular velocity estimated from a SLAM method together with human motion to estimate the body's global trajectory.
It outperforms all existing 3D human motion recovery methods across multiple in-the-wild benchmarks.
arXiv Detail & Related papers (2023-12-12T18:57:46Z)
- PACE: Human and Camera Motion Estimation from in-the-wild Videos [113.76041632912577]
We present a method to estimate human motion in a global scene from moving cameras.
This is a highly challenging task due to the coupling of human and camera motions in the video.
We propose a joint optimization framework that disentangles human and camera motions using both foreground human motion priors and background scene features.
arXiv Detail & Related papers (2023-10-20T19:04:14Z)
- TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments [106.80978555346958]
Current methods can't reliably estimate moving humans in global coordinates.
TRACE is the first one-stage method to jointly recover and track 3D humans in global coordinates from dynamic cameras.
It achieves state-of-the-art performance on tracking and HPS benchmarks.
arXiv Detail & Related papers (2023-06-05T13:00:44Z)
- GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras [99.07219478953982]
We present an approach for 3D global human mesh recovery from monocular videos recorded with dynamic cameras.
We first propose a deep generative motion infiller, which autoregressively infills the body motions of occluded humans based on visible motions.
In contrast to prior work, our approach reconstructs human meshes in consistent global coordinates even with dynamic cameras.
arXiv Detail & Related papers (2021-12-02T18:59:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.