Decoupling Human and Camera Motion from Videos in the Wild
- URL: http://arxiv.org/abs/2302.12827v2
- Date: Mon, 20 Mar 2023 22:11:45 GMT
- Title: Decoupling Human and Camera Motion from Videos in the Wild
- Authors: Vickie Ye, Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa
- Abstract summary: We propose a method to reconstruct global human trajectories from videos in the wild.
Our method decouples the camera and human motion, which allows us to place people in the same world coordinate frame.
- Score: 67.39432972193929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a method to reconstruct global human trajectories from videos in
the wild. Our optimization method decouples the camera and human motion, which
allows us to place people in the same world coordinate frame. Most existing
methods do not model the camera motion; methods that rely on the background
pixels to infer 3D human motion usually require a full scene reconstruction,
which is often not possible for in-the-wild videos. However, even when existing
SLAM systems cannot recover accurate scene reconstructions, the background
pixel motion still provides enough signal to constrain the camera motion. We
show that relative camera estimates along with data-driven human motion priors
can resolve the scene scale ambiguity and recover global human trajectories.
Our method robustly recovers the global 3D trajectories of people in
challenging in-the-wild videos, such as PoseTrack. We quantify our improvement
over existing methods on the 3D human dataset EgoBody. We further demonstrate that
our recovered camera scale allows us to reason about the motion of multiple people
in a shared coordinate frame, which improves the performance of downstream tracking
in PoseTrack. Code and video results can be found at
https://vye16.github.io/slahmr.
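The abstract's central idea is that up-to-scale relative camera estimates from SLAM, combined with displacements predicted by a human motion prior, suffice to resolve the scene scale ambiguity. A toy version of that idea can be written as a one-dimensional least-squares problem. Note this is a minimal illustrative sketch under a simplified linear model (the function name, the model, and the synthetic data are assumptions); the actual method jointly optimizes human pose, trajectory, and camera parameters.

```python
import numpy as np


def solve_scene_scale(cam_trans, human_offset, prior_disp):
    """Closed-form least-squares estimate of the scene scale alpha.

    Toy model: the world-frame human position is x_t = alpha * c_t + h_t,
    where c_t is the up-to-scale SLAM camera translation and h_t the human
    offset expressed in world axes. Frame-to-frame displacements
    x_{t+1} - x_t are matched against displacements d_t predicted by a
    human motion prior, and alpha is solved for in closed form.
    """
    a = np.diff(cam_trans, axis=0)      # camera displacements, up to scale
    b = np.diff(human_offset, axis=0)   # human-offset displacements
    r = prior_disp - b                  # residual that alpha * a must explain
    # argmin_alpha sum ||alpha * a_t - r_t||^2  =>  alpha = <a, r> / <a, a>
    return float(np.sum(a * r) / np.sum(a * a))


if __name__ == "__main__":
    # Synthetic sanity check: recover a known scale from noiseless data.
    rng = np.random.default_rng(0)
    cam = rng.normal(size=(20, 3))
    hum = rng.normal(size=(20, 3))
    true_alpha = 3.2
    prior = true_alpha * np.diff(cam, axis=0) + np.diff(hum, axis=0)
    print(solve_scene_scale(cam, hum, prior))
```

In the real system this scalar sits inside a larger optimization with pose, trajectory, and smoothness terms, but the sketch shows why relative camera motion plus a motion prior pins down the otherwise unobservable scale.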
Related papers
- World-Grounded Human Motion Recovery via Gravity-View Coordinates [60.618543026949226]
We propose estimating human poses in a novel Gravity-View coordinate system.
The proposed GV system is naturally gravity-aligned and uniquely defined for each video frame.
Our method recovers more realistic motion in both the camera space and world-grounded settings, outperforming state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-09-10T17:25:47Z)
- HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation [64.37874983401221]
We present HumanVid, the first large-scale high-quality dataset tailored for human image animation.
For the real-world data, we compile a vast collection of copyright-free real-world videos from the internet.
For the synthetic data, we gather 2,300 copyright-free 3D avatar assets to augment existing available 3D assets.
arXiv Detail & Related papers (2024-07-24T17:15:58Z)
- TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos [46.11545135199594]
TRAM robustifies SLAM to recover the camera motion in the presence of dynamic humans.
We introduce a video transformer model to regress the kinematic body motion of a human.
arXiv Detail & Related papers (2024-03-26T03:10:45Z)
- WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion [43.95997922499137]
WHAM (World-grounded Humans with Accurate Motion) reconstructs 3D human motion in a global coordinate system from video.
It uses the camera angular velocity estimated by a SLAM method, together with the human motion, to estimate the body's global trajectory.
WHAM outperforms all existing 3D human motion recovery methods across multiple in-the-wild benchmarks.
arXiv Detail & Related papers (2023-12-12T18:57:46Z)
- PACE: Human and Camera Motion Estimation from in-the-wild Videos [113.76041632912577]
We present a method to estimate human motion in a global scene from moving cameras.
This is a highly challenging task due to the coupling of human and camera motions in the video.
We propose a joint optimization framework that disentangles human and camera motions using both foreground human motion priors and background scene features.
arXiv Detail & Related papers (2023-10-20T19:04:14Z)
- TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments [106.80978555346958]
Current methods cannot reliably estimate moving humans in global coordinates.
TRACE is the first one-stage method to jointly recover and track 3D humans in global coordinates from dynamic cameras.
It achieves state-of-the-art performance on tracking and HPS benchmarks.
arXiv Detail & Related papers (2023-06-05T13:00:44Z)
- GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras [99.07219478953982]
We present an approach for 3D global human mesh recovery from monocular videos recorded with dynamic cameras.
We first propose a deep generative motion infiller, which autoregressively infills the body motions of occluded humans based on visible motions.
In contrast to prior work, our approach reconstructs human meshes in consistent global coordinates even with dynamic cameras.
arXiv Detail & Related papers (2021-12-02T18:59:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.