Motion Capture from Internet Videos
- URL: http://arxiv.org/abs/2008.07931v2
- Date: Wed, 19 Aug 2020 02:56:48 GMT
- Title: Motion Capture from Internet Videos
- Authors: Junting Dong, Qing Shuai, Yuanqing Zhang, Xian Liu, Xiaowei Zhou,
Hujun Bao
- Abstract summary: Recent advances in image-based human pose estimation make it possible to capture 3D human motion from a single RGB video.
While multi-view videos are not common, the videos of a celebrity performing a specific action are usually abundant on the Internet.
We propose a novel optimization-based framework and experimentally demonstrate its ability to recover much more precise and detailed motion from multiple videos.
- Score: 47.943209721329474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in image-based human pose estimation make it possible to
capture 3D human motion from a single RGB video. However, the inherent depth
ambiguity and self-occlusion in a single view prevent the recovery of motion of
the same quality as multi-view reconstruction. While multi-view videos are
not common, the videos of a celebrity performing a specific action are usually
abundant on the Internet. Even if these videos were recorded at different time
instances, they would encode the same motion characteristics of the person.
Therefore, we propose to capture human motion by jointly analyzing these
Internet videos instead of using single videos separately. However, this new
task poses many new challenges that cannot be addressed by existing methods, as
the videos are unsynchronized, the camera viewpoints are unknown, the
background scenes are different, and the human motions are not exactly the same
among videos. To address these challenges, we propose a novel
optimization-based framework and experimentally demonstrate its ability to
recover much more precise and detailed motion from multiple videos, compared
against monocular motion capture methods.
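The abstract describes the framework only at a high level. As a rough illustration of the core idea, here is a minimal, hypothetical Python sketch (not the authors' implementation): a single shared 3D motion is fit to 2D keypoint detections from several videos while per-video camera poses and time offsets are optimized jointly, which is one way to handle the unknown viewpoints and missing synchronization. The pinhole model, the smoothness weight, and every name below are assumptions made for the example; the paper's actual formulation is richer, e.g. it also has to tolerate motions that differ slightly between videos.

```python
"""Minimal sketch of jointly fitting one 3D motion to several videos.

Assumptions (not from the paper): a simple pinhole camera, linear
interpolation for temporal alignment, and a fixed smoothness weight.
"""
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

F, J = 30, 17     # frames in the shared motion, joints per frame (toy scale)
FOCAL = 1000.0    # assumed pinhole focal length in pixels


def project(x_cam):
    """Pinhole projection of camera-space points (..., 3) -> (..., 2)."""
    return FOCAL * x_cam[..., :2] / np.clip(x_cam[..., 2:3], 1e-6, None)


def resample(x, offset):
    """Linearly interpolate the shared motion x (F, J, 3) at t + offset,
    modeling the unknown temporal misalignment of each video."""
    t = np.clip(np.arange(x.shape[0]) + offset, 0, x.shape[0] - 1)
    lo = np.floor(t).astype(int)
    hi = np.minimum(lo + 1, x.shape[0] - 1)
    w = (t - lo)[:, None, None]
    return (1 - w) * x[lo] + w * x[hi]


def residuals(params, detections):
    """Reprojection errors over all videos plus a smoothness prior.

    params packs the shared motion followed by, per video, an axis-angle
    rotation (3), a translation (3), and a scalar time offset (1).
    """
    n = len(detections)
    x = params[:F * J * 3].reshape(F, J, 3)
    per_video = params[F * J * 3:].reshape(n, 7)
    errs = []
    for (kp2d, conf), p in zip(detections, per_video):
        rot = Rotation.from_rotvec(p[:3]).as_matrix()
        x_cam = resample(x, p[6]) @ rot.T + p[3:6]   # world -> camera
        errs.append((conf[..., None] * (project(x_cam) - kp2d)).ravel())
    errs.append(0.1 * np.diff(x, axis=0).ravel())    # temporal smoothness
    return np.concatenate(errs)


def capture(detections):
    """detections: per video, a pair (kp2d of shape (F, J, 2), conf (F, J))
    from any off-the-shelf 2D pose detector. Returns the shared 3D motion."""
    n = len(detections)
    x0 = np.concatenate([
        0.1 * np.random.randn(F * J * 3),                  # rough 3D init
        np.tile([0, 0, 0, 0, 0, 5, 0], n).astype(float),   # cameras in front
    ])
    sol = least_squares(residuals, x0, args=(detections,))
    return sol.x[:F * J * 3].reshape(F, J, 3)
```

The design point this sketch tries to convey is that tying every video to one shared motion is what turns several monocular, unsynchronized clips into an effectively multi-view constraint.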
Related papers
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation.
Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency.
Cavia is the first of its kind to allow the user to specify distinct camera motions while obtaining consistent object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z)
- ViMo: Generating Motions from Casual Videos [34.19904765033005]
We propose a novel Video-to-Motion-Generation framework (ViMo)
ViMo could leverage the immense trove of untapped video content to produce abundant and diverse 3D human motions.
Striking experimental results demonstrate that the proposed model can generate natural motions even for videos with rapid movements, varying perspectives, or frequent occlusions.
arXiv Detail & Related papers (2024-08-13T03:57:35Z)
- Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis [76.72505510632904]
We present Total-Recon, the first method to reconstruct deformable scenes from long monocular RGBD videos.
Our method hierarchically decomposes the scene into the background and objects, whose motion is decomposed into root-body motion and local articulations.
arXiv Detail & Related papers (2023-04-24T17:59:52Z)
- Decoupling Human and Camera Motion from Videos in the Wild [67.39432972193929]
We propose a method to reconstruct global human trajectories from videos in the wild.
Our method decouples the camera and human motion, which allows us to place people in the same world coordinate frame.
arXiv Detail & Related papers (2023-02-24T18:59:15Z)
- HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video [44.58519508310171]
We introduce a free-viewpoint rendering method -- HumanNeRF -- that works on a given monocular video of a human performing complex body motions.
Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints.
arXiv Detail & Related papers (2022-01-11T18:51:21Z)
- Do as we do: Multiple Person Video-To-Video Transfer [0.0]
We propose a marker-less approach for multiple-person video-to-video transfer using pose as an intermediate representation.
Given a source video with multiple persons dancing or working out, our method transfers the body motion of all actors to a new set of actors in a different video.
Our method convincingly transfers body motion to the target video while preserving its specific features, such as feet touching the floor and the relative positions of the actors.
arXiv Detail & Related papers (2021-04-10T09:26:31Z)
- Layered Neural Rendering for Retiming People in Video [108.85428504808318]
We present a method for retiming people in an ordinary, natural video.
We can temporally align different motions, change the speed of certain actions, or "erase" selected people from the video altogether.
A key property of our model is that it not only disentangles the direct motions of each person in the input video, but also correlates each person automatically with the scene changes they generate.
arXiv Detail & Related papers (2020-09-16T17:48:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.