On Development and Evaluation of Retargeting Human Motion and Appearance in Monocular Videos
- URL: http://arxiv.org/abs/2103.15596v1
- Date: Mon, 29 Mar 2021 13:17:41 GMT
- Title: On Development and Evaluation of Retargeting Human Motion and Appearance in Monocular Videos
- Authors: Thiago L. Gomes and Renato Martins and João Ferreira and Rafael Azevedo and Guilherme Torres and Erickson R. Nascimento
- Abstract summary: Transferring human motion and appearance between videos of human actors remains one of the key challenges in Computer Vision.
We propose a novel and high-performant approach based on a hybrid image-based rendering technique that exhibits competitive visual quality.
We also present a new video benchmark dataset composed of different videos with annotated human motions to evaluate the task of synthesizing people's videos.
- Score: 2.870762512009438
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Transferring human motion and appearance between videos of human actors
remains one of the key challenges in Computer Vision. Despite the advances from
recent image-to-image translation approaches, there are several transferring
contexts where most end-to-end learning-based retargeting methods still perform
poorly. Transferring human appearance from one actor to another is only ensured
when a strict setup is complied with, which is generally built around the
specificities of the methods' training regimes. The contribution of this paper is
two-fold: first, we propose a novel and high-performant approach based on a
hybrid image-based rendering technique that exhibits competitive visual
retargeting quality compared to state-of-the-art neural rendering approaches.
The formulation incorporates the user's body shape into the retargeting while
considering physical constraints of the motion in 3D and the 2D image domain.
We also present a new video retargeting benchmark dataset composed of different
videos with annotated human motions to evaluate the task of synthesizing
people's videos, which can be used as a common basis for tracking the
progress in the field. The dataset and its evaluation protocols are designed to
evaluate retargeting methods in more general and challenging conditions. Our
method is validated in several experiments, comprising publicly available
videos of actors with different shapes, motion types and camera setups. The
dataset and retargeting code are publicly available to the community at:
https://www.verlab.dcc.ufmg.br/retargeting-motion.
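The abstract mentions annotated human motions and evaluation protocols but does not restate them here. As a purely illustrative aid, the following Python sketch computes the mean per-joint position error (MPJPE), a common motion-accuracy measure that such a protocol could include; the array shapes, joint count, and units are assumptions, not the benchmark's actual specification.

```python
import numpy as np

def mpjpe(pred_joints, gt_joints):
    """Mean per-joint position error between retargeted and annotated 3D poses.

    pred_joints, gt_joints: (T, J, 3) arrays with T frames and J joints each.
    This is only an illustrative metric, not the benchmark's official protocol.
    """
    assert pred_joints.shape == gt_joints.shape
    # Euclidean distance per joint and frame, averaged over all of them.
    return np.linalg.norm(pred_joints - gt_joints, axis=-1).mean()

# Hypothetical usage: 100 frames of a 24-joint skeleton, coordinates in meters.
pred = np.random.rand(100, 24, 3)
gt = np.random.rand(100, 24, 3)
print(f"MPJPE: {mpjpe(pred, gt):.3f} m")
```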
Related papers
- MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild [32.6521941706907]
We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos.
We first define a layered neural representation for the entire scene, composited by individual human and background models.
We learn the layered neural representation from videos via our layer-wise differentiable volume rendering.
arXiv Detail & Related papers (2024-06-03T17:59:57Z)
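As a rough illustration of the layered idea summarized above, the sketch below composites per-layer colors and opacities front to back for each pixel. It is not MultiPly's implementation: the actual method learns neural models per person and background and renders them with layer-wise differentiable volume rendering, whereas the layer ordering, shapes, and the reduction to a single compositing step here are simplifying assumptions.

```python
import numpy as np

def composite_layers(colors, alphas):
    """Front-to-back compositing of L layers (e.g. one per person plus background).

    colors: (L, H, W, 3) per-layer RGB; alphas: (L, H, W) per-layer opacity,
    ordered front to back. Only a sketch of the compositing step, not the
    differentiable volume renderer itself.
    """
    image = np.zeros(colors.shape[1:])           # accumulated RGB
    transmittance = np.ones(alphas.shape[1:])    # fraction of light still unblocked
    for rgb, a in zip(colors, alphas):
        image += transmittance[..., None] * a[..., None] * rgb
        transmittance *= 1.0 - a
    return image

# Example: two layers over a 4x4 image.
print(composite_layers(np.random.rand(2, 4, 4, 3), np.random.rand(2, 4, 4)).shape)  # (4, 4, 3)
```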
- VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis [40.869862603815875]
VLOGGER is a method for audio-driven human video generation from a single input image.
We use a novel diffusion-based architecture that augments text-to-image models with both spatial and temporal controls.
We show applications in video editing and personalization.
arXiv Detail & Related papers (2024-03-13T17:59:02Z)
- Humans in 4D: Reconstructing and Tracking Humans with Transformers [72.50856500760352]
We present an approach to reconstruct humans and track them over time.
At the core of our approach, we propose a fully "transformerized" version of a network for human mesh recovery.
This network, HMR 2.0, advances the state of the art and shows the capability to analyze unusual poses that have in the past been difficult to reconstruct from single images.
arXiv Detail & Related papers (2023-05-31T17:59:52Z)
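The summary above does not describe HMR 2.0's architecture, so the PyTorch sketch below only illustrates the general shape of a fully "transformerized" mesh-recovery head: image patches are encoded by a transformer and a pooled token regresses SMPL-style pose and shape parameters. The patch size, embedding width, depth, and output parameterization (24 joints in 6D rotation form, 10 shape coefficients) are assumptions for illustration, not the published model.

```python
import torch
import torch.nn as nn

class MeshRecoveryHead(nn.Module):
    """Illustrative transformer head for human mesh recovery (not HMR 2.0 itself)."""

    def __init__(self, embed_dim=256, depth=4, num_heads=8):
        super().__init__()
        # 16x16 patches from a 256x256 crop -> 256 tokens.
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)
        self.pos_embed = nn.Parameter(torch.zeros(1, 257, embed_dim))
        self.query = nn.Parameter(torch.zeros(1, 1, embed_dim))  # pooled output token
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.pose_head = nn.Linear(embed_dim, 24 * 6)   # per-joint 6D rotations
        self.shape_head = nn.Linear(embed_dim, 10)      # shape coefficients

    def forward(self, img):                                          # img: (B, 3, 256, 256)
        tokens = self.patch_embed(img).flatten(2).transpose(1, 2)    # (B, 256, C)
        tokens = torch.cat([self.query.expand(img.size(0), -1, -1), tokens], dim=1)
        feats = self.encoder(tokens + self.pos_embed)[:, 0]          # pooled token (B, C)
        return self.pose_head(feats), self.shape_head(feats)

pose, shape = MeshRecoveryHead()(torch.randn(2, 3, 256, 256))
print(pose.shape, shape.shape)  # torch.Size([2, 144]) torch.Size([2, 10])
```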
- Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input.
Our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
arXiv Detail & Related papers (2022-04-04T03:09:20Z)
- Dance In the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis [56.550999933048075]
We propose a video based synthesis method that tackles challenges and demonstrates high quality results for in-the-wild videos.
We introduce a novel motion signature that is used to modulate the generator weights to capture dynamic appearance changes.
We evaluate our method on a set of challenging videos and show that our approach achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2021-11-10T20:18:57Z)
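To make the idea of modulating generator weights with a motion signature more concrete, the PyTorch sketch below applies a StyleGAN2-style modulated convolution driven by an assumed per-frame motion vector. This is not the paper's formulation: the motion-code dimensionality, the linear mapping to per-channel scales, and the demodulation step are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionModulatedConv(nn.Module):
    """Illustrative weight modulation by a motion code (not the paper's exact scheme)."""

    def __init__(self, in_ch, out_ch, motion_dim, kernel=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel, kernel) * 0.02)
        self.to_scale = nn.Linear(motion_dim, in_ch)  # motion code -> per-channel scales

    def forward(self, x, motion_code):
        # x: (B, in_ch, H, W); motion_code: (B, motion_dim), an assumed motion signature.
        scale = self.to_scale(motion_code) + 1.0                   # (B, in_ch)
        w = self.weight[None] * scale[:, None, :, None, None]      # (B, out, in, k, k)
        demod = torch.rsqrt((w ** 2).sum(dim=(2, 3, 4)) + 1e-8)    # keep unit variance
        w = w * demod[:, :, None, None, None]
        b, c, h, width = x.shape
        out = F.conv2d(x.reshape(1, b * c, h, width),              # grouped-conv trick:
                       w.reshape(-1, c, *w.shape[-2:]),            # one group per sample
                       groups=b, padding=1)
        return out.reshape(b, -1, h, width)

layer = MotionModulatedConv(16, 32, motion_dim=8)
print(layer(torch.randn(2, 16, 64, 64), torch.randn(2, 8)).shape)  # torch.Size([2, 32, 64, 64])
```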
- Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis [124.48519390371636]
Transferring human motion from a source to a target person holds great potential for computer vision and graphics applications.
Previous work has either relied on crafted 3D human models or trained a separate model specifically for each target person.
This work studies a more general setting, in which we aim to learn a single model to parsimoniously transfer motion from a source video to any target person.
arXiv Detail & Related papers (2021-10-27T03:42:41Z)
- JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z)
- Flow Guided Transformable Bottleneck Networks for Motion Retargeting [29.16125343915916]
Existing efforts leverage a long training video from each target person to train a subject-specific motion transfer model.
Few-shot motion transfer techniques, which only require one or a few images from a target, have recently drawn considerable attention.
Inspired by the Transformable Bottleneck Network, we propose an approach based on an implicit volumetric representation of the image content.
arXiv Detail & Related papers (2021-06-14T21:58:30Z)
- Human Motion Transfer from Poses in the Wild [61.6016458288803]
We tackle the problem of human motion transfer, where we synthesize novel motion video for a target person that imitates the movement from a reference video.
It is a video-to-video translation task in which the estimated poses are used to bridge two domains.
We introduce a novel pose-to-video translation framework for generating high-quality videos that are temporally coherent even for in-the-wild pose sequences unseen during training.
arXiv Detail & Related papers (2020-04-07T05:59:53Z)
- Do As I Do: Transferring Human Motion and Appearance between Monocular Videos with Spatial and Temporal Constraints [8.784162652042959]
Marker-less human motion estimation and shape modeling from images in the wild bring this challenge to the fore.
We propose a unifying formulation for transferring appearance and human motion from monocular videos.
Our method is able to transfer both human motion and appearance, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2020-01-08T16:39:16Z)