Pose-guided Generative Adversarial Net for Novel View Action Synthesis
- URL: http://arxiv.org/abs/2110.07993v1
- Date: Fri, 15 Oct 2021 10:33:09 GMT
- Title: Pose-guided Generative Adversarial Net for Novel View Action Synthesis
- Authors: Xianhang Li, Junhao Zhang, Kunchang Li, Shruti Vyas, Yogesh S Rawat
- Abstract summary: Given an action video, the goal is to generate the same action from an unseen viewpoint.
We propose a novel framework named Pose-guided Action Separable Generative Adversarial Net (PAS-GAN).
We employ a novel local-global spatial transformation module to effectively generate sequential video features in the target view.
- Score: 6.019777076722422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We focus on the problem of novel-view human action synthesis. Given an action
video, the goal is to generate the same action from an unseen viewpoint.
Naturally, novel view video synthesis is more challenging than image synthesis.
It requires the synthesis of a sequence of realistic frames with temporal
coherency. Besides, transferring the different actions to a novel target view
requires awareness of action category and viewpoint change simultaneously. To
address these challenges, we propose a novel framework named Pose-guided Action
Separable Generative Adversarial Net (PAS-GAN), which utilizes pose to
alleviate the difficulty of this task. First, we propose a recurrent
pose-transformation module which transforms actions from the source view to the
target view and generates novel view pose sequence in 2D coordinate space.
Second, a well-transformed pose sequence enables us to separate the action and
background in the target view. We employ a novel local-global spatial
transformation module to effectively generate sequential video features in the
target view using these action and background features. Finally, the generated
video features are used to synthesize human action with the help of a 3D
decoder. Moreover, to focus on dynamic action in the video, we propose a novel
multi-scale action-separable loss which further improves the video quality. We
conduct extensive experiments on two large-scale multi-view human action
datasets, NTU-RGBD and PKU-MMD, demonstrating the effectiveness of PAS-GAN
which outperforms existing approaches.
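
The abstract names a multi-scale action-separable loss but does not define it. The following is only a minimal sketch of one plausible reading, assuming the loss is an L1 reconstruction error computed separately over a pose-derived action mask and its complementary background, then summed over successively downsampled copies of each frame. The function names, the weights, and the 2x2 average pooling are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of an action-separable reconstruction loss.
# Nothing here is the authors' implementation; all names and weights are assumptions.

def avg_pool2(frame):
    """Downsample a 2D grid (list of rows) with non-overlapping 2x2 average pooling."""
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def masked_l1(pred, target, mask):
    """Mask-weighted mean absolute error; returns 0.0 when the mask is empty."""
    num = 0.0
    den = 0.0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            num += m * abs(p - t)
            den += m
    return num / den if den > 0 else 0.0


def action_separable_loss(pred_video, target_video, action_masks,
                          num_scales=2, action_weight=2.0, background_weight=1.0):
    """Sum, over scales, of the frame-averaged action and background L1 errors.

    Each video is a list of frames; each frame and mask is a 2D list of floats.
    Mask values lie in [0, 1], with 1 marking the (assumed pose-derived) action region.
    Frames must be large enough to be halved num_scales - 1 times.
    """
    preds = list(pred_video)
    targets = list(target_video)
    masks = list(action_masks)
    total = 0.0
    for scale in range(num_scales):
        if scale > 0:
            # Move to the next coarser scale for predictions, targets, and masks.
            preds = [avg_pool2(f) for f in preds]
            targets = [avg_pool2(f) for f in targets]
            masks = [avg_pool2(m) for m in masks]
        scale_loss = 0.0
        for p, t, m in zip(preds, targets, masks):
            background = [[1.0 - v for v in row] for row in m]
            scale_loss += action_weight * masked_l1(p, t, m)
            scale_loss += background_weight * masked_l1(p, t, background)
        total += scale_loss / len(preds)
    return total
```

Weighting the action region more heavily than the background is one simple way to make the loss "focus on dynamic action"; the actual formulation in the paper may differ.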
Related papers
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of video diffusion models and the coarse 3D clues offered by a point-based representation to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z) - ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models [33.760292331843104]
Generating novel views of an object from a single image is a challenging task.
Recent methods for view synthesis based on diffusion have shown great progress.
We demonstrate a simple method, where we utilize a pre-trained video diffusion model.
arXiv Detail & Related papers (2023-12-03T06:50:15Z) - LEO: Generative Latent Image Animator for Human Video Synthesis [38.99490968487773]
We propose a novel framework for human video synthesis, placing emphasis on spatio-temporal coherency.
Our key idea is to represent motion as a sequence of flow maps in the generation process, which inherently isolate motion from appearance.
We implement this idea via a flow-based image animator and a Latent Motion Diffusion Model (LMDM).
arXiv Detail & Related papers (2023-05-06T09:29:12Z) - Consistent View Synthesis with Pose-Guided Diffusion Models [51.37925069307313]
Novel view synthesis from a single image has been a cornerstone problem for many Virtual Reality applications.
We propose a pose-guided diffusion model to generate a consistent long-term video of novel views from a single image.
arXiv Detail & Related papers (2023-03-30T17:59:22Z) - Towards Diverse and Natural Scene-aware 3D Human Motion Synthesis [117.15586710830489]
We focus on the problem of synthesizing diverse scene-aware human motions under the guidance of target action sequences.
Based on this factorized scheme, a hierarchical framework is proposed, with each sub-module responsible for modeling one aspect.
Experiment results show that the proposed framework remarkably outperforms previous methods in terms of diversity and naturalness.
arXiv Detail & Related papers (2022-05-25T18:20:01Z) - MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks [77.56526918859345]
We present a novel framework that brings the 3D motion retargeting task from controlled environments to in-the-wild scenarios.
It is capable of retargeting body motion from a character in a 2D monocular video to a 3D character without using any motion capture system or 3D reconstruction procedure.
arXiv Detail & Related papers (2021-12-19T07:52:05Z) - Dance In the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis [56.550999933048075]
We propose a video based synthesis method that tackles challenges and demonstrates high quality results for in-the-wild videos.
We introduce a novel motion signature that is used to modulate the generator weights to capture dynamic appearance changes.
We evaluate our method on a set of challenging videos and show that our approach achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2021-11-10T20:18:57Z) - LARNet: Latent Action Representation for Human Action Synthesis [3.3454373538792552]
We present LARNet, a novel end-to-end approach for generating human action videos.
We learn action dynamics in latent space avoiding the need of a driving video during inference.
We evaluate the proposed approach on four real-world human action datasets.
arXiv Detail & Related papers (2021-10-21T05:04:32Z) - Compositional Video Synthesis with Action Graphs [112.94651460161992]
Videos of actions are complex signals containing rich compositional structure in space and time.
We propose to represent the actions in a graph structure called Action Graph and present the new "Action Graph To Video" synthesis task.
Our generative model for this task (AG2Vid) disentangles motion and appearance features, and by incorporating a scheduling mechanism for actions facilitates a timely and coordinated video generation.
arXiv Detail & Related papers (2020-06-27T09:39:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.