LARNet: Latent Action Representation for Human Action Synthesis
- URL: http://arxiv.org/abs/2110.10899v1
- Date: Thu, 21 Oct 2021 05:04:32 GMT
- Title: LARNet: Latent Action Representation for Human Action Synthesis
- Authors: Naman Biyani, Aayush J Rana, Shruti Vyas, Yogesh S Rawat
- Abstract summary: We present LARNet, a novel end-to-end approach for generating human action videos.
We learn action dynamics in latent space, avoiding the need for a driving video during inference.
We evaluate the proposed approach on four real-world human action datasets.
- Score: 3.3454373538792552
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present LARNet, a novel end-to-end approach for generating human action videos. Joint generative modeling of appearance and dynamics to synthesize a video is very challenging, and recent works in video synthesis have therefore proposed to decompose these two factors. However, these methods require a driving video to model the video dynamics. In this work, we instead propose a generative approach which explicitly learns action dynamics in latent space, avoiding the need for a driving video during inference. The generated action dynamics are integrated with the appearance using a recurrent hierarchical structure which induces motion at different scales to focus on both coarse and fine-level action details. In addition, we propose a novel mix-adversarial loss function which aims at improving the temporal coherency of synthesized videos. We evaluate the proposed approach on four real-world human action datasets, demonstrating its effectiveness in generating human actions. The code and models will be made publicly available.
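The abstract only outlines the design: appearance and dynamics are decomposed, action dynamics are sampled in latent space (no driving video), and a recurrent decoder fuses motion with appearance at multiple scales. As a rough illustration of that decomposition only, here is a minimal PyTorch-style sketch; every module name, dimension, and the simple feature-modulation scheme are assumptions made for this example, not the released LARNet implementation.

```python
# Hypothetical sketch (not the authors' code): appearance/dynamics decomposition
# where action dynamics are sampled in latent space, then fused with appearance
# features by a recurrent decoder at two spatial scales.
import torch
import torch.nn as nn

class AppearanceEncoder(nn.Module):
    """Encodes a single conditioning frame into coarse and fine feature maps."""
    def __init__(self):
        super().__init__()
        self.coarse = nn.Sequential(nn.Conv2d(3, 64, 4, stride=4), nn.ReLU())
        self.fine = nn.Sequential(nn.Conv2d(3, 32, 2, stride=2), nn.ReLU())

    def forward(self, frame):                          # frame: (B, 3, 64, 64)
        return self.coarse(frame), self.fine(frame)    # (B,64,16,16), (B,32,32,32)

class LatentActionGenerator(nn.Module):
    """Maps an action label plus noise to a sequence of latent motion codes,
    so no driving video is needed at inference time."""
    def __init__(self, num_actions=10, z_dim=64, steps=8):
        super().__init__()
        self.steps = steps
        self.embed = nn.Embedding(num_actions, z_dim)
        self.rnn = nn.GRU(z_dim, z_dim, batch_first=True)

    def forward(self, action, noise):                  # action: (B,), noise: (B, z_dim)
        h = (self.embed(action) + noise).unsqueeze(1).repeat(1, self.steps, 1)
        motion, _ = self.rnn(h)                         # (B, steps, z_dim)
        return motion

class HierarchicalDecoder(nn.Module):
    """Injects each motion code into the appearance features at both scales
    and decodes one frame per time step."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.to_coarse = nn.Linear(z_dim, 64)
        self.to_fine = nn.Linear(z_dim, 32)
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 32, 3, padding=1), nn.ReLU())
        self.out = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, coarse, fine, motion):
        frames = []
        for t in range(motion.size(1)):
            z = motion[:, t]
            c = coarse * (1 + self.to_coarse(z)[:, :, None, None])  # modulate coarse scale
            f = fine * (1 + self.to_fine(z)[:, :, None, None])      # modulate fine scale
            frames.append(self.out(self.up(c) + f))
        return torch.stack(frames, dim=1)               # (B, steps, 3, 64, 64)

# Usage: synthesize a short clip from one conditioning frame and an action label.
enc, gen, dec = AppearanceEncoder(), LatentActionGenerator(), HierarchicalDecoder()
frame = torch.randn(2, 3, 64, 64)
video = dec(*enc(frame), gen(torch.tensor([1, 3]), torch.randn(2, 64)))
print(video.shape)  # torch.Size([2, 8, 3, 64, 64])
```

A full training setup would additionally include a discriminator and the temporal objective the paper calls a mix-adversarial loss, both omitted here since the abstract does not specify their form.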
Related papers
- VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models [71.9811050853964]
VideoJAM is a novel framework that instills an effective motion prior to video generators.
VideoJAM achieves state-of-the-art performance in motion coherence.
These findings emphasize that appearance and motion can be complementary and, when effectively integrated, enhance both the visual quality and the coherence of video generation.
arXiv Detail & Related papers (2025-02-04T17:07:10Z) - Move-in-2D: 2D-Conditioned Human Motion Generation [54.067588636155115]
We propose Move-in-2D, a novel approach to generate human motion sequences conditioned on a scene image.
Our approach accepts both a scene image and text prompt as inputs, producing a motion sequence tailored to the scene.
arXiv Detail & Related papers (2024-12-17T18:58:07Z) - InterDyn: Controllable Interactive Dynamics with Video Diffusion Models [50.38647583839384]
We propose InterDyn, a framework that generates videos of interactive dynamics given an initial frame and a control signal encoding the motion of a driving object or actor.
Our key insight is that large video foundation models can act as both neural renderers and implicit physics simulators by learning interactive dynamics from large-scale video data.
arXiv Detail & Related papers (2024-12-16T13:57:02Z) - LEO: Generative Latent Image Animator for Human Video Synthesis [38.99490968487773]
We propose a novel framework for human video synthesis, placing emphasis on spatio-temporal coherency.
Our key idea is to represent motion as a sequence of flow maps in the generation process, which inherently isolate motion from appearance.
We implement this idea via a flow-based image animator and a Latent Motion Diffusion Model (LMDM)
arXiv Detail & Related papers (2023-05-06T09:29:12Z) - Dance In the Wild: Monocular Human Animation with Neural Dynamic
Appearance Synthesis [56.550999933048075]
We propose a video-based synthesis method that tackles these challenges and demonstrates high-quality results for in-the-wild videos.
We introduce a novel motion signature that is used to modulate the generator weights to capture dynamic appearance changes.
We evaluate our method on a set of challenging videos and show that our approach achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2021-11-10T20:18:57Z) - Pose-guided Generative Adversarial Net for Novel View Action Synthesis [6.019777076722422]
Given an action video, the goal is to generate the same action from an unseen viewpoint.
We propose a novel framework named Pose-guided Action Separable Generative Adversarial Net (PAS-GAN)
We employ a novel local-global spatial transformation module to effectively generate sequential video features in the target view.
arXiv Detail & Related papers (2021-10-15T10:33:09Z) - Dynamic View Synthesis from Dynamic Monocular Video [69.80425724448344]
We present an algorithm for generating views at arbitrary viewpoints and any input time step given a monocular video of a dynamic scene.
We show extensive quantitative and qualitative results of dynamic view synthesis from casually captured videos.
arXiv Detail & Related papers (2021-05-13T17:59:50Z) - Hierarchical Style-based Networks for Motion Synthesis [150.226137503563]
We propose a self-supervised method for generating long-range, diverse and plausible behaviors to achieve a specific goal location.
Our proposed method learns to model human motion by decomposing a long-range generation task in a hierarchical manner.
On a large-scale skeleton dataset, we show that the proposed method is able to synthesise long-range, diverse and plausible motion.
arXiv Detail & Related papers (2020-08-24T02:11:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.