LARNet: Latent Action Representation for Human Action Synthesis
- URL: http://arxiv.org/abs/2110.10899v1
- Date: Thu, 21 Oct 2021 05:04:32 GMT
- Title: LARNet: Latent Action Representation for Human Action Synthesis
- Authors: Naman Biyani, Aayush J Rana, Shruti Vyas, Yogesh S Rawat
- Abstract summary: We present LARNet, a novel end-to-end approach for generating human action videos.
We learn action dynamics in latent space, avoiding the need for a driving video during inference.
We evaluate the proposed approach on four real-world human action datasets.
- Score: 3.3454373538792552
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present LARNet, a novel end-to-end approach for generating human action
videos. Joint generative modeling of appearance and dynamics to synthesize a
video is very challenging, and therefore recent works in video synthesis have
proposed to decompose these two factors. However, these methods require a
driving video to model the video dynamics. In this work, we propose a
generative approach instead, which explicitly learns action dynamics in latent
space, avoiding the need for a driving video during inference. The generated
action dynamics are integrated with the appearance using a recurrent
hierarchical structure which induces motion at different scales to capture
both coarse- and fine-level action details. In addition, we propose a novel
mix-adversarial loss function that aims to improve the temporal coherency of
synthesized videos. We evaluate the proposed approach on four real-world human
action datasets, demonstrating its effectiveness in generating human actions.
The code and models will be made
publicly available.
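The abstract leaves the mix-adversarial loss undefined; as a rough illustration only, the PyTorch sketch below shows one plausible reading, in which real and synthesized frames are interleaved within a clip so that an assumed clip-level discriminator `disc` must judge temporal coherence rather than per-frame realism. The mixing scheme and labels here are illustrative assumptions, not the paper's definition.
```python
import torch
import torch.nn.functional as F

def mix_adversarial_loss(disc, real, fake, mix_prob=0.5):
    # real, fake: (B, T, C, H, W) video clips; disc: clip -> (B, 1) logits.
    B, T = real.shape[:2]
    # Per-timestep coin flip: keep the real frame or the generated one.
    keep_real = torch.rand(B, T, device=real.device) < mix_prob
    mixed = torch.where(keep_real[:, :, None, None, None], real, fake)
    ones = torch.ones(B, 1, device=real.device)
    zeros = torch.zeros(B, 1, device=real.device)
    # D: fully real clips are real; mixed clips, which contain temporal
    # seams, are treated as fake.
    d_loss = (F.binary_cross_entropy_with_logits(disc(real), ones) +
              F.binary_cross_entropy_with_logits(disc(mixed.detach()), zeros))
    # G: even a mixed clip should fool D, pushing generated frames to
    # blend coherently with their real neighbours.
    g_loss = F.binary_cross_entropy_with_logits(disc(mixed), ones)
    return d_loss, g_loss
```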
Related papers
- iVideoGPT: Interactive VideoGPTs are Scalable World Models [70.02290687442624]
World models empower model-based agents to interactively explore, reason, and plan within imagined environments for real-world decision-making.
This work introduces Interactive VideoGPT (iVideoGPT), a scalable autoregressive transformer framework that integrates multimodal signals (visual observations, actions, and rewards) into a sequence of tokens.
iVideoGPT features a novel compressive tokenization technique that efficiently discretizes high-dimensional visual observations.
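As a hedged illustration of the multimodal token sequence described above (the actual iVideoGPT layout, vocabularies, and tokenizer differ in detail), the sketch below flattens per-step observation, action, and reward codes into one sequence suitable for next-token prediction.
```python
import torch

def interleave_tokens(obs_tokens, act_tokens, rew_tokens):
    # Illustrative layout, not iVideoGPT's actual one: each time step is
    # flattened as [obs_1..obs_k, act, rew], then steps are concatenated
    # so a standard autoregressive transformer can model the sequence.
    # obs_tokens: (T, K) discrete codes; act_tokens, rew_tokens: (T,).
    steps = [torch.cat([o, a.view(1), r.view(1)])
             for o, a, r in zip(obs_tokens, act_tokens, rew_tokens)]
    return torch.cat(steps)  # (T * (K + 2),)

obs = torch.randint(0, 512, (4, 16))   # 4 steps, 16 visual codes each
act = torch.randint(512, 520, (4,))    # actions in a disjoint code range
rew = torch.randint(520, 522, (4,))    # binary reward tokens
seq = interleave_tokens(obs, act, rew)
print(seq.shape)  # torch.Size([72])
```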
arXiv Detail & Related papers (2024-05-24T05:29:12Z) - NIFTY: Neural Object Interaction Fields for Guided Human Motion
Synthesis [21.650091018774972]
We create a neural interaction field attached to a specific object, which outputs the distance to the valid interaction manifold given a human pose as input.
This interaction field guides the sampling of an object-conditioned human motion diffusion model.
We synthesize realistic motions for sitting and lifting with several objects, outperforming alternative approaches in terms of motion quality and successful action completion.
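A minimal sketch of how such a field could guide sampling, assuming a plain MLP distance field and a classifier-guidance-style gradient step; the pose dimensionality, network shape, and update rule are all illustrative assumptions rather than NIFTY's actual design.
```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an object-specific interaction field: maps a
# body pose to its distance from the valid-interaction manifold.
field = nn.Sequential(nn.Linear(63, 256), nn.ReLU(),
                      nn.Linear(256, 256), nn.ReLU(),
                      nn.Linear(256, 1))

def guide_pose(pose, step_size=0.1):
    # Nudge an intermediate pose down the field's distance gradient,
    # in the spirit of classifier guidance for diffusion sampling.
    pose = pose.detach().requires_grad_(True)
    dist = field(pose).sum()
    grad, = torch.autograd.grad(dist, pose)
    return (pose - step_size * grad).detach()

noisy_pose = torch.randn(1, 63)   # e.g. 21 joints x 3D (assumed layout)
guided = guide_pose(noisy_pose)
```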
arXiv Detail & Related papers (2023-07-14T17:59:38Z) - LEO: Generative Latent Image Animator for Human Video Synthesis [38.99490968487773]
We propose a novel framework for human video synthesis, placing emphasis on spatio-temporal coherency.
Our key idea is to represent motion as a sequence of flow maps in the generation process, which inherently isolate motion from appearance.
We implement this idea via a flow-based image animator and a Latent Motion Diffusion Model (LMDM).
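To make the flow-map idea concrete, here is a generic backward-warping sketch, not LEO's actual animator: each output frame resamples a single appearance frame along a predicted motion field.
```python
import torch
import torch.nn.functional as F

def warp_frame(frame, flow):
    # frame: (B, C, H, W) appearance; flow: (B, 2, H, W) pixel offsets.
    B, _, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).float()           # (H, W, 2)
    grid = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)    # (B, H, W, 2)
    # Normalize to [-1, 1] as grid_sample expects.
    grid[..., 0] = 2 * grid[..., 0] / (W - 1) - 1
    grid[..., 1] = 2 * grid[..., 1] / (H - 1) - 1
    return F.grid_sample(frame, grid, align_corners=True)

# One warped frame per flow map yields a video from a single image.
frames = [warp_frame(torch.randn(1, 3, 64, 64),
                     torch.randn(1, 2, 64, 64)) for _ in range(8)]
```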
arXiv Detail & Related papers (2023-05-06T09:29:12Z) - Dance In the Wild: Monocular Human Animation with Neural Dynamic
Appearance Synthesis [56.550999933048075]
We propose a video-based synthesis method that tackles the challenges of in-the-wild videos and demonstrates high-quality results.
We introduce a novel motion signature that is used to modulate the generator weights to capture dynamic appearance changes.
We evaluate our method on a set of challenging videos and show that our approach achieves state-of-the-art performance both qualitatively and quantitatively.
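The summary does not say how the motion signature modulates the generator weights; one familiar mechanism this evokes is StyleGAN2-style weight modulation, sketched below purely under that assumption.
```python
import torch
import torch.nn as nn

class ModulatedConv(nn.Module):
    # Illustrative conditioning of a generator layer on a motion code:
    # convolution weights are scaled per-sample by a learned style
    # vector (the paper's exact mechanism may differ).
    def __init__(self, in_ch, out_ch, sig_dim, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        self.to_scale = nn.Linear(sig_dim, in_ch)

    def forward(self, x, signature):
        B = x.shape[0]
        scale = self.to_scale(signature) + 1.0              # (B, in_ch)
        w = self.weight[None] * scale[:, None, :, None, None]
        # Grouped conv applies a different modulated kernel per sample.
        w = w.reshape(-1, *self.weight.shape[1:])
        x = x.reshape(1, -1, *x.shape[2:])
        out = nn.functional.conv2d(x, w, padding=1, groups=B)
        return out.reshape(B, -1, *out.shape[2:])

layer = ModulatedConv(8, 16, sig_dim=32)
y = layer(torch.randn(2, 8, 32, 32), torch.randn(2, 32))  # (2, 16, 32, 32)
```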
arXiv Detail & Related papers (2021-11-10T20:18:57Z) - Pose-guided Generative Adversarial Net for Novel View Action Synthesis [6.019777076722422]
Given an action video, the goal is to generate the same action from an unseen viewpoint.
We propose a novel framework named Pose-guided Action Separable Generative Adversarial Net (PAS-GAN).
We employ a novel local-global spatial transformation module to effectively generate sequential video features in the target view.
arXiv Detail & Related papers (2021-10-15T10:33:09Z) - EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate dynamic-scale spatial-temporal kernels to adaptively fit diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
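As a rough sketch of the second step (the learned saliency score used for selection is an assumption), the snippet below keeps only the top-k foreground tokens and aggregates them with a small Transformer encoder.
```python
import torch
import torch.nn as nn

d, k = 64, 8
score = nn.Linear(d, 1)          # assumed saliency scorer per location
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
    num_layers=2)

feats = torch.randn(2, 14 * 14, d)               # per-location video cues
# Keep only the k highest-scoring ("foreground") tokens per clip.
idx = score(feats).squeeze(-1).topk(k, dim=1).indices
fg = feats.gather(1, idx.unsqueeze(-1).expand(-1, -1, d))
video_repr = encoder(fg).mean(dim=1)             # (2, 64) global summary
```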
arXiv Detail & Related papers (2021-07-22T15:57:18Z) - Scene-aware Generative Network for Human Motion Synthesis [125.21079898942347]
We propose a new framework that takes the interaction between the scene and the human motion into account.
Considering the uncertainty of human motion, we formulate this task as a generative task.
We derive a GAN based learning approach, with discriminators to enforce the compatibility between the human motion and the contextual scene.
arXiv Detail & Related papers (2021-05-31T09:05:50Z) - Dynamic View Synthesis from Dynamic Monocular Video [69.80425724448344]
We present an algorithm for generating views at arbitrary viewpoints and any input time step given a monocular video of a dynamic scene.
We show extensive quantitative and qualitative results of dynamic view synthesis from casually captured videos.
arXiv Detail & Related papers (2021-05-13T17:59:50Z) - Hierarchical Style-based Networks for Motion Synthesis [150.226137503563]
We propose a self-supervised method for generating long-range, diverse and plausible behaviors to achieve a specific goal location.
Our proposed method learns to model human motion by decomposing the long-range generation task in a hierarchical manner.
On a large-scale skeleton dataset, we show that the proposed method is able to synthesize long-range, diverse, and plausible motion.
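A toy two-level sketch of such a hierarchical decomposition, with module sizes and interfaces assumed for illustration rather than taken from the paper: a coarse recurrent module proposes sub-goal poses on the way to the goal, and a fine module fills in the frames between consecutive sub-goals.
```python
import torch
import torch.nn as nn

class HierarchicalMotion(nn.Module):
    def __init__(self, pose_dim=51, hid=128):
        super().__init__()
        self.coarse = nn.GRUCell(pose_dim + 2, hid)   # pose + 2D goal
        self.to_pose = nn.Linear(hid, pose_dim)
        self.fine = nn.GRUCell(2 * pose_dim, hid)     # pose + sub-goal

    def forward(self, start, goal, n_sub=4, steps_per_sub=8):
        h = start.new_zeros(start.shape[0], 128)
        pose, frames = start, []
        for _ in range(n_sub):
            # Coarse level: propose the next sparse sub-goal pose.
            h = self.coarse(torch.cat([pose, goal], dim=-1), h)
            sub_goal = self.to_pose(h)
            hf = h.clone()
            # Fine level: fill in frames toward that sub-goal.
            for _ in range(steps_per_sub):
                hf = self.fine(torch.cat([pose, sub_goal], dim=-1), hf)
                pose = self.to_pose(hf)
                frames.append(pose)
        return torch.stack(frames, dim=1)   # (B, n_sub*steps, pose_dim)

model = HierarchicalMotion()
clip = model(torch.randn(2, 51), torch.randn(2, 2))   # (2, 32, 51)
```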
arXiv Detail & Related papers (2020-08-24T02:11:02Z)