Modelling Latent Dynamics of StyleGAN using Neural ODEs
- URL: http://arxiv.org/abs/2208.11197v2
- Date: Sat, 22 Apr 2023 20:18:14 GMT
- Title: Modelling Latent Dynamics of StyleGAN using Neural ODEs
- Authors: Weihao Xia and Yujiu Yang and Jing-Hao Xue
- Abstract summary: We learn the trajectory of independently inverted latent codes from GANs.
The learned continuous trajectory allows us to perform infinite frame interpolation and consistent video manipulation.
Our method achieves state-of-the-art performance but with much less computation.
- Score: 52.03496093312985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose to model the video dynamics by learning the
trajectory of independently inverted latent codes from GANs. The entire
sequence is seen as discrete-time observations of a continuous trajectory of
the initial latent code, by considering each latent code as a moving particle
and the latent space as a high-dimensional dynamic system. The latent codes
representing different frames are therefore reformulated as state transitions
of the initial frame, which can be modeled by neural ordinary differential
equations. The learned continuous trajectory allows us to perform infinite
frame interpolation and consistent video manipulation. The latter task is
reintroduced for video editing with the advantage of requiring the core
operations to be applied to the first frame only while maintaining temporal
consistency across all frames. Extensive experiments demonstrate that our
method achieves state-of-the-art performance but with much less computation.
Code is available at https://github.com/weihaox/dynode_released.
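To make the idea concrete, the following is a minimal sketch (not the authors' released code) of fitting a neural ODE to per-frame inverted latent codes so that every frame becomes a state transition of the first; the latent dimension, network architecture, training loop, and the use of the torchdiffeq package are illustrative assumptions.
```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumed solver package: pip install torchdiffeq

LATENT_DIM = 512  # StyleGAN W-space dimensionality (assumed)

class LatentDynamics(nn.Module):
    """Parameterises dw/dt = f_theta(t, w) in the GAN latent space."""
    def __init__(self, dim=LATENT_DIM, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(),
            nn.Linear(hidden, dim),
        )

    def forward(self, t, w):
        # The vector field here is autonomous; odeint still passes t.
        return self.net(w)

def fit_trajectory(inverted_codes, timestamps, epochs=2000, lr=1e-3):
    """inverted_codes: (T, LATENT_DIM) latent codes from per-frame GAN inversion.
    timestamps: (T,) strictly increasing frame times.
    Fits the ODE so that integrating the first code forward matches the rest."""
    func = LatentDynamics()
    opt = torch.optim.Adam(func.parameters(), lr=lr)
    w0 = inverted_codes[0]
    for _ in range(epochs):
        opt.zero_grad()
        pred = odeint(func, w0, timestamps)           # (T, LATENT_DIM)
        loss = torch.mean((pred - inverted_codes) ** 2)
        loss.backward()
        opt.step()
    return func

def sample_trajectory(func, w0, query_times):
    """Evaluate the learned continuous trajectory at arbitrary times
    (e.g. between original frames), then decode each code with the frozen generator."""
    with torch.no_grad():
        return odeint(func, w0, query_times)
```
Under this view, evaluating the trajectory at arbitrary query times yields the infinite frame interpolation described above, and consistent editing reduces to modifying only the first latent code before re-integrating the trajectory.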
Related papers
- Unfolding Videos Dynamics via Taylor Expansion [5.723852805622308]
We present a new self-supervised dynamics learning strategy for videos: Video Time-Differentiation for Instance Discrimination (ViDiDi).
ViDiDi observes different aspects of a video through various orders of temporal derivatives of its frame sequence.
ViDiDi learns a single neural network that encodes a video and its temporal derivatives into consistent embeddings.
arXiv Detail & Related papers (2024-09-04T01:41:09Z)
- VDG: Vision-Only Dynamic Gaussian for Driving Simulation [112.6139608504842]
We introduce self-supervised visual odometry (VO) into our pose-free dynamic Gaussian method (VDG).
VDG can work with only RGB image input and construct dynamic scenes at a faster speed and larger scenes compared with the pose-free dynamic view-synthesis method.
Our results show favorable performance over the state-of-the-art dynamic view synthesis methods.
arXiv Detail & Related papers (2024-06-26T09:29:21Z)
- Continuous Learned Primal Dual [10.111901389604423]
We propose the idea that a sequence of layers in a neural network is just a discretisation of an ODE, and thus can be directly modelled by a parameterised ODE (a minimal sketch of this correspondence appears after this list).
In this work, we explore the use of Neural ODEs for learned inverse problems, in particular with the well-known Learned Primal Dual algorithm, and apply it to computed tomography (CT) reconstruction.
arXiv Detail & Related papers (2024-05-03T20:40:14Z)
- RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually, often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z)
- Towards Smooth Video Composition [59.134911550142455]
Video generation requires consistent and persistent frames with dynamic content over time.
This work investigates modeling the temporal relations for composing videos of arbitrary length, from a few frames to even infinitely many, using generative adversarial networks (GANs).
We show that the alias-free operation for single image generation, together with adequately pre-learned knowledge, brings a smooth frame transition without compromising the per-frame quality.
arXiv Detail & Related papers (2022-12-14T18:54:13Z)
- Continuous-Time Video Generation via Learning Motion Dynamics with Neural ODE [26.13198266911874]
We propose a novel video generation approach that learns separate distributions for motion and appearance.
We employ a two-stage approach where the first stage converts a noise vector to a sequence of keypoints at arbitrary frame rates, and the second stage synthesizes videos based on the given keypoint sequence and the appearance noise vector.
arXiv Detail & Related papers (2021-12-21T03:30:38Z)
- Simple Video Generation using Neural ODEs [9.303957136142293]
We learn latent variable models that predict the future in latent space and project back to pixels.
We show that our approach yields promising results in the task of future frame prediction on the Moving MNIST dataset with 1 and 2 digits.
arXiv Detail & Related papers (2021-09-07T19:03:33Z)
- Dynamic View Synthesis from Dynamic Monocular Video [69.80425724448344]
We present an algorithm for generating views at arbitrary viewpoints and any input time step given a monocular video of a dynamic scene.
We show extensive quantitative and qualitative results of dynamic view synthesis from casually captured videos.
arXiv Detail & Related papers (2021-05-13T17:59:50Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
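As a side note on the Continuous Learned Primal Dual entry above, the claim that a stack of layers is a discretisation of an ODE can be made concrete in a few lines: a residual update x <- x + h*f(x) is exactly one explicit Euler step of dx/dt = f(x). The dimensions, step size, and vector field below are illustrative assumptions, not code from that paper.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)
f = nn.Sequential(nn.Linear(8, 8), nn.Tanh(), nn.Linear(8, 8))  # shared vector field

def residual_stack(x, layers=10, h=0.1):
    # Discrete network view: `layers` residual updates x <- x + h * f(x).
    for _ in range(layers):
        x = x + h * f(x)
    return x

def euler_integrate(x, t1=1.0, steps=10):
    # Continuous view: explicit Euler integration of dx/dt = f(x) on [0, t1].
    h = t1 / steps
    for _ in range(steps):
        x = x + h * f(x)
    return x

x0 = torch.randn(1, 8)
# With matching step size the two computations coincide exactly.
print(torch.allclose(residual_stack(x0), euler_integrate(x0)))  # True
```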
This list is automatically generated from the titles and abstracts of the papers on this site.