RIGID: Recurrent GAN Inversion and Editing of Real Face Videos
- URL: http://arxiv.org/abs/2308.06097v2
- Date: Tue, 15 Aug 2023 13:34:25 GMT
- Title: RIGID: Recurrent GAN Inversion and Editing of Real Face Videos
- Authors: Yangyang Xu, Shengfeng He, Kwan-Yee K. Wong, Ping Luo
- Abstract summary: GAN inversion is indispensable for applying the powerful editability of GANs to real images.
Existing methods invert video frames individually, often leading to undesired, inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
- Score: 73.97520691413006
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: GAN inversion is indispensable for applying the powerful editability of GANs
to real images. However, existing methods invert video frames individually,
often leading to undesired, inconsistent results over time. In this paper, we
propose a unified recurrent framework, named \textbf{R}ecurrent v\textbf{I}deo
\textbf{G}AN \textbf{I}nversion and e\textbf{D}iting (RIGID), to explicitly and
simultaneously enforce temporally coherent GAN inversion and facial editing of
real videos. Our approach models the temporal relations between current and
previous frames from three aspects. To enable a faithful real video
reconstruction, we first maximize the inversion fidelity and consistency by
learning a temporally compensated latent code. Second, we observe that incoherent
noise lies in the high-frequency domain and can be disentangled from the
latent space. Third, to remove the inconsistency after attribute manipulation,
we propose an \textit{in-between frame composition constraint} such that an
arbitrary frame must be a direct composite of its neighboring frames. Our
unified framework learns the inherent coherence between input frames in an
end-to-end manner, and therefore it is agnostic to a specific attribute and can
be applied to arbitrary editing of the same video without re-training.
Extensive experiments demonstrate that RIGID outperforms state-of-the-art
methods qualitatively and quantitatively in both inversion and editing tasks.
The deliverables can be found at \url{https://cnnlstm.github.io/RIGID}.
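The in-between frame composition constraint is described above only at a high level. As a minimal sketch, assuming the "direct composite" is a simple convex blend of the two neighboring frames penalized with an L1 term (both are illustrative assumptions, not the paper's exact formulation), such a constraint could look like this:

```python
# Minimal sketch of an "in-between frame composition" style constraint.
# RIGID's abstract only states that an arbitrary frame must be a direct
# composite of its neighboring frames; the convex blend, weight alpha, and
# L1 penalty below are illustrative assumptions, not the authors' formulation.
import torch
import torch.nn.functional as F


def composition_constraint(frame_prev: torch.Tensor,
                           frame_t: torch.Tensor,
                           frame_next: torch.Tensor,
                           alpha: float = 0.5) -> torch.Tensor:
    """Penalize a frame for deviating from a blend of its two neighbors.

    All frames are (B, C, H, W) tensors of inverted or edited frames.
    """
    composite = alpha * frame_prev + (1.0 - alpha) * frame_next  # assumed blend
    return F.l1_loss(frame_t, composite)


def clip_consistency_loss(frames: torch.Tensor) -> torch.Tensor:
    """Accumulate the constraint over every frame triplet in a (T, B, C, H, W) clip."""
    losses = [composition_constraint(frames[t - 1], frames[t], frames[t + 1])
              for t in range(1, frames.shape[0] - 1)]
    return torch.stack(losses).mean()
```

Because a constraint of this form only relates neighboring output frames to each other, it does not depend on which attribute was edited, which is consistent with the abstract's claim that the framework is attribute-agnostic and reusable for arbitrary edits of the same video.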
Related papers
- Explorative Inbetweening of Time and Space [46.77750028273578]
We introduce bounded generation to control video generation based only on a given start and end frame.
Time Reversal Fusion fuses the temporally forward and backward denoising paths conditioned on the start and end frame.
We find that Time Reversal Fusion outperforms related work on all subtasks.
arXiv Detail & Related papers (2024-03-21T17:57:31Z)
- LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation [21.815083817914843]
We propose a new zero-shot video-to-video translation framework, named LatentWarp.
Our approach is simple: to constrain the query tokens to be temporally consistent, we incorporate a warping operation in the latent space (a rough latent-warping sketch appears after this list).
Experiment results demonstrate the superiority of LatentWarp in achieving video-to-video translation with temporal coherence.
arXiv Detail & Related papers (2023-11-01T08:02:57Z)
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
This paper proposes a novel text-guided video-to-video translation framework to adapt image models to videos.
Our framework achieves global style and local texture temporal consistency at a low cost.
arXiv Detail & Related papers (2023-06-13T17:52:23Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- Towards Smooth Video Composition [59.134911550142455]
Video generation requires consistent and persistent frames with dynamic content over time.
This work investigates modeling temporal relations for composing videos of arbitrary length, from a few frames to even infinitely many, using generative adversarial networks (GANs).
We show that the alias-free operation for single image generation, together with adequately pre-learned knowledge, brings a smooth frame transition without compromising the per-frame quality.
arXiv Detail & Related papers (2022-12-14T18:54:13Z)
- Video Frame Interpolation without Temporal Priors [91.04877640089053]
Video frame interpolation aims to synthesize non-existent intermediate frames in a video sequence.
The temporal priors of videos, i.e. frames per second (FPS) and frame exposure time, may vary across different camera sensors.
We devise a novel optical flow refinement strategy for better synthesizing results.
arXiv Detail & Related papers (2021-12-02T12:13:56Z)
- From Continuity to Editability: Inverting GANs with Consecutive Images [37.16137384683823]
Existing GAN inversion methods are stuck in a paradox: the inverted codes can either achieve high-fidelity reconstruction or retain editing capability, but not both.
In this paper, we resolve this paradox by introducing consecutive images into the inversion process.
Our method provides the first support of video-based GAN inversion, and an interesting application of unsupervised semantic transfer from consecutive images.
arXiv Detail & Related papers (2021-07-29T08:19:58Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
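For the LatentWarp entry above, which constrains temporal consistency through a warping operation in latent space, the following is a rough, generic sketch of warping a previous frame's latents with optical flow. It is not that paper's actual method; the flow source, grid construction, and L1 penalty are all assumptions made for illustration.

```python
# Generic illustration of keeping per-frame latents temporally consistent by
# warping the previous frame's latents with optical flow (the broad idea behind
# the LatentWarp summary above); this is not that paper's exact procedure.
import torch
import torch.nn.functional as F


def warp_latents(prev_latents: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp (B, C, h, w) latents with a (B, 2, h, w) flow field given in pixels."""
    _, _, h, w = prev_latents.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float().to(prev_latents.device)  # (h, w, 2)
    coords = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)  # shift grid by the flow
    # Normalize to the [-1, 1] range expected by grid_sample.
    coords[..., 0] = 2.0 * coords[..., 0] / max(w - 1, 1) - 1.0
    coords[..., 1] = 2.0 * coords[..., 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(prev_latents, coords, align_corners=True)


def temporal_latent_loss(curr_latents: torch.Tensor,
                         prev_latents: torch.Tensor,
                         flow: torch.Tensor) -> torch.Tensor:
    """Encourage the current frame's latents to agree with the warped previous latents."""
    return F.l1_loss(curr_latents, warp_latents(prev_latents, flow))
```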