Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior
- URL: http://arxiv.org/abs/2406.04873v2
- Date: Sun, 10 Nov 2024 10:08:37 GMT
- Title: Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior
- Authors: Tanvir Mahmud, Mustafa Munir, Radu Marculescu, Diana Marculescu
- Abstract summary: Video-to-video synthesis poses significant challenges in maintaining character consistency, smooth temporal transitions, and preserving visual quality during fast motion.
We propose an adaptive motion-guided cross-frame attention mechanism that selectively reduces redundant computations.
This enables a greater number of cross-frame attentions over more frames within the same computational budget.
- Score: 13.595032265551184
- License:
- Abstract: Video-to-video synthesis poses significant challenges in maintaining character consistency, smooth temporal transitions, and preserving visual quality during fast motion. While recent fully cross-frame self-attention mechanisms have improved character consistency across multiple frames, they come with high computational costs and often include redundant operations, especially for videos with higher frame rates. To address these inefficiencies, we propose an adaptive motion-guided cross-frame attention mechanism that selectively reduces redundant computations. This enables a greater number of cross-frame attentions over more frames within the same computational budget, thereby enhancing both video quality and temporal coherence. Our method leverages optical flow to focus on moving regions while sparsely attending to stationary areas, allowing for the joint editing of more frames without increasing computational demands. Traditional frame interpolation techniques struggle with motion blur and flickering in intermediate frames, which compromises visual fidelity. To mitigate this, we introduce KV-caching for jointly edited frames, reusing keys and values across intermediate frames to preserve visual quality and maintain temporal consistency throughout the video. With our adaptive cross-frame self-attention approach, we achieve a threefold increase in the number of keyframes processed compared to existing methods, all within the same computational budget as fully cross-frame attention baselines. This results in significant improvements in prediction accuracy and temporal consistency, outperforming state-of-the-art approaches. Code will be made publicly available at https://github.com/tanvir-utexas/AdaVE/tree/main
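The abstract describes two mechanisms: optical-flow-guided sparsification of cross-frame attention, and reuse of cached keys/values when editing intermediate frames. The sketch below is a minimal illustration of both ideas, not the authors' released implementation; the function names, the flow threshold tau, and the stride-4 subsampling of static-region keys are assumptions made for the example (in practice the per-token flow magnitude would come from an off-the-shelf optical-flow estimator).
```python
import torch

def motion_guided_cross_frame_attention(q, k_frames, v_frames, flow_mag,
                                         tau=0.5, stride=4):
    """Sketch: dense cross-frame attention for moving tokens, sparse for static ones.

    q:          (N, d) query tokens of the frame being edited
    k_frames:   (M, d) keys gathered from the jointly edited keyframes
    v_frames:   (M, d) values gathered from the jointly edited keyframes
    flow_mag:   (N,)   per-token optical-flow magnitude (from any flow estimator)
    tau, stride:       illustrative threshold / subsampling factor
    """
    d = q.shape[-1]
    moving = flow_mag > tau          # tokens lying in moving regions
    out = torch.empty_like(q)

    if moving.any():                 # dense attention over all keyframe tokens
        attn = torch.softmax(q[moving] @ k_frames.T / d**0.5, dim=-1)
        out[moving] = attn @ v_frames

    if (~moving).any():              # sparse attention: subsampled keys/values
        idx = torch.arange(0, k_frames.shape[0], stride)
        attn = torch.softmax(q[~moving] @ k_frames[idx].T / d**0.5, dim=-1)
        out[~moving] = attn @ v_frames[idx]
    return out


class KeyframeKVCache:
    """Sketch of KV-caching: keys/values computed for the jointly edited
    keyframes are stored once and reused when editing intermediate frames."""

    def __init__(self):
        self.k = None
        self.v = None

    def store(self, k, v):
        self.k, self.v = k.detach(), v.detach()

    def attend(self, q):
        d = q.shape[-1]
        attn = torch.softmax(q @ self.k.T / d**0.5, dim=-1)
        return attn @ self.v
```
Reusing the cached keyframe keys and values when attending from intermediate frames is what the abstract refers to as KV-caching: it ties the appearance of intermediate frames to the jointly edited keyframes without recomputing full cross-frame attention.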
Related papers
- VidToMe: Video Token Merging for Zero-Shot Video Editing [100.79999871424931]
We propose a novel approach to enhance temporal consistency in generated videos by merging self-attention tokens across frames.
Our method improves temporal coherence and reduces memory consumption in self-attention computations.
arXiv Detail & Related papers (2023-12-17T09:05:56Z)
- RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually, which often leads to undesired, temporally inconsistent results.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z)
- Alignment-guided Temporal Attention for Video Action Recognition [18.5171795689609]
We show that frame-by-frame alignments have the potential to increase the mutual information between frame representations.
We propose Alignment-guided Temporal Attention (ATA) to extend 1-dimensional temporal attention with parameter-free patch-level alignments between neighboring frames.
arXiv Detail & Related papers (2022-09-30T23:10:47Z)
- Video Frame Interpolation without Temporal Priors [91.04877640089053]
Video frame interpolation aims to synthesize non-existent intermediate frames in a video sequence.
The temporal priors of videos, i.e., frames per second (FPS) and frame exposure time, may vary across different camera sensors.
We devise a novel optical flow refinement strategy for better synthesizing results.
arXiv Detail & Related papers (2021-12-02T12:13:56Z)
- TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)
- ALANET: Adaptive Latent Attention Network for Joint Video Deblurring and Interpolation [38.52446103418748]
We introduce a novel architecture, Adaptive Latent Attention Network (ALANET), which synthesizes sharp high frame-rate videos.
We employ a combination of self-attention and cross-attention modules between consecutive frames in the latent space to generate an optimized representation for each frame.
Our method performs favorably against various state-of-the-art approaches, even though we tackle a much more difficult problem.
arXiv Detail & Related papers (2020-08-31T21:11:53Z)
- All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling [52.425236515695914]
State-of-the-art methods are iterative solutions that interpolate one frame at a time.
This work introduces a true multi-frame interpolator.
It utilizes a pyramidal-style network in the temporal domain to complete the multi-frame task in one shot.
arXiv Detail & Related papers (2020-07-23T02:34:39Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion at inference time.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
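The summary above only states that new knowledge distillation methods are designed; the sketch below shows the generic per-frame logit-distillation idea rather than that paper's specific losses, and the temperature and loss weighting are illustrative assumptions.
```python
import torch
import torch.nn.functional as F

def per_frame_distillation_loss(student_logits, teacher_logits, labels,
                                temperature=2.0, alpha=0.5):
    """Generic per-frame logit distillation for semantic segmentation.

    student_logits, teacher_logits: (B, C, H, W) per-pixel class scores
    labels:                         (B, H, W)    ground-truth class indices
    """
    # Hard-label term: usual cross-entropy against the annotation.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-label term: per-pixel KL between student and (frozen) teacher.
    t = temperature
    kl = F.kl_div(
        F.log_softmax(student_logits / t, dim=1),
        F.softmax(teacher_logits.detach() / t, dim=1),
        reduction="batchmean",
    ) * (t * t)

    return alpha * ce + (1.0 - alpha) * kl
```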