Related papers: MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation

MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation

URL: http://arxiv.org/abs/2309.00908v1
Date: Sat, 2 Sep 2023 11:13:29 GMT
Title: MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation
Authors: Hanshu Yan, Jun Hao Liew, Long Mai, Shanchuan Lin, Jiashi Feng
Abstract summary: MagicProp disentangles the video editing process into two stages: appearance editing and motion-aware appearance propagation. In the first stage, MagicProp selects a single frame from the input video and applies image-editing techniques to modify the content and/or style of the frame. In the second stage, MagicProp employs the edited frame as an appearance reference and generates the remaining frames using an autoregressive rendering approach.
Score: 74.32046206403177
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper addresses the issue of modifying the visual appearance of videos while preserving their motion. A novel framework, named MagicProp, is proposed, which disentangles the video editing process into two stages: appearance editing and motion-aware appearance propagation. In the first stage, MagicProp selects a single frame from the input video and applies image-editing techniques to modify the content and/or style of the frame. The flexibility of these techniques enables the editing of arbitrary regions within the frame. In the second stage, MagicProp employs the edited frame as an appearance reference and generates the remaining frames using an autoregressive rendering approach. To achieve this, a diffusion-based conditional generation model, called PropDPM, is developed, which synthesizes the target frame by conditioning on the reference appearance, the target motion, and its previous appearance. The autoregressive editing approach ensures temporal consistency in the resulting videos. Overall, MagicProp combines the flexibility of image-editing techniques with the superior temporal consistency of autoregressive modeling, enabling flexible editing of object types and aesthetic styles in arbitrary regions of input videos while maintaining good temporal consistency across frames. Extensive experiments in various video editing scenarios demonstrate the effectiveness of MagicProp.

Related papers

LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning [8.077442711429317]
Video editing using diffusion models has achieved remarkable results in generating high-quality edits for videos.<n>First-frame-guided editing provides control over the first frame, but lacks flexibility over subsequent frames.<n>We propose a mask-based LoRA tuning method that adapts pretrained Image-to-Video (I2V) models for flexible video editing.
arXiv Detail & Related papers (2025-06-11T18:03:55Z)
Edit as You See: Image-guided Video Editing via Masked Motion Modeling [18.89936405508778]
We propose a novel Image-guided Video Editing Diffusion model, termed IVEDiff. IVEDiff is built on top of image editing models, and is equipped with learnable motion modules to maintain the temporal consistency of edited video. Our method is able to generate temporally smooth edited videos while robustly dealing with various editing objects with high quality.
arXiv Detail & Related papers (2025-01-08T07:52:12Z)
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models [18.36472998650704]
We introduce a novel and generic solution that extends the applicability of image editing tools to videos by propagating edits from a single frame to the entire video using a pre-trained image-to-video model. Our method, dubbed I2VEdit, adaptively preserves the visual and motion integrity of the source video depending on the extent of the edits.
arXiv Detail & Related papers (2024-05-26T11:47:40Z)
UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing [28.140945021777878]
We present UniEdit, a tuning-free framework that supports both video motion and appearance editing. To realize motion editing while preserving source video content, we introduce auxiliary motion-reference and reconstruction branches. The obtained features are then injected into the main editing path via temporal and spatial self-attention layers.
arXiv Detail & Related papers (2024-02-20T17:52:12Z)
MagicStick: Controllable Video Editing via Control Handle Transformations [49.29608051543133]
MagicStick is a controllable video editing method that edits the video properties by utilizing the transformation on the extracted internal control signals. We present experiments on numerous examples within our unified framework. We also compare with shape-aware text-based editing and handcrafted motion video generation, demonstrating our superior temporal consistency and editing capability than previous works.
arXiv Detail & Related papers (2023-12-05T17:58:06Z)
MotionEditor: Editing Video Motion via Content-Aware Diffusion [96.825431998349]
MotionEditor is a diffusion model for video motion editing. It incorporates a novel content-aware motion adapter into ControlNet to capture temporal motion correspondence.
arXiv Detail & Related papers (2023-11-30T18:59:33Z)
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing [65.60744699017202]
We introduce optical flow into the attention module in the diffusion model's U-Net to address the inconsistency issue for text-to-video editing. Our method, FLATTEN, enforces the patches on the same flow path across different frames to attend to each other in the attention module. Results on existing text-to-video editing benchmarks show that our proposed method achieves the new state-of-the-art performance.
arXiv Detail & Related papers (2023-10-09T17:59:53Z)
MagicEdit: High-Fidelity and Temporally Coherent Video Editing [70.55750617502696]
We present MagicEdit, a surprisingly simple yet effective solution to the text-guided video editing task. We found that high-fidelity and temporally coherent video-to-video translation can be achieved by explicitly disentangling the learning of content, structure and motion signals during training.
arXiv Detail & Related papers (2023-08-28T17:56:22Z)
StableVideo: Text-driven Consistency-aware Diffusion Video Editing [24.50933856309234]
Diffusion-based methods can generate realistic images and videos, but they struggle to edit existing objects in a video while preserving their appearance over time. This paper introduces temporal dependency to existing text-driven diffusion models, which allows them to generate consistent appearance for the edited objects. We build up a text-driven video editing framework based on this mechanism, namely StableVideo, which can achieve consistency-aware video editing.
arXiv Detail & Related papers (2023-08-18T14:39:16Z)
Edit-A-Video: Single Video Editing with Object-Aware Consistency [49.43316939996227]
We propose a video editing framework given only a pretrained TTI model and a single text, video> pair, which we term Edit-A-Video. The framework consists of two stages: (1) inflating the 2D model into the 3D model by appending temporal modules tuning and on the source video (2) inverting the source video into the noise and editing with target text prompt and attention map injection. We present extensive experimental results over various types of text and videos, and demonstrate the superiority of the proposed method compared to baselines in terms of background consistency, text alignment, and video editing quality.
arXiv Detail & Related papers (2023-03-14T14:35:59Z)
Shape-aware Text-driven Layered Video Editing [39.56765973770167]
We present a shape-aware, text-driven video editing method to handle shape changes. We first propagate the deformation field between the input and edited to all frames. We then leverage a pre-trained text-conditioned diffusion model as guidance for refining shape distortion and completing unseen regions.
arXiv Detail & Related papers (2023-01-30T18:41:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.