StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
- URL: http://arxiv.org/abs/2411.11045v1
- Date: Sun, 17 Nov 2024 11:48:01 GMT
- Title: StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
- Authors: Chang Liu, Rui Li, Kaidong Zhang, Yunwei Lan, Dong Liu,
- Abstract summary: We present a shape-consistent video editing method, namely StableV2V, in this paper.
Our method decomposes the entire editing pipeline into several sequential procedures, where it edits the first video frame, then establishes an alignment between the delivered motions and user prompts, and eventually propagates the edited contents to all other frames based on such alignment.
Experimental results and analyses illustrate the outperforming performance, visual consistency, and inference efficiency of our method compared to existing state-of-the-art studies.
- Score: 11.09708780767668
- License:
- Abstract: Recent advancements of generative AI have significantly promoted content creation and editing, where prevailing studies further extend this exciting progress to video editing. In doing so, these studies mainly transfer the inherent motion patterns from the source videos to the edited ones, where results with inferior consistency to user prompts are often observed, due to the lack of particular alignments between the delivered motions and edited contents. To address this limitation, we present a shape-consistent video editing method, namely StableV2V, in this paper. Our method decomposes the entire editing pipeline into several sequential procedures, where it edits the first video frame, then establishes an alignment between the delivered motions and user prompts, and eventually propagates the edited contents to all other frames based on such alignment. Furthermore, we curate a testing benchmark, namely DAVIS-Edit, for a comprehensive evaluation of video editing, considering various types of prompts and difficulties. Experimental results and analyses illustrate the outperforming performance, visual consistency, and inference efficiency of our method compared to existing state-of-the-art studies.
Related papers
- VIA: Unified Spatiotemporal Video Adaptation Framework for Global and Local Video Editing [91.60658973688996]
We introduce VIA unified Video Adaptation framework for global and local video editing.
We show that VIA can achieve consistent long video editing in minutes, unlocking the potential for advanced video editing tasks.
arXiv Detail & Related papers (2024-06-18T17:51:37Z) - Zero-Shot Video Editing through Adaptive Sliding Score Distillation [51.57440923362033]
This study proposes a novel paradigm of video-based score distillation, facilitating direct manipulation of original video content.
We propose an Adaptive Sliding Score Distillation strategy, which incorporates both global and local video guidance to reduce the impact of editing errors.
arXiv Detail & Related papers (2024-06-07T12:33:59Z) - I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models [18.36472998650704]
We introduce a novel and generic solution that extends the applicability of image editing tools to videos by propagating edits from a single frame to the entire video using a pre-trained image-to-video model.
Our method, dubbed I2VEdit, adaptively preserves the visual and motion integrity of the source video depending on the extent of the edits.
arXiv Detail & Related papers (2024-05-26T11:47:40Z) - ReVideo: Remake a Video with Motion and Content Control [67.5923127902463]
We present a novel attempt to Remake a Video (VideoRe) which allows precise video editing in specific areas through the specification of both content and motion.
VideoRe addresses a new task involving the coupling and training imbalance between content and motion control.
Our method can also seamlessly extend these applications to multi-area editing without modifying specific training, demonstrating its flexibility and robustness.
arXiv Detail & Related papers (2024-05-22T17:46:08Z) - AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks [41.640692114423544]
We introduce AnyV2V, a novel tuning-free paradigm designed to simplify video editing.
AnyV2V can leverage any existing image editing tools to support an extensive array of video editing tasks.
Our evaluation shows that AnyV2V achieved CLIP-scores comparable to other baseline methods.
arXiv Detail & Related papers (2024-03-21T15:15:00Z) - FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing [8.907836546058086]
Existing approaches relying on image generation models for video editing suffer from time-consuming one-shot fine-tuning, additional condition extraction, or DDIM inversion.
We propose FastVideoEdit, an efficient zero-shot video editing approach inspired by Consistency Models (CMs)
Our method enables direct mapping from source video to target video with strong preservation ability utilizing a special variance schedule.
arXiv Detail & Related papers (2024-03-10T17:12:01Z) - FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video
editing [65.60744699017202]
We introduce optical flow into the attention module in the diffusion model's U-Net to address the inconsistency issue for text-to-video editing.
Our method, FLATTEN, enforces the patches on the same flow path across different frames to attend to each other in the attention module.
Results on existing text-to-video editing benchmarks show that our proposed method achieves the new state-of-the-art performance.
arXiv Detail & Related papers (2023-10-09T17:59:53Z) - Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image
Diffusion Models [65.268245109828]
Ground-A-Video is a video-to-video translation framework for multi-attribute video editing.
It attains temporally consistent editing of input videos in a training-free manner.
Experiments and applications demonstrate that Ground-A-Video's zero-shot capacity outperforms other baseline methods in terms of edit-accuracy and frame consistency.
arXiv Detail & Related papers (2023-10-02T11:28:37Z) - InstructVid2Vid: Controllable Video Editing with Natural Language Instructions [97.17047888215284]
InstructVid2Vid is an end-to-end diffusion-based methodology for video editing guided by human language instructions.
Our approach empowers video manipulation guided by natural language directives, eliminating the need for per-example fine-tuning or inversion.
arXiv Detail & Related papers (2023-05-21T03:28:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.