EffiVED:Efficient Video Editing via Text-instruction Diffusion Models
- URL: http://arxiv.org/abs/2403.11568v2
- Date: Wed, 5 Jun 2024 07:05:15 GMT
- Title: EffiVED:Efficient Video Editing via Text-instruction Diffusion Models
- Authors: Zhenghao Zhang, Zuozhuo Dai, Long Qin, Weizhi Wang,
- Abstract summary: We introduce EffiVED, an efficient diffusion-based model that supports instruction-guided video editing.
We transform vast image editing datasets and open-world videos into a high-quality dataset for training EffiVED.
- Score: 9.287394166165424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale text-to-video models have shown remarkable abilities, but their direct application in video editing remains challenging due to limited available datasets. Current video editing methods commonly require per-video fine-tuning of diffusion models or specific inversion optimization to ensure high-fidelity edits. In this paper, we introduce EffiVED, an efficient diffusion-based model that directly supports instruction-guided video editing. To achieve this, we present two efficient workflows to gather video editing pairs, utilizing augmentation and fundamental vision-language techniques. These workflows transform vast image editing datasets and open-world videos into a high-quality dataset for training EffiVED. Experimental results reveal that EffiVED not only generates high-quality editing videos but also executes rapidly. Finally, we demonstrate that our data collection method significantly improves editing performance and can potentially tackle the scarcity of video editing data. Code can be found at https://github.com/alibaba/EffiVED.
Related papers
- COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing [57.76170824395532]
Video editing is an emerging task, in which most current methods adopt the pre-trained text-to-image (T2I) diffusion model to edit the source video.
We propose COrrespondence-guided Video Editing (COVE) to achieve high-quality and consistent video editing.
COVE can be seamlessly integrated into the pre-trained T2I diffusion model without the need for extra training or optimization.
arXiv Detail & Related papers (2024-06-13T06:27:13Z) - I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models [18.36472998650704]
We introduce a novel and generic solution that extends the applicability of image editing tools to videos by propagating edits from a single frame to the entire video using a pre-trained image-to-video model.
Our method, dubbed I2VEdit, adaptively preserves the visual and motion integrity of the source video depending on the extent of the edits.
arXiv Detail & Related papers (2024-05-26T11:47:40Z) - FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video
Editing [10.011515580084243]
Existing approaches relying on image generation models for video editing suffer from time-consuming one-shot fine-tuning, additional condition extraction, or DDIM inversion.
We propose FastVideoEdit, an efficient zero-shot video editing approach inspired by Consistency Models (CMs)
Our method enables direct mapping from source video to target video with strong preservation ability utilizing a special variance schedule.
arXiv Detail & Related papers (2024-03-10T17:12:01Z) - Neural Video Fields Editing [56.558490998753456]
NVEdit is a text-driven video editing framework designed to mitigate memory overhead and improve consistency.
We construct a neural video field, powered by tri-plane and sparse grid, to enable encoding long videos with hundreds of frames.
Next, we update the video field through off-the-shelf Text-to-Image (T2I) models to text-driven editing effects.
arXiv Detail & Related papers (2023-12-12T14:48:48Z) - VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion
Models [96.55004961251889]
Video Instruction Diffusion (VIDiff) is a unified foundation model designed for a wide range of video tasks.
Our model can edit and translate the desired results within seconds based on user instructions.
We provide convincing generative results for diverse input videos and written instructions, both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-11-30T18:59:52Z) - Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image
Diffusion Models [65.268245109828]
Ground-A-Video is a video-to-video translation framework for multi-attribute video editing.
It attains temporally consistent editing of input videos in a training-free manner.
Experiments and applications demonstrate that Ground-A-Video's zero-shot capacity outperforms other baseline methods in terms of edit-accuracy and frame consistency.
arXiv Detail & Related papers (2023-10-02T11:28:37Z) - INVE: Interactive Neural Video Editing [79.48055669064229]
Interactive Neural Video Editing (INVE) is a real-time video editing solution that consistently propagates sparse frame edits to the entire video clip.
Our method is inspired by the recent work on Layered Neural Atlas (LNA)
LNA suffers from two major drawbacks: (1) the method is too slow for interactive editing, and (2) it offers insufficient support for some editing use cases.
arXiv Detail & Related papers (2023-07-15T00:02:41Z) - Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models [68.31777975873742]
Recent attempts at video editing require significant text-to-video data and computation resources for training.
We propose vid2vid-zero, a simple yet effective method for zero-shot video editing.
Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos.
arXiv Detail & Related papers (2023-03-30T17:59:25Z) - Dreamix: Video Diffusion Models are General Video Editors [22.127604561922897]
Text-driven image and video diffusion models have recently achieved unprecedented generation realism.
We present the first diffusion-based method that is able to perform text-based motion and appearance editing of general videos.
arXiv Detail & Related papers (2023-02-02T18:58:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.