Dreamix: Video Diffusion Models are General Video Editors
- URL: http://arxiv.org/abs/2302.01329v1
- Date: Thu, 2 Feb 2023 18:58:58 GMT
- Title: Dreamix: Video Diffusion Models are General Video Editors
- Authors: Eyal Molad, Eliahu Horwitz, Dani Valevski, Alex Rav-Acha, Yossi
Matias, Yael Pritch, Yaniv Leviathan, Yedid Hoshen
- Abstract summary: Text-driven image and video diffusion models have recently achieved unprecedented generation realism.
We present the first diffusion-based method that is able to perform text-based motion and appearance editing of general videos.
- Score: 22.127604561922897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-driven image and video diffusion models have recently achieved
unprecedented generation realism. While diffusion models have been successfully
applied for image editing, very few works have done so for video editing. We
present the first diffusion-based method that is able to perform text-based
motion and appearance editing of general videos. Our approach uses a video
diffusion model to combine, at inference time, the low-resolution
spatio-temporal information from the original video with new, high-resolution
information that it synthesizes to align with the guiding text prompt. As
maintaining high fidelity to the original video requires retaining some of its
high-resolution information, we add a preliminary stage of finetuning the model
on the original video, significantly boosting fidelity. We propose to improve
motion editability by a new, mixed objective that jointly finetunes with full
temporal attention and with temporal attention masking. We further introduce a
new framework for image animation. We first transform the image into a coarse
video by simple image processing operations such as replication and perspective
geometric projections, and then use our general video editor to animate it. As
a further application, we can use our method for subject-driven video
generation. Extensive qualitative and numerical experiments showcase the
remarkable editing ability of our method and establish its superior performance
compared to baseline methods.
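To make the mixed finetuning objective more concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' code): each finetuning step on the source video randomly chooses between full temporal attention and temporal-attention masking, where masking restricts every frame to attend only to itself. The module layout, tensor shapes, the denoiser interface, and the `p_mask` probability are illustrative assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalSelfAttention(nn.Module):
    """Self-attention over the frame axis of a (batch, frames, channels) tensor."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor, mask_temporal: bool) -> torch.Tensor:
        if mask_temporal:
            # Masked variant: each frame may only attend to itself, so no
            # information flows across time and the layer acts per-frame.
            frames = x.shape[1]
            attn_mask = ~torch.eye(frames, dtype=torch.bool, device=x.device)
            out, _ = self.attn(x, x, x, attn_mask=attn_mask)
        else:
            # Full temporal attention: every frame attends to every other frame.
            out, _ = self.attn(x, x, x)
        return x + out

def mixed_finetune_step(denoiser, video_latents, num_timesteps=1000, p_mask=0.5):
    """One denoising-loss step on the source video; with probability p_mask the
    temporal attention is masked (per-frame finetuning), otherwise full temporal
    attention is used (video finetuning)."""
    mask_temporal = torch.rand(()).item() < p_mask
    noise = torch.randn_like(video_latents)
    t = torch.randint(0, num_timesteps, (video_latents.shape[0],),
                      device=video_latents.device)
    noisy = video_latents + noise  # stand-in for the real noising schedule
    # `denoiser` is assumed to forward the flag to every TemporalSelfAttention block.
    pred = denoiser(noisy, t, mask_temporal=mask_temporal)
    return F.mse_loss(pred, noise)
```
Likewise, the image-animation preprocessing (turning a still image into a coarse video by replication and perspective projection, which the general video editor then animates) could look roughly like the sketch below. The drift schedule, corner offsets, and parameter names are invented for illustration, not taken from the paper.
```python
import cv2
import numpy as np

def image_to_coarse_video(image: np.ndarray, num_frames: int = 16,
                          max_shift: float = 0.05) -> np.ndarray:
    """Replicate one image into (num_frames, H, W, 3) with a slowly drifting
    perspective warp that fakes coarse camera motion."""
    h, w = image.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    frames = []
    for i in range(num_frames):
        # Linearly increase the perspective distortion over time.
        s = max_shift * i / max(num_frames - 1, 1)
        dst = np.float32([
            [w * s,       h * s],        # top-left corner drifts inward
            [w * (1 - s), h * s * 0.5],  # top-right corner drifts slightly less
            [w,           h],
            [0,           h],
        ])
        M = cv2.getPerspectiveTransform(src, dst)
        frames.append(cv2.warpPerspective(image, M, (w, h),
                                          borderMode=cv2.BORDER_REPLICATE))
    return np.stack(frames, axis=0)
```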
Related papers
- Temporally Consistent Object Editing in Videos using Extended Attention [9.605596668263173]
We propose a method to edit videos using a pre-trained inpainting image diffusion model.
We ensure that the edited information will be consistent across all the video frames.
arXiv Detail & Related papers (2024-06-01T02:31:16Z)
- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models [18.36472998650704]
We introduce a novel and generic solution that extends the applicability of image editing tools to videos by propagating edits from a single frame to the entire video using a pre-trained image-to-video model.
Our method, dubbed I2VEdit, adaptively preserves the visual and motion integrity of the source video depending on the extent of the edits.
arXiv Detail & Related papers (2024-05-26T11:47:40Z)
- DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing [48.238213651343784]
Video score distillation can introduce new content indicated by target text, but can also cause structure and motion deviation.
We propose to match space-time self-similarities of the original video and the edited video during the score distillation.
Our approach is model-agnostic, which can be applied for both cascaded and non-cascaded video diffusion frameworks.
arXiv Detail & Related papers (2024-03-18T17:38:53Z)
- Lumiere: A Space-Time Diffusion Model for Video Generation [75.54967294846686]
We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once.
This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution.
By deploying both spatial and (importantly) temporal down- and up-sampling, our model learns to directly generate a full-frame-rate, low-resolution video.
arXiv Detail & Related papers (2024-01-23T18:05:25Z)
- Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models [68.31777975873742]
Recent attempts at video editing require significant text-to-video data and computation resources for training.
We propose vid2vid-zero, a simple yet effective method for zero-shot video editing.
Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos.
arXiv Detail & Related papers (2023-03-30T17:59:25Z)
- Pix2Video: Video Editing using Image Diffusion [43.07444438561277]
We investigate how to use pre-trained image models for text-guided video editing.
Our method works in two simple steps: first, we use a pre-trained structure-guided (e.g., depth) image diffusion model to perform text-guided edits on an anchor frame.
We demonstrate that realistic text-guided video edits are possible, without any compute-intensive preprocessing or video-specific finetuning.
arXiv Detail & Related papers (2023-03-22T16:36:10Z)
- FateZero: Fusing Attentions for Zero-shot Text-based Video Editing [104.27329655124299]
We propose FateZero, a zero-shot text-based editing method on real-world videos without per-prompt training or use-specific mask.
Our method is the first one to show the ability of zero-shot text-driven video style and local attribute editing from the trained text-to-image model.
arXiv Detail & Related papers (2023-03-16T17:51:13Z)
- Edit-A-Video: Single Video Editing with Object-Aware Consistency [49.43316939996227]
We propose a video editing framework given only a pretrained TTI model and a single <text, video> pair, which we term Edit-A-Video.
The framework consists of two stages: (1) inflating the 2D model into a 3D model by appending temporal modules and tuning on the source video, and (2) inverting the source video into noise and editing with the target text prompt and attention map injection.
We present extensive experimental results over various types of text and videos, and demonstrate the superiority of the proposed method compared to baselines in terms of background consistency, text alignment, and video editing quality.
arXiv Detail & Related papers (2023-03-14T14:35:59Z)
- Video-P2P: Video Editing with Cross-attention Control [68.64804243427756]
Video-P2P is a novel framework for real-world video editing with cross-attention control.
Video-P2P works well on real-world videos for generating new characters while optimally preserving their original poses and scenes.
arXiv Detail & Related papers (2023-03-08T17:53:49Z)
- Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding [35.18070525015657]
We propose a novel face video editing framework based on diffusion autoencoders.
Our model is based on diffusion models and can satisfy both reconstruction and edit capabilities at the same time.
arXiv Detail & Related papers (2022-12-06T07:41:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.