VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by
Using Diffusion Model with ControlNet
- URL: http://arxiv.org/abs/2307.14073v2
- Date: Thu, 3 Aug 2023 09:34:24 GMT
- Title: VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by
Using Diffusion Model with ControlNet
- Authors: Zhihao Hu, Dong Xu
- Abstract summary: We propose a new motion-guided video-to-video translation framework called VideoControlNet.
Inspired by the video codecs that use motion information for reducing temporal redundancy, our framework uses motion information to prevent the regeneration of the redundant areas.
Our experiments demonstrate that our proposed VideoControlNet inherits the generation capability of the pre-trained large diffusion model.
- Score: 26.458417029197957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, diffusion models like StableDiffusion have achieved impressive
image generation results. However, the generation process of such diffusion
models is uncontrollable, which makes it hard to generate videos with
continuous and consistent content. In this work, by using the diffusion model
with ControlNet, we propose a new motion-guided video-to-video translation
framework called VideoControlNet to generate various videos based on the given
prompts and the condition from the input video. Inspired by the video codecs
that use motion information for reducing temporal redundancy, our framework
uses motion information to prevent the regeneration of the redundant areas for
content consistency. Specifically, we generate the first frame (i.e., the
I-frame) by using the diffusion model with ControlNet. Then we generate other
key frames (i.e., the P-frames) based on the previous I/P-frame by using our
newly proposed motion-guided P-frame generation (MgPG) method, in which the
P-frames are generated based on the motion information and the occlusion areas
are inpainted by using the diffusion model. Finally, the remaining frames (i.e., the
B-frames) are generated by using our motion-guided B-frame interpolation (MgBI)
module. Our experiments demonstrate that our proposed VideoControlNet inherits
the generation capability of the pre-trained large diffusion model and extends
the image diffusion model to the video diffusion model by using motion
information. More results are provided at our project page.
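To make the I-/P-/B-frame generation order described above more concrete, the following is a minimal Python sketch of the pipeline. It is an illustration only, not the authors' implementation: the diffusion, flow-estimation, inpainting, and interpolation components are passed in as hypothetical callables (controlnet_generate, inpaint, estimate_flow, interpolate), and a simple nearest-neighbor warp stands in for the motion compensation used in MgPG.

```python
# Minimal sketch of the I/P/B generation order described in the abstract.
# All learned components are passed in as callables; names such as
# `controlnet_generate`, `inpaint`, `estimate_flow`, and `interpolate` are
# hypothetical stand-ins, not the authors' actual interfaces.

from typing import Callable, List
import numpy as np


def warp(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a frame (H, W, 3) with a dense flow field (H, W, 2)."""
    h, w = flow.shape[:2]
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    src_x = np.clip(grid_x + flow[..., 0], 0, w - 1).astype(int)
    src_y = np.clip(grid_y + flow[..., 1], 0, h - 1).astype(int)
    return frame[src_y, src_x]


def generate_video(
    cond_frames: List[np.ndarray],   # per-frame conditions (e.g. depth or edge maps)
    prompt: str,
    controlnet_generate: Callable,   # (prompt, condition) -> frame
    inpaint: Callable,               # (frame, occlusion_mask, prompt, condition) -> frame
    estimate_flow: Callable,         # (src_condition, dst_condition) -> (flow, occlusion_mask)
    interpolate: Callable,           # (prev_frame, next_frame, t) -> frame (MgBI stand-in)
    gop: int = 4,                    # key-frame spacing (group of pictures)
) -> List[np.ndarray]:
    n = len(cond_frames)
    out = [None] * n

    # Key-frame positions: the first frame (I-frame) plus every `gop`-th frame
    # and the last frame (treated as P-frames in this sketch).
    key_idx = sorted(set(range(0, n, gop)) | {n - 1})

    # 1) I-frame: generated directly by the diffusion model with ControlNet.
    out[0] = controlnet_generate(prompt, cond_frames[0])

    # 2) P-frames (MgPG-style): motion-compensate the previous key frame and
    #    let the diffusion model inpaint only the occluded regions.
    for prev, cur in zip(key_idx[:-1], key_idx[1:]):
        flow, occlusion = estimate_flow(cond_frames[prev], cond_frames[cur])
        warped = warp(out[prev], flow)
        out[cur] = inpaint(warped, occlusion, prompt, cond_frames[cur])

    # 3) B-frames (MgBI-style): interpolate the remaining frames from the
    #    surrounding key frames instead of regenerating them.
    for prev, nxt in zip(key_idx[:-1], key_idx[1:]):
        for i in range(prev + 1, nxt):
            t = (i - prev) / (nxt - prev)
            out[i] = interpolate(out[prev], out[nxt], t)

    return out
```

The point the sketch mirrors is that only the occluded regions of each P-frame are handed back to the diffusion model for inpainting, while the rest is carried over by motion compensation, and B-frames are interpolated rather than regenerated, which is how the framework avoids regenerating redundant areas.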
Related papers
- BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations [82.94002870060045]
Existing video generation models struggle to follow complex text prompts and synthesize multiple objects.
We develop a blob-grounded video diffusion model named BlobGEN-Vid that allows users to control object motions and fine-grained object appearance.
We show that our framework is model-agnostic and build BlobGEN-Vid based on both U-Net and DiT-based video diffusion models.
arXiv Detail & Related papers (2025-01-13T19:17:06Z)
- ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation [81.90265212988844]
We propose a training-free method for generative video models in a plug-and-play manner.
We transform a video model into a self-cascaded video diffusion model with the designed hidden state correction modules.
Our training-free method is even comparable to trained models supported by huge compute resources and large-scale datasets.
arXiv Detail & Related papers (2024-06-03T00:31:13Z)
- Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition [124.41196697408627]
We propose content-motion latent diffusion model (CMD), a novel efficient extension of pretrained image diffusion models for video generation.
CMD encodes a video as a combination of a content frame (like an image) and a low-dimensional motion latent representation.
We generate the content frame by fine-tuning a pretrained image diffusion model, and we generate the motion latent representation by training a new lightweight diffusion model.
arXiv Detail & Related papers (2024-03-21T05:48:48Z)
- BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models [40.73982918337828]
We propose a training-free general-purpose video synthesis framework, coined as BIVDiff, via bridging specific image diffusion models and general text-to-video foundation diffusion models.
Specifically, we first use a specific image diffusion model (e.g., ControlNet and Instruct Pix2Pix) for frame-wise video generation, then perform Mixed Inversion on the generated video, and finally input the inverted latents into the video diffusion models (a rough sketch of this bridging flow is given after this list).
arXiv Detail & Related papers (2023-12-05T14:56:55Z)
- MoVideo: Motion-Aware Video Generation with Diffusion Models [97.03352319694795]
We propose a novel motion-aware generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow.
MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.
arXiv Detail & Related papers (2023-11-19T13:36:03Z)
- LLM-grounded Video Diffusion Models [57.23066793349706]
Video diffusion models have emerged as a promising tool for neural video generation.
Current models struggle with complex prompts and often generate restricted or incorrect motion.
We introduce LLM-grounded Video Diffusion (LVD).
Our results demonstrate that LVD significantly outperforms its base video diffusion model.
arXiv Detail & Related papers (2023-09-29T17:54:46Z)
- Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning [50.60891619269651]
Control-A-Video is a controllable T2V diffusion model that can generate videos conditioned on text prompts and reference control maps like edge and depth maps.
We propose novel strategies to incorporate content prior and motion prior into the diffusion-based generation process.
Our framework generates higher-quality, more consistent videos compared to existing state-of-the-art methods in controllable text-to-video generation.
arXiv Detail & Related papers (2023-05-23T09:03:19Z)
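As noted in the BIVDiff entry above, the bridging idea there can be read as a three-stage flow: frame-wise generation with an image diffusion model, inversion of the result, and refinement by a video diffusion model. The sketch below is only a hypothetical outline of that reading; all three stages are abstract callables, and mixed_inversion is merely a placeholder for that paper's Mixed Inversion step, whose details are not given here.

```python
# Rough outline of the image-to-video "bridging" flow summarized in the
# BIVDiff entry above. All three stages are left abstract; in particular,
# `mixed_inversion` is only a placeholder for that paper's Mixed Inversion step.

from typing import Any, Callable, List


def bridge_image_to_video(
    conditions: List[Any],        # per-frame control signals (e.g. edge/depth maps)
    prompt: str,
    image_diffusion: Callable,    # task-specific per-frame generator (ControlNet-like)
    mixed_inversion: Callable,    # generated frames -> latents for the video model
    video_diffusion: Callable,    # (latents, prompt) -> temporally consistent frames
) -> List[Any]:
    # 1) Frame-wise generation with the image diffusion model.
    frames = [image_diffusion(prompt, c) for c in conditions]
    # 2) Invert the generated frames into the video model's latent space.
    latents = mixed_inversion(frames)
    # 3) The video diffusion model refines the latents for temporal consistency.
    return video_diffusion(latents, prompt)
```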