Contrastive Sequential-Diffusion Learning: Non-linear and Multi-Scene Instructional Video Synthesis
- URL: http://arxiv.org/abs/2407.11814v2
- Date: Tue, 29 Oct 2024 12:42:39 GMT
- Title: Contrastive Sequential-Diffusion Learning: Non-linear and Multi-Scene Instructional Video Synthesis
- Authors: Vasco Ramos, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes,
- Abstract summary: We propose a contrastive sequential video diffusion method that selects the most suitable previously generated scene to guide and condition the denoising process of the next scene.
Experiments with real-world action-centric data demonstrate the practicality and improved consistency of our model compared to prior work.
- Score: 9.687215124767063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generated video scenes for action-centric sequence descriptions like recipe instructions and do-it-yourself projects include non-linear patterns, in which the next video may require to be visually consistent not on the immediate previous video but on earlier ones. Current multi-scene video synthesis approaches fail to meet these consistency requirements. To address this, we propose a contrastive sequential video diffusion method that selects the most suitable previously generated scene to guide and condition the denoising process of the next scene. The result is a multi-scene video that is grounded in the scene descriptions and coherent w.r.t the scenes that require visual consistency. Experiments with real-world action-centric data demonstrate the practicality and improved consistency of our model compared to prior work.
Related papers
- Optical-Flow Guided Prompt Optimization for Coherent Video Generation [51.430833518070145]
We propose a framework called MotionPrompt that guides the video generation process via optical flow.
We optimize learnable token embeddings during reverse sampling steps by using gradients from a trained discriminator applied to random frame pairs.
This approach allows our method to generate visually coherent video sequences that closely reflect natural motion dynamics, without compromising the fidelity of the generated content.
arXiv Detail & Related papers (2024-11-23T12:26:52Z) - Diffusion Priors for Dynamic View Synthesis from Monocular Videos [59.42406064983643]
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.
We first finetune a pretrained RGB-D diffusion model on the video frames using a customization technique.
We distill the knowledge from the finetuned model to a 4D representations encompassing both dynamic and static Neural Radiance Fields.
arXiv Detail & Related papers (2024-01-10T23:26:41Z) - Highly Detailed and Temporal Consistent Video Stylization via
Synchronized Multi-Frame Diffusion [22.33952368534147]
Text-guided video-to-video stylization transforms the visual appearance of a source video to a different appearance guided on textual prompts.
Existing text-guided image diffusion models can be extended for stylized video synthesis.
We propose a synchronized multi-frame diffusion framework to maintain both the visual details and the temporal consistency.
arXiv Detail & Related papers (2023-11-24T08:38:19Z) - SEINE: Short-to-Long Video Diffusion Model for Generative Transition and
Prediction [93.26613503521664]
This paper presents a short-to-long video diffusion model, SEINE, that focuses on generative transition and prediction.
We propose a random-mask video diffusion model to automatically generate transitions based on textual descriptions.
Our model generates transition videos that ensure coherence and visual quality.
arXiv Detail & Related papers (2023-10-31T17:58:17Z) - WAIT: Feature Warping for Animation to Illustration video Translation
using GANs [12.681919619814419]
We introduce a new problem for video stylizing where an unordered set of images are used.
Most of the video-to-video translation methods are built on an image-to-image translation model.
We propose a new generator network with feature warping layers which overcomes the limitations of the previous methods.
arXiv Detail & Related papers (2023-10-07T19:45:24Z) - InstructVid2Vid: Controllable Video Editing with Natural Language Instructions [97.17047888215284]
InstructVid2Vid is an end-to-end diffusion-based methodology for video editing guided by human language instructions.
Our approach empowers video manipulation guided by natural language directives, eliminating the need for per-example fine-tuning or inversion.
arXiv Detail & Related papers (2023-05-21T03:28:13Z) - Scene Consistency Representation Learning for Video Scene Segmentation [26.790491577584366]
We propose an effective Self-Supervised Learning (SSL) framework to learn better shot representations from long-term videos.
We present an SSL scheme to achieve scene consistency, while exploring considerable data augmentation and shuffling methods to boost the model generalizability.
Our method achieves the state-of-the-art performance on the task of Video Scene.
arXiv Detail & Related papers (2022-05-11T13:31:15Z) - Blind Video Temporal Consistency via Deep Video Prior [61.062900556483164]
We present a novel and general approach for blind video temporal consistency.
Our method is only trained on a pair of original and processed videos directly.
We show that temporal consistency can be achieved by training a convolutional network on a video with the Deep Video Prior.
arXiv Detail & Related papers (2020-10-22T16:19:20Z) - Future Video Synthesis with Object Motion Prediction [54.31508711871764]
Instead of synthesizing images directly, our approach is designed to understand the complex scene dynamics.
The appearance of the scene components in the future is predicted by non-rigid deformation of the background and affine transformation of moving objects.
Experimental results on the Cityscapes and KITTI datasets show that our model outperforms the state-of-the-art in terms of visual quality and accuracy.
arXiv Detail & Related papers (2020-04-01T16:09:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.