Training-Free Semantic Video Composition via Pre-trained Diffusion Model
- URL: http://arxiv.org/abs/2401.09195v1
- Date: Wed, 17 Jan 2024 13:07:22 GMT
- Title: Training-Free Semantic Video Composition via Pre-trained Diffusion Model
- Authors: Jiaqi Guo, Sitong Su, Junchen Zhu, Lianli Gao, Jingkuan Song
- Abstract summary: Current approaches, predominantly trained on videos with adjusted foreground color and lighting, struggle to address deep semantic disparities beyond superficial adjustments.
We propose a training-free pipeline employing a pre-trained diffusion model imbued with semantic prior knowledge.
Experimental results reveal that our pipeline successfully ensures the visual harmony and inter-frame coherence of the outputs.
- Score: 96.0168609879295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The video composition task aims to integrate specified foregrounds and
backgrounds from different videos into a harmonious composite. Current
approaches, predominantly trained on videos with adjusted foreground color and
lighting, struggle to address deep semantic disparities beyond superficial
adjustments, such as domain gaps. Therefore, we propose a training-free
pipeline employing a pre-trained diffusion model imbued with semantic prior
knowledge, which can process composite videos with broader semantic
disparities. Specifically, we process the video frames in a cascading manner
and handle each frame with the diffusion model in two stages: inversion and generation. In the
inversion process, we propose Balanced Partial Inversion to obtain generation
initial points that balance reversibility and modifiability. Then, in the
generation process, we further propose Inter-Frame Augmented attention to
augment foreground continuity across frames. Experimental results reveal that
our pipeline successfully ensures the visual harmony and inter-frame coherence
of the outputs, demonstrating efficacy in managing broader semantic
disparities.
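As a rough, hypothetical illustration of the partial-inversion idea (inverting a composite frame's latent only up to an intermediate timestep rather than to pure noise, so the starting point stays largely reversible while leaving room for modification), here is a toy DDIM-style sketch. The schedule, the placeholder noise predictor `eps_theta`, and the cutoff `t_star` are all stand-ins, not the authors' actual Balanced Partial Inversion.

```python
import numpy as np

T = 50
t_star = 30  # hypothetical cutoff: invert only this far, not to step T

# Placeholder noise predictor; a real pipeline uses a pre-trained UNet.
def eps_theta(x, t):
    return 0.1 * x

# Toy cumulative alpha schedule (alpha_bar), decreasing in t.
alpha_bar = np.linspace(0.999, 0.99, T + 1).cumprod()

def ddim_step(x, t_from, t_to):
    """Deterministic DDIM move between timesteps (either direction)."""
    a_from, a_to = alpha_bar[t_from], alpha_bar[t_to]
    eps = eps_theta(x, t_from)
    x0_pred = (x - np.sqrt(1 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1 - a_to) * eps

x = np.ones(4)  # stand-in latent for one composite frame

# Inversion: climb only to t_star; stopping short of pure noise keeps
# enough of the original signal to remain reversible.
for t in range(t_star):
    x = ddim_step(x, t, t + 1)

# Generation: denoise back from t_star to 0; harmonization (and, in the
# paper, Inter-Frame Augmented attention) would act during this pass.
for t in range(t_star, 0, -1):
    x = ddim_step(x, t, t - 1)
```

A smaller `t_star` keeps the round trip closer to the input (reversibility); a larger one gives the generation pass more freedom to alter the frame (modifiability).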
Related papers
- TVG: A Training-free Transition Video Generation Method with Diffusion Models [12.037716102326993]
Transition videos play a crucial role in media production, enhancing the flow and coherence of visual narratives.
Recent advances in diffusion model-based video generation offer new possibilities for creating transitions but face challenges such as poor inter-frame relationship modeling and abrupt content changes.
We propose a novel training-free Transition Video Generation (TVG) approach using video-level diffusion models that addresses these limitations without additional training.
arXiv Detail & Related papers (2024-08-24T00:33:14Z)
- FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation [85.29772293776395]
We introduce FRESCO, which establishes intra-frame correspondence alongside inter-frame correspondence to form a more robust spatial-temporal constraint.
This enhancement ensures a more consistent transformation of semantically similar content across frames.
Our approach involves an explicit update of features to achieve high spatial-temporal consistency with the input video.
arXiv Detail & Related papers (2024-03-19T17:59:18Z)
- ViewFusion: Towards Multi-View Consistency via Interpolated Denoising [48.02829400913904]
We introduce ViewFusion, a training-free algorithm that can be seamlessly integrated into existing pre-trained diffusion models.
Our approach adopts an auto-regressive method that implicitly leverages previously generated views as context for the next view generation.
Our framework successfully extends single-view conditioned models to work in multiple-view conditional settings without any additional fine-tuning.
arXiv Detail & Related papers (2024-02-29T04:21:38Z)
- Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion [22.33952368534147]
Text-guided video-to-video stylization transforms the visual appearance of a source video to a different appearance guided by textual prompts.
Existing text-guided image diffusion models can be extended for stylized video synthesis.
We propose a synchronized multi-frame diffusion framework to maintain both the visual details and the temporal consistency.
arXiv Detail & Related papers (2023-11-24T08:38:19Z)
- InstructVid2Vid: Controllable Video Editing with Natural Language Instructions [97.17047888215284]
InstructVid2Vid is an end-to-end diffusion-based methodology for video editing guided by human language instructions.
Our approach empowers video manipulation guided by natural language directives, eliminating the need for per-example fine-tuning or inversion.
arXiv Detail & Related papers (2023-05-21T03:28:13Z)
- VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation [88.49030739715701]
This work presents a decomposed diffusion process that resolves the per-frame noise into a base noise shared among all frames and a residual noise that varies along the time axis.
Experiments on various datasets confirm that our approach, termed VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation.
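VideoFusion's noise decomposition can be sketched in a few lines. The mixing weight `sqrt_lambda` and the tensor shapes below are hypothetical illustrations, not the paper's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
num_frames, h, w = 8, 4, 4
sqrt_lambda = 0.8  # hypothetical weight on the shared base noise

# Base noise: one map shared by every frame (video-level content).
base = rng.standard_normal((1, h, w))
# Residual noise: an independent map per frame (per-frame variation).
residual = rng.standard_normal((num_frames, h, w))

# Per-frame noise: mix the two so each frame stays unit-variance;
# base broadcasts across the frame axis.
noise = sqrt_lambda * base + np.sqrt(1.0 - sqrt_lambda**2) * residual
```

Because the base component is identical across frames, any difference between two frames' noise maps comes only from the residual term, which is what lets a shared denoiser keep frames coherent.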
arXiv Detail & Related papers (2023-03-15T02:16:39Z)
- Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is only trained on a pair of original and processed videos directly instead of a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z)
- Contrastive Transformation for Self-supervised Correspondence Learning [120.62547360463923]
We study the self-supervised learning of visual correspondence using unlabeled videos in the wild.
Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation.
Our framework outperforms the recent self-supervised correspondence methods on a range of visual tasks.
arXiv Detail & Related papers (2020-12-09T14:05:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.