Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
- URL: http://arxiv.org/abs/2403.13745v1
- Date: Wed, 20 Mar 2024 16:53:45 GMT
- Title: Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
- Authors: Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li
- Abstract summary: Video outpainting is a challenging task, aiming at generating video content outside the viewport of the input video.
We introduce MOTIA (Mastering Video Outpainting Through Input-Specific Adaptation).
MOTIA comprises two main phases: input-specific adaptation and pattern-aware outpainting.
- Score: 44.92712228326116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video outpainting is a challenging task, aiming at generating video content outside the viewport of the input video while maintaining inter-frame and intra-frame consistency. Existing methods fall short in either generation quality or flexibility. We introduce MOTIA (Mastering Video Outpainting Through Input-Specific Adaptation), a diffusion-based pipeline that leverages both the intrinsic data-specific patterns of the source video and the image/video generative prior for effective outpainting. MOTIA comprises two main phases: input-specific adaptation and pattern-aware outpainting. The input-specific adaptation phase conducts efficient and effective pseudo-outpainting learning on the single-shot source video. This process encourages the model to identify and learn patterns within the source video and bridges the gap between standard generative processes and outpainting. The subsequent phase, pattern-aware outpainting, generalizes these learned patterns to produce the outpainting results. Additional strategies, including spatial-aware insertion and noise travel, are proposed to better leverage the diffusion model's generative prior and the video patterns acquired from the source video. Extensive evaluations underscore MOTIA's superiority: it outperforms existing state-of-the-art methods on widely recognized benchmarks. Notably, these advancements are achieved without extensive task-specific tuning.
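To make the two phases concrete, the sketch below outlines how input-specific adaptation and pattern-aware outpainting could fit together in a latent-diffusion setting. It is a minimal illustration, not the authors' implementation: the denoiser signature, the adapter_params handle (standing in for LoRA-style lightweight weights), and helpers such as pseudo_outpaint_batch, as well as the diffusers-style scheduler interface (add_noise, step, timesteps), are assumptions made for readability.

```python
import torch
import torch.nn.functional as F


def pseudo_outpaint_batch(latents, keep_ratio=0.75):
    """Build a pseudo-outpainting pair from the source clip itself: keep a
    centered crop of each latent frame as the 'known' region and treat the
    border as the region the model must learn to fill in."""
    b, c, t, h, w = latents.shape
    mask = torch.zeros(b, 1, t, h, w, device=latents.device)
    dh, dw = int(h * keep_ratio) // 2, int(w * keep_ratio) // 2
    mask[..., h // 2 - dh : h // 2 + dh, w // 2 - dw : w // 2 + dw] = 1.0
    return latents * mask, mask


def input_specific_adaptation(denoiser, adapter_params, latents, scheduler, steps=200):
    """Phase 1 (sketch): tune only lightweight adapter weights on the single
    source video via pseudo-outpainting denoising losses."""
    opt = torch.optim.AdamW(adapter_params, lr=1e-4)
    for _ in range(steps):
        known, mask = pseudo_outpaint_batch(latents)
        noise = torch.randn_like(latents)
        t_step = torch.randint(
            0, scheduler.config.num_train_timesteps, (latents.shape[0],),
            device=latents.device,
        )
        noisy = scheduler.add_noise(latents, noise, t_step)
        pred = denoiser(noisy, t_step, cond=known, cond_mask=mask)
        loss = F.mse_loss(pred, noise)  # standard epsilon-prediction objective
        opt.zero_grad()
        loss.backward()
        opt.step()


@torch.no_grad()
def pattern_aware_outpainting(denoiser, scheduler, known_latents, known_mask,
                              travel_every=5):
    """Phase 2 (sketch): denoise the enlarged canvas while repeatedly
    re-inserting the known region (spatial-aware insertion) and occasionally
    re-noising the canvas, a crude stand-in for the noise-travel idea."""
    x = torch.randn_like(known_latents)
    for i, t_step in enumerate(scheduler.timesteps):
        eps = denoiser(x, t_step, cond=known_latents, cond_mask=known_mask)
        x = scheduler.step(eps, t_step, x).prev_sample
        # Spatial-aware insertion: pin the known region to a noised copy of
        # the source latents at the current noise level.
        noised_known = scheduler.add_noise(
            known_latents, torch.randn_like(x), t_step.expand(x.shape[0])
        )
        x = known_mask * noised_known + (1 - known_mask) * x
        if travel_every and i > 0 and i % travel_every == 0:
            # Jump back up the noise schedule and let later steps re-denoise,
            # harmonizing the generated borders with the known content.
            x = scheduler.add_noise(x, torch.randn_like(x), t_step.expand(x.shape[0]))
    # Keep the original content exact in the known region of the final output.
    return known_mask * known_latents + (1 - known_mask) * x
```

In this reading, phase 1 teaches the adapters the source video's own statistics by outpainting crops of it, while phase 2 keeps the known region pinned during sampling and occasionally re-noises the canvas so the generated borders stay consistent with it.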
Related papers
- LVCD: Reference-based Lineart Video Colorization with Diffusion Models [18.0983825973013]
We propose the first video diffusion framework for reference-based lineart video colorization.
We leverage a large-scale pretrained video diffusion model to generate colorized animation videos.
Our method is capable of generating high-quality, temporally consistent long animation videos.
arXiv Detail & Related papers (2024-09-19T17:59:48Z)
- Video Diffusion Models are Strong Video Inpainter [14.402778136825642]
We propose a novel First Frame Filling Video Diffusion Inpainting model (FFF-VDI)
We propagate the noise latent information of future frames to fill the masked areas of the first frame's noise latent code.
Next, we fine-tune the pre-trained image-to-video diffusion model to generate the inpainted video.
arXiv Detail & Related papers (2024-08-21T08:01:00Z)
- Semantically Consistent Video Inpainting with Conditional Diffusion Models [16.42354856518832]
We present a framework for video inpainting with conditional video diffusion models.
We introduce inpainting-specific sampling schemes which capture crucial long-range dependencies in the context.
We devise a novel method for conditioning on the known pixels in incomplete frames.
arXiv Detail & Related papers (2024-04-30T23:49:26Z)
- Raformer: Redundancy-Aware Transformer for Video Wire Inpainting [77.41727407673066]
Video Wire Inpainting (VWI) is a prominent application in video inpainting, aimed at flawlessly removing wires in films or TV series.
Wire removal poses greater challenges due to the wires being longer and slimmer than objects typically targeted in general video inpainting tasks.
We introduce a new VWI dataset with a novel mask generation strategy, namely Wire Removal Video dataset 2 (WRV2) and Pseudo Wire-Shaped (PWS) Masks.
The WRV2 dataset comprises over 4,000 videos with an average length of 80 frames, designed to support the development and evaluation of inpainting models.
arXiv Detail & Related papers (2024-04-24T11:02:13Z)
- DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance [69.0740091741732]
We propose a high-fidelity image-to-video generation method by devising a frame retention branch based on a pre-trained video diffusion model, named DreamVideo.
Our model has a powerful image retention ability and, to the best of our knowledge, delivers the best results on UCF101 among image-to-video models.
arXiv Detail & Related papers (2023-12-05T03:16:31Z)
- Flow-Guided Diffusion for Video Inpainting [15.478104117672803]
Video inpainting has been challenged by complex scenarios like large movements and low-light conditions.
Current methods, including emerging diffusion models, face limitations in quality and efficiency.
This paper introduces the Flow-Guided Diffusion model for Video Inpainting (FGDVI), a novel approach that significantly enhances temporal consistency and inpainting quality.
arXiv Detail & Related papers (2023-11-26T17:48:48Z)
- MoVideo: Motion-Aware Video Generation with Diffusion Models [97.03352319694795]
We propose a novel motion-aware generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow.
MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.
arXiv Detail & Related papers (2023-11-19T13:36:03Z)
- Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning [50.60891619269651]
Control-A-Video is a controllable T2V diffusion model that can generate videos conditioned on text prompts and reference control maps like edge and depth maps.
We propose novel strategies to incorporate content prior and motion prior into the diffusion-based generation process.
Our framework generates higher-quality, more consistent videos compared to existing state-of-the-art methods in controllable text-to-video generation.
arXiv Detail & Related papers (2023-05-23T09:03:19Z)
- Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer [13.098901971644656]
This paper proposes a zero-shot video stylization method named Style-A-Video.
It uses a generative pre-trained transformer with an image latent diffusion model to achieve concise, text-controlled video stylization.
Tests show that we attain superior content preservation and stylistic performance while incurring lower computational cost than previous solutions.
arXiv Detail & Related papers (2023-05-09T14:03:27Z)
- Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is trained directly on a single pair of original and processed videos rather than on a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z)
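For intuition, here is a minimal sketch of the Deep Video Prior idea summarized above: a small CNN is fit on a single (original, processed) video pair with only per-frame losses, and the network's implicit prior tends to yield temporally consistent outputs. The tiny architecture and hyperparameters are illustrative choices, not the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyCNN(nn.Module):
    """Small frame-to-frame network; its implicit prior smooths flicker."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)


def deep_video_prior(original, processed, iters=500, lr=1e-4):
    """original, processed: (T, 3, H, W) tensors for one video pair."""
    model = TinyCNN().to(original.device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(iters):
        # Train on one randomly chosen frame at a time, per-frame loss only;
        # no explicit temporal regularization is used.
        idx = torch.randint(0, original.shape[0], (1,)).item()
        pred = model(original[idx : idx + 1])
        loss = F.l1_loss(pred, processed[idx : idx + 1])
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        # Re-apply the processing through the network: outputs tend to be
        # temporally consistent across frames.
        return model(original)
```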