Semantically Consistent Video Inpainting with Conditional Diffusion Models
- URL: http://arxiv.org/abs/2405.00251v1
- Date: Tue, 30 Apr 2024 23:49:26 GMT
- Title: Semantically Consistent Video Inpainting with Conditional Diffusion Models
- Authors: Dylan Green, William Harvey, Saeid Naderiparizi, Matthew Niedoba, Yunpeng Liu, Xiaoxuan Liang, Jonathan Lavington, Ke Zhang, Vasileios Lioutas, Setareh Dabiri, Adam Scibior, Berend Zwartsenberg, Frank Wood
- Abstract summary: We present a conditional video diffusion framework for video inpainting tasks that require synthesizing novel content not present in other frames.
We show that our method is capable of generating diverse, high-quality inpaintings and synthesizing new content that is spatially, temporally, and semantically consistent with the provided context.
- Score: 16.42354856518832
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Current state-of-the-art methods for video inpainting typically rely on optical flow or attention-based approaches to inpaint masked regions by propagating visual information across frames. While such approaches have led to significant progress on standard benchmarks, they struggle with tasks that require the synthesis of novel content that is not present in other frames. In this paper we reframe video inpainting as a conditional generative modeling problem and present a framework for solving such problems with conditional video diffusion models. We highlight the advantages of using a generative approach for this task, showing that our method is capable of generating diverse, high-quality inpaintings and synthesizing new content that is spatially, temporally, and semantically consistent with the provided context.
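As a rough illustration of the conditional generative framing above (not the authors' architecture or training setup), the sketch below shows one way to condition a video diffusion sampler on known pixels: a RePaint-style loop that, at every reverse step, re-imposes a noised copy of the unmasked context so the denoiser only has to synthesize the masked region consistently with it. The `denoiser(x, t)` epsilon-prediction interface, the noise schedule, and all shapes are assumptions for illustration.

```python
import torch

@torch.no_grad()
def inpaint_video(denoiser, frames, mask, betas):
    """Sample an inpainting for the masked pixels of a video clip.

    frames: (B, T, C, H, W) context frames (ground truth outside the mask)
    mask:   (B, T, 1, H, W) with 1 = missing pixel, 0 = known pixel
    betas:  (N,) diffusion noise schedule
    denoiser(x, t) is assumed to predict the noise added at step t.
    """
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(frames)                     # start from pure noise
    for t in reversed(range(len(betas))):
        # Re-impose the known pixels at the matching noise level so the
        # synthesized content stays consistent with the provided context.
        noise = torch.randn_like(frames) if t > 0 else torch.zeros_like(frames)
        known = alpha_bar[t].sqrt() * frames + (1 - alpha_bar[t]).sqrt() * noise
        x = mask * x + (1 - mask) * known

        # One ancestral DDPM reverse step over the whole clip at once.
        t_batch = torch.full((frames.shape[0],), t, device=frames.device)
        eps = denoiser(x, t_batch)
        x = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return mask * x + (1 - mask) * frames            # keep the context pixels exact
```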
Related papers
- LatentColorization: Latent Diffusion-Based Speaker Video Colorization [1.2641141743223379]
We introduce a novel solution for achieving temporal consistency in video colorization.
We demonstrate strong improvements on established image quality metrics compared to other existing methods.
Our dataset encompasses a combination of conventional datasets and videos from television/movies.
arXiv Detail & Related papers (2024-05-09T12:06:06Z) - Lumiere: A Space-Time Diffusion Model for Video Generation [75.54967294846686]
We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once.
This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution.
By deploying both spatial and (importantly) temporal down- and up-sampling, our model learns to directly generate a full-frame-rate, low-resolution video.
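For intuition only, the hypothetical PyTorch module below shows what joint spatial and temporal down- and up-sampling can look like: a strided 3D convolution halves T, H, and W together, and a transposed 3D convolution restores them. It is a toy stand-in, not Lumiere's Space-Time U-Net.

```python
import torch
import torch.nn as nn

class SpaceTimeDownUp(nn.Module):
    """Toy block: downsample T, H and W together, then upsample back."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # Strided 3D conv halves the temporal and spatial resolution at once.
        self.down = nn.Conv3d(channels, channels, kernel_size=3, stride=2, padding=1)
        # Transposed 3D conv restores the original temporal and spatial size.
        self.up = nn.ConvTranspose3d(channels, channels, kernel_size=4, stride=2, padding=1)

    def forward(self, x):                  # x: (B, C, T, H, W)
        return self.up(torch.relu(self.down(x)))

x = torch.randn(1, 64, 16, 64, 64)         # a 16-frame, 64x64 feature video
print(SpaceTimeDownUp()(x).shape)          # torch.Size([1, 64, 16, 64, 64])
```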
arXiv Detail & Related papers (2024-01-23T18:05:25Z) - RTQ: Rethinking Video-language Understanding Based on Image-text Model [55.278942477715084]
Video-language understanding presents unique challenges due to the inclusion of highly complex semantic details.
We propose a novel framework called RTQ, which addresses these challenges simultaneously.
Our model demonstrates outstanding performance even in the absence of video-language pre-training.
arXiv Detail & Related papers (2023-12-01T04:51:01Z) - LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation [21.815083817914843]
We propose a new zero-shot video-to-video translation framework, named LatentWarp.
Our approach is simple: to constrain the query tokens to be temporally consistent, we further incorporate a warping operation in the latent space.
Experiment results demonstrate the superiority of LatentWarp in achieving video-to-video translation with temporal coherence.
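The warping idea — aligning the previous frame's latent to the current frame with optical flow so that corresponding query tokens stay temporally consistent — can be sketched roughly as follows. The flow field, latent shapes, and the use of `grid_sample` are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def warp_latent(prev_latent, flow):
    """Warp the previous frame's latent with optical flow (latent-pixel units).

    prev_latent: (B, C, H, W) latent of the previous frame
    flow:        (B, 2, H, W) displacement mapping current -> previous coordinates
    """
    B, _, H, W = prev_latent.shape
    # Base sampling grid, then add the flow displacement at every location.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float().to(prev_latent.device)   # (H, W, 2)
    grid = grid.unsqueeze(0) + flow.permute(0, 2, 3, 1)                   # (B, H, W, 2)
    # Normalize to [-1, 1] as expected by grid_sample.
    gx = 2.0 * grid[..., 0] / (W - 1) - 1.0
    gy = 2.0 * grid[..., 1] / (H - 1) - 1.0
    # Sample the previous latent at the displaced locations.
    return F.grid_sample(prev_latent, torch.stack((gx, gy), dim=-1), align_corners=True)
```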
arXiv Detail & Related papers (2023-11-01T08:02:57Z) - Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
This paper proposes a novel text-guided video-to-video translation framework to adapt image models to videos.
Our framework achieves global style and local texture temporal consistency at a low cost.
arXiv Detail & Related papers (2023-06-13T17:52:23Z) - Multi-object Video Generation from Single Frame Layouts [84.55806837855846]
We propose a video generative framework capable of synthesizing global scenes with local objects.
Our framework is a non-trivial adaptation of image generation methods and is new to this field.
Our model has been evaluated on two widely-used video recognition benchmarks.
arXiv Detail & Related papers (2023-05-06T09:07:01Z) - Internal Video Inpainting by Implicit Long-range Propagation [39.89676105875726]
We propose a novel framework for video inpainting by adopting an internal learning strategy.
We show that this can be achieved implicitly by fitting a convolutional neural network to the known region.
We extend the proposed method to another challenging task: learning to remove an object from a 4K video given a single object mask in only one frame.
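A minimal sketch of the internal-learning strategy, under the assumption of a toy per-frame CNN and a masked reconstruction loss: the network is overfit to the known pixels of the one input video, and the missing region is read out from its prediction. This is illustrative only and omits the paper's actual architecture, inputs, and regularization.

```python
import torch
import torch.nn as nn

def internal_inpaint(frames, mask, steps: int = 2000):
    """Overfit a small CNN to the known pixels of one video, then read out the rest.

    frames: (T, 3, H, W) the input video
    mask:   (T, 1, H, W) with 1 = known pixel, 0 = missing pixel
    """
    net = nn.Sequential(                    # tiny per-frame CNN, for illustration only
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1),
    )
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        pred = net(frames * mask)
        # The loss only covers known pixels; the hole is filled implicitly by
        # what the network learns from the rest of the video.
        loss = ((pred - frames) ** 2 * mask).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return mask * frames + (1 - mask) * net(frames * mask)
```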
arXiv Detail & Related papers (2021-08-04T08:56:28Z) - Short-Term and Long-Term Context Aggregation Network for Video Inpainting [126.06302824297948]
Video inpainting aims to restore missing regions of a video and has many applications such as video editing and object removal.
We present a novel context aggregation network to effectively exploit both short-term and long-term frame information for video inpainting.
Experiments show that it outperforms state-of-the-art methods with better inpainting results and fast inpainting speed.
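As a loose illustration of combining short-term and long-term frame information (not the paper's aggregation modules), the hypothetical function below averages features from adjacent frames and attends over features from distant frames for one target frame.

```python
import torch

def aggregate_context(target_feat, short_feats, long_feats):
    """Combine features from adjacent and distant frames for one target frame.

    target_feat: (C, H, W) features of the frame being inpainted
    short_feats: (Ns, C, H, W) features of temporally adjacent frames
    long_feats:  (Nl, C, H, W) features of distant reference frames
    """
    # Short-term: average of adjacent-frame features (a stand-in for
    # flow-aligned aggregation).
    short_ctx = short_feats.mean(dim=0)

    # Long-term: attention of the target over distant frames, so far-away
    # content can be borrowed wherever it matches.
    C, H, W = target_feat.shape
    q = target_feat.reshape(C, -1).t()                        # (HW, C)
    k = long_feats.reshape(long_feats.shape[0], C, -1)        # (Nl, C, HW)
    k = k.permute(0, 2, 1).reshape(-1, C)                     # (Nl*HW, C)
    attn = torch.softmax(q @ k.t() / C ** 0.5, dim=-1)        # (HW, Nl*HW)
    long_ctx = (attn @ k).t().reshape(C, H, W)

    return target_feat + short_ctx + long_ctx
```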
arXiv Detail & Related papers (2020-09-12T03:50:56Z) - Learning Joint Spatial-Temporal Transformations for Video Inpainting [58.939131620135235]
We propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting.
We simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN by a spatial-temporal adversarial loss.
arXiv Detail & Related papers (2020-07-20T16:35:48Z)
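The joint spatial-temporal attention idea in STTN — letting every patch of every frame attend to patches in all frames so missing regions are filled simultaneously — can be approximated with standard multi-head attention over all patch tokens, as in the toy module below; the token shapes and dimensions are assumed.

```python
import torch
import torch.nn as nn

class JointSpatialTemporalAttention(nn.Module):
    """Toy layer: every patch token attends to patch tokens from all frames."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens):
        # tokens: (B, T * patches_per_frame, dim) patch embeddings of all frames,
        # so content can be copied or synthesized across both space and time.
        out, _ = self.attn(tokens, tokens, tokens)
        return out

tokens = torch.randn(2, 8 * 64, 256)                  # 8 frames x 64 patches each
print(JointSpatialTemporalAttention()(tokens).shape)  # torch.Size([2, 512, 256])
```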
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.