UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts
- URL: http://arxiv.org/abs/2412.06340v1
- Date: Mon, 09 Dec 2024 09:45:14 GMT
- Title: UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts
- Authors: Zhen Wan, Yue Ma, Chenyang Qi, Zhiheng Liu, Tao Gui,
- Abstract summary: UniPaint is a generative space-time video inpainting framework that enables spatial-temporal inpainting.
UniPaint produces high-quality and aesthetically pleasing results, achieving the best results across various tasks and scale setups.
- Score: 20.955898491009656
- Abstract: In this paper, we present UniPaint, a unified generative space-time video inpainting framework that enables spatial-temporal inpainting and interpolation. Different from existing methods that treat video inpainting and video interpolation as two distinct tasks, we leverage a unified inpainting framework to tackle them and observe that these two tasks can mutually enhance synthesis performance. Specifically, we first introduce a plug-and-play space-time video inpainting adapter, which can be employed in various personalized models. The key idea is a Mixture of Experts (MoE) attention that covers the various tasks. Then, we design a spatial-temporal masking strategy for the training stage so that the two tasks enhance each other and improve performance. UniPaint produces high-quality and aesthetically pleasing results, achieving the best quantitative results across various tasks and scale setups. The code and checkpoints will be available soon.
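The spatial-temporal masking idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how masks are sampled, so the function name `sample_space_time_mask`, the rectangular hole, the keyframe stride, and the task-mixing probability below are all assumptions made for illustration, not the authors' implementation.

```python
import random


def sample_space_time_mask(num_frames, height, width, rng,
                           p_spatial=0.5, key_stride=4):
    """Return (task, mask) where mask[t][y][x] is 1 for pixels to synthesize.

    Hypothetical sketch: a spatial mask hides the same rectangle in every
    frame (inpainting); a temporal mask hides whole frames between kept
    keyframes (interpolation). All parameters are illustrative assumptions.
    """
    if rng.random() < p_spatial:
        # Spatial task: one random rectangle shared by all frames.
        h = rng.randint(1, height)
        w = rng.randint(1, width)
        top = rng.randint(0, height - h)
        left = rng.randint(0, width - w)
        frame = [
            [1 if top <= y < top + h and left <= x < left + w else 0
             for x in range(width)]
            for y in range(height)
        ]
        return "spatial", [[row[:] for row in frame] for _ in range(num_frames)]

    # Temporal task: keep every key_stride-th frame, mask the rest entirely.
    mask = []
    for t in range(num_frames):
        value = 0 if t % key_stride == 0 else 1
        mask.append([[value] * width for _ in range(height)])
    return "temporal", mask
```

Drawing both mask types from one sampler lets a single training loop see inpainting and interpolation examples side by side, which is the kind of joint training the abstract describes; the actual mixing schedule used in the paper is not stated here.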
Related papers
- DreamColour: Controllable Video Colour Editing without Training [80.90808879991182]
We present a training-free framework that makes precise video colour editing accessible through an intuitive interface.
By decoupling spatial and temporal aspects of colour editing, we can better align with users' natural workflow.
Our approach matches or exceeds state-of-the-art methods while eliminating the need for training or specialized hardware.
arXiv Detail & Related papers (2024-12-06T16:57:54Z)
- Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection [41.4800103693756]
We introduce a novel Multilateral Temporal-view Pyramid Transformer (MumPy) that collaborates spatial-temporal clues flexibly.
Our method utilizes a newly designed multilateral temporal-view to extract various collaborations of spatial-temporal clues and introduces a deformable window-based temporal-view interaction module.
By adjusting the contribution strength of spatial and temporal clues, our method can effectively identify inpainted regions.
arXiv Detail & Related papers (2024-04-17T03:56:28Z)
- Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation [44.92712228326116]
Video outpainting is a challenging task, aiming at generating video content outside the viewport of the input video.
We introduce MOTIA (Mastering Video Outpainting Through Input-Specific Adaptation).
MOTIA comprises two main phases: input-specific adaptation and pattern-aware outpainting.
arXiv Detail & Related papers (2024-03-20T16:53:45Z)
- Towards Language-Driven Video Inpainting via Multimodal Large Language Models [116.22805434658567]
We introduce a new task -- language-driven video inpainting.
It uses natural language instructions to guide the inpainting process.
We present the Remove Objects from Videos by Instructions dataset.
arXiv Detail & Related papers (2024-01-18T18:59:13Z)
- AVID: Any-Length Video Inpainting with Diffusion Model [30.860927136236374]
We introduce Any-Length Video Inpainting with Diffusion Model, dubbed as AVID.
Our model is equipped with effective motion modules and adjustable structure guidance, for fixed-length video inpainting.
Our experiments show our model can robustly deal with various inpainting types at different video duration ranges, with high quality.
arXiv Detail & Related papers (2023-12-06T18:56:14Z)
- Cylin-Painting: Seamless 360° Panoramic Image Outpainting and Beyond [136.18504104345453]
We present a Cylin-Painting framework that involves meaningful collaborations between inpainting and outpainting.
The proposed algorithm can be effectively extended to other panoramic vision tasks, such as object detection, depth estimation, and image super-resolution.
arXiv Detail & Related papers (2022-04-18T21:18:49Z)
- StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN [70.31913835035206]
We present a novel approach to the video synthesis problem that helps to greatly improve visual quality.
We make use of a pre-trained StyleGAN network, the latent space of which allows control over the appearance of the objects it was trained for.
Our temporal architecture is then trained not on sequences of RGB frames, but on sequences of StyleGAN latent codes.
arXiv Detail & Related papers (2021-07-15T09:58:15Z)
- Decoupled Spatial-Temporal Transformer for Video Inpainting [77.8621673355983]
Video inpainting aims to fill the given holes with realistic appearance, but it remains a challenging task even with today's deep learning approaches.
Recent works introduce the promising Transformer architecture into deep video inpainting and achieve better performance.
We propose a Decoupled Spatial-Temporal Transformer (DSTT) for improving video inpainting with exceptional efficiency.
arXiv Detail & Related papers (2021-04-14T05:47:46Z)
- Learning Joint Spatial-Temporal Transformations for Video Inpainting [58.939131620135235]
We propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting.
We simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN by a spatial-temporal adversarial loss.
arXiv Detail & Related papers (2020-07-20T16:35:48Z)