BVINet: Unlocking Blind Video Inpainting with Zero Annotations
- URL: http://arxiv.org/abs/2502.01181v1
- Date: Mon, 03 Feb 2025 09:17:24 GMT
- Title: BVINet: Unlocking Blind Video Inpainting with Zero Annotations
- Authors: Zhiliang Wu, Kerui Chen, Kun Li, Hehe Fan, Yi Yang
- Abstract summary: We propose an end-to-end blind video inpainting network (BVINet) to address both "where to inpaint" and "how to inpaint" simultaneously.
BVINet can predict the masks of corrupted regions by detecting semantic-discontinuous regions of each frame and utilizing the temporal consistency prior of the video.
We customize a dataset consisting of synthetic corrupted videos, real-world corrupted videos, and their corresponding completed videos.
- Abstract: Video inpainting aims to fill in corrupted regions of a video with plausible contents. Existing methods generally assume that the locations of corrupted regions are known, focusing primarily on "how to inpaint". This reliance necessitates manual annotation of the corrupted regions with binary masks to indicate "where to inpaint". However, annotating these masks is labor-intensive and expensive, limiting the practicality of current methods. In this paper, we relax this assumption by defining a new blind video inpainting setting, enabling networks to learn the mapping from corrupted video to inpainted result directly, eliminating the need for corrupted-region annotations. Specifically, we propose an end-to-end blind video inpainting network (BVINet) to address both "where to inpaint" and "how to inpaint" simultaneously. On the one hand, BVINet predicts the masks of corrupted regions by detecting semantic-discontinuous regions of each frame and utilizing the temporal consistency prior of the video. On the other hand, the predicted masks are fed back into BVINet, allowing it to capture valid context information from uncorrupted regions to fill in corrupted ones. In addition, we introduce a consistency loss to regularize the training of BVINet. In this way, mask prediction and video completion mutually constrain each other, maximizing the overall performance of the trained model. Furthermore, we customize a dataset consisting of synthetic corrupted videos, real-world corrupted videos, and their corresponding completed videos. This dataset serves as a valuable resource for advancing blind video inpainting research. Extensive experimental results demonstrate the effectiveness and superiority of our method.
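The paper does not include code here, but the two sub-problems it couples can be illustrated with a deliberately simple, non-learned sketch. The function names and the median-based heuristics below are our own illustrative assumptions, standing in for BVINet's learned mask-prediction and completion networks: corruption is flagged where a pixel violates the temporal consistency prior (deviates from its temporal median), and flagged pixels are filled from valid context in the other frames.

```python
import numpy as np

def predict_masks(video, thresh=0.5):
    """Toy 'where to inpaint': flag pixels that break the temporal
    consistency prior, i.e. deviate strongly from the per-pixel
    temporal median of the clip. video: (T, H, W) floats in [0, 1]."""
    median = np.median(video, axis=0)        # per-pixel temporal prior
    masks = np.abs(video - median) > thresh  # (T, H, W) bool
    return masks

def inpaint(video, masks):
    """Toy 'how to inpaint': fill each flagged pixel using the temporal
    median over the frames where that pixel is valid (uncorrupted)."""
    out = video.copy()
    valid = np.where(masks, np.nan, video)   # hide corrupted samples
    fill = np.nanmedian(valid, axis=0)       # context from valid frames only
    out[masks] = np.broadcast_to(fill, video.shape)[masks]
    return out

# Tiny clip: constant 0.2 background with a bright corruption in frame 1.
clip = np.full((3, 4, 4), 0.2)
clip[1, 1:3, 1:3] = 1.0
masks = predict_masks(clip)
restored = inpaint(clip, masks)
```

In BVINet both steps are learned networks trained jointly (with the consistency loss coupling them); the hand-crafted medians here only show how the temporal prior can simultaneously localize and repair corruption.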
Related papers
- Video Inpainting Localization with Contrastive Learning [2.1210527985139227]
Deep inpainting is typically used as malicious manipulation to remove important objects for creating fake videos.
This letter proposes a simple yet effective scheme for video inpainting localization with contrastive learning (ViLocal).
arXiv Detail & Related papers (2024-06-25T15:15:54Z) - Raformer: Redundancy-Aware Transformer for Video Wire Inpainting [77.41727407673066]
Video Wire Inpainting (VWI) is a prominent application in video inpainting, aimed at flawlessly removing wires in films or TV series.
Wire removal poses greater challenges due to the wires being longer and slimmer than objects typically targeted in general video inpainting tasks.
We introduce a new VWI dataset with a novel mask generation strategy, namely Wire Removal Video dataset 2 (WRV2) and Pseudo Wire-Shaped (PWS) Masks.
The WRV2 dataset comprises over 4,000 videos with an average length of 80 frames, designed to facilitate the development and evaluation of inpainting models.
arXiv Detail & Related papers (2024-04-24T11:02:13Z) - One-Shot Video Inpainting [5.7120338754738835]
We propose a unified pipeline for one-shot video inpainting (OSVI).
By jointly learning mask prediction and video completion in an end-to-end manner, the results can be optimal for the entire task.
Our method is more reliable because the predicted masks can be used as the network's internal guidance.
arXiv Detail & Related papers (2023-02-28T07:30:36Z) - Semi-Supervised Video Inpainting with Cycle Consistency Constraints [13.414206652584236]
We propose an end-to-end trainable framework consisting of completion network and mask prediction network.
We generate corrupted contents of the current frame using the known mask and decide the regions to be filled of the next frame, respectively.
Our model is trained in a semi-supervised manner, but it can achieve comparable performance as fully-supervised methods.
arXiv Detail & Related papers (2022-08-14T08:46:37Z) - Flow-Guided Video Inpainting with Scene Templates [57.12499174362993]
We consider the problem of filling in missing regions of a video.
We introduce a generative model of images in relation to the scene (without missing regions) and mappings from the scene to images.
We use the model to jointly infer the scene template, a 2D representation of the scene, and the mappings.
arXiv Detail & Related papers (2021-08-29T13:49:13Z) - Occlusion-Aware Video Object Inpainting [72.38919601150175]
This paper presents occlusion-aware video object inpainting, which recovers both the complete shape and appearance for occluded objects in videos.
Our technical contribution VOIN jointly performs video object shape completion and occluded texture generation.
For more realistic results, VOIN is optimized using both a T-PatchGAN and a new spatio-temporal attention-based multi-class discriminator.
arXiv Detail & Related papers (2021-08-15T15:46:57Z) - Deep Video Inpainting Detection [95.36819088529622]
Video inpainting detection localizes an inpainted region in a video both spatially and temporally.
VIDNet, Video Inpainting Detection Network, contains a two-stream encoder-decoder architecture with attention module.
arXiv Detail & Related papers (2021-01-26T20:53:49Z) - Learning Joint Spatial-Temporal Transformations for Video Inpainting [58.939131620135235]
We propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting.
We simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN by a spatial-temporal adversarial loss.
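STTN's core idea, filling all frames at once by letting every spatio-temporal patch attend to every other, can be sketched with a minimal single-head attention pass. The function name, the zeroed embeddings for missing patches, and the key masking are our own simplifications of the learned transformer (no adversarial loss or learned projections here):

```python
import numpy as np

def joint_st_attention(tokens, mask):
    """Minimal single-head self-attention over spatio-temporal tokens.
    tokens: (N, D) patch embeddings stacked from *all* frames;
    mask: (N,) bool, True where the patch is missing. Missing patches
    query the valid ones, so every frame is filled in one joint pass."""
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores[:, mask] = -1e9  # keys from missing patches contribute nothing
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens  # each token becomes a mix of valid content

# Three valid patch embeddings plus one missing (zeroed-out) patch.
tokens = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
mask = np.array([False, False, False, True])
filled = joint_st_attention(tokens, mask)
```

Because the missing patch's query attends uniformly over the (identical) valid patches, its output is reconstructed from them; in STTN these projections are learned and the result is refined with a spatial-temporal adversarial loss.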
arXiv Detail & Related papers (2020-07-20T16:35:48Z) - DVI: Depth Guided Video Inpainting for Autonomous Driving [35.94330601020169]
We present an automatic video inpainting algorithm that can remove traffic agents from videos.
By building a dense 3D map from stitched point clouds, frames within a video are geometrically correlated.
We are the first to fuse multiple videos for video inpainting.
arXiv Detail & Related papers (2020-07-17T09:29:53Z) - VCNet: A Robust Approach to Blind Image Inpainting [70.68227719731243]
Blind inpainting is a task to automatically complete visual contents without specifying masks for missing areas in an image.
In this paper, we define a new blind inpainting setting, making training a blind inpainting neural system robust against unknown missing region patterns.
Our method is effective and robust in blind image inpainting. And our VCN allows for a wide spectrum of applications.
arXiv Detail & Related papers (2020-03-15T12:47:57Z)