Raformer: Redundancy-Aware Transformer for Video Wire Inpainting
- URL: http://arxiv.org/abs/2404.15802v1
- Date: Wed, 24 Apr 2024 11:02:13 GMT
- Title: Raformer: Redundancy-Aware Transformer for Video Wire Inpainting
- Authors: Zhong Ji, Yimu Su, Yan Zhang, Jiacheng Hou, Yanwei Pang, Jungong Han
- Abstract summary: Video Wire Inpainting (VWI) is a prominent application in video inpainting, aimed at flawlessly removing wires in films or TV series.
Wire removal poses greater challenges due to the wires being longer and slimmer than objects typically targeted in general video inpainting tasks.
We introduce a new VWI dataset with a novel mask generation strategy, namely Wire Removal Video Dataset 2 (WRV2) and Pseudo Wire-Shaped (PWS) Masks.
The WRV2 dataset comprises over 4,000 videos with an average length of 80 frames, designed to facilitate the development and efficacy of inpainting models.
- Score: 77.41727407673066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video Wire Inpainting (VWI) is a prominent application in video inpainting, aimed at flawlessly removing wires in films or TV series, offering significant time and labor savings compared to manual frame-by-frame removal. However, wire removal poses greater challenges due to the wires being longer and slimmer than objects typically targeted in general video inpainting tasks, and often intersecting with people and background objects irregularly, which adds complexity to the inpainting process. Recognizing the limitations posed by existing video wire datasets, which are characterized by their small size, poor quality, and limited variety of scenes, we introduce a new VWI dataset with a novel mask generation strategy, namely Wire Removal Video Dataset 2 (WRV2) and Pseudo Wire-Shaped (PWS) Masks. WRV2 dataset comprises over 4,000 videos with an average length of 80 frames, designed to facilitate the development and efficacy of inpainting models. Building upon this, our research proposes the Redundancy-Aware Transformer (Raformer) method that addresses the unique challenges of wire removal in video inpainting. Unlike conventional approaches that indiscriminately process all frame patches, Raformer employs a novel strategy to selectively bypass redundant parts, such as static background segments devoid of valuable information for inpainting. At the core of Raformer is the Redundancy-Aware Attention (RAA) module, which isolates and accentuates essential content through a coarse-grained, window-based attention mechanism. This is complemented by a Soft Feature Alignment (SFA) module, which refines these features and achieves end-to-end feature alignment. Extensive experiments on both the traditional video inpainting datasets and our proposed WRV2 dataset demonstrate that Raformer outperforms other state-of-the-art methods.
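To make the redundancy-aware idea concrete, here is a minimal PyTorch sketch of a coarse, window-based attention step that bypasses windows judged redundant. It is an illustration under assumptions, not the authors' implementation: the window score (corrupted-pixel coverage plus feature variance), the `keep_ratio` parameter, and all function names are invented for exposition.

```python
import torch
import torch.nn.functional as F

def window_partition(x, ws):
    # x: (B, H, W, C) -> (B, num_windows, ws*ws, C); assumes H and W are divisible by ws.
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, -1, ws * ws, C)

def redundancy_aware_attention(feat, mask, ws=8, keep_ratio=0.5):
    """Coarse window attention that skips redundant windows (illustrative only).

    feat: (B, H, W, C) frame features; mask: (B, H, W), 1 = corrupted (wire) pixels.
    Windows with no corrupted pixels and little feature variation are treated as
    redundant background and bypassed; attention runs only over the kept windows.
    """
    B, H, W, C = feat.shape
    win = window_partition(feat, ws)                          # (B, N, ws*ws, C)
    mwin = window_partition(mask.unsqueeze(-1).float(), ws)   # (B, N, ws*ws, 1)

    # Heuristic redundancy score: mask coverage + feature variance per window (assumed).
    score = mwin.mean(dim=(2, 3)) + win.var(dim=2).mean(dim=-1)   # (B, N)
    k = max(1, int(score.shape[1] * keep_ratio))
    idx = score.topk(k, dim=1).indices                        # indices of essential windows

    kept = torch.gather(win, 1, idx[..., None, None].expand(-1, -1, ws * ws, C))
    tokens = kept.reshape(B, k * ws * ws, C)

    # Plain scaled dot-product self-attention over the kept tokens only.
    attn = F.softmax(tokens @ tokens.transpose(1, 2) / C ** 0.5, dim=-1)
    return attn @ tokens, idx   # caller scatters the output back to the kept windows
```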
Related papers
- Video Diffusion Models are Strong Video Inpainter [14.402778136825642]
We propose a novel First Frame Filling Video Diffusion Inpainting model (FFF-VDI)
We propagate the noise latent information of future frames to fill the masked areas of the first frame's noise latent code.
Next, we fine-tune the pre-trained image-to-video diffusion model to generate the inpainted video.
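As a rough, hedged illustration of the first-frame filling step, the sketch below pours future-frame noise latents into the masked region of the first frame's latent by simple averaging. The function name and the averaging rule are placeholders; the actual FFF-VDI model learns this propagation inside a diffusion pipeline.

```python
import torch

def fill_first_frame_latent(latents, mask0):
    """Conceptual sketch (not the paper's code): fill the masked region of the first
    frame's noise latent with information aggregated from future-frame latents.

    latents: (T, C, h, w) per-frame noise latents; mask0: (h, w), 1 = area to inpaint.
    """
    future = latents[1:].mean(dim=0)                   # crude stand-in for learned propagation
    filled = latents[0] * (1 - mask0) + future * mask0 # replace only the masked area
    return torch.cat([filled.unsqueeze(0), latents[1:]], dim=0)
```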
arXiv Detail & Related papers (2024-08-21T08:01:00Z) - Learning Inclusion Matching for Animation Paint Bucket Colorization [76.4507878427755]
We introduce a new learning-based inclusion matching pipeline, which directs the network to comprehend the inclusion relationships between segments.
Our method features a two-stage pipeline that integrates a coarse color warping module with an inclusion matching module.
To facilitate the training of our network, we also develop a unique dataset, referred to as PaintBucket-Character.
arXiv Detail & Related papers (2024-03-27T08:32:48Z) - Towards Online Real-Time Memory-based Video Inpainting Transformers [95.90235034520167]
Inpainting tasks have seen significant improvements in recent years with the rise of deep neural networks and, in particular, vision transformers.
We propose a framework to adapt existing inpainting transformers to online, real-time constraints by memorizing and refining redundant computations.
Using this framework with some of the most recent inpainting models, we show great online results with a consistent throughput above 20 frames per second.
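The memorize-and-refine idea can be sketched as a per-patch feature cache that recomputes only patches whose content changed and reuses cached features for static background. This is a conceptual sketch, not the paper's framework; `encode_patch` (standing in for a transformer block) and the change threshold are assumptions.

```python
import torch

class PatchFeatureCache:
    """Illustrative cache: reuse features of patches that did not change between frames."""

    def __init__(self, encode_patch, tol=1e-2):
        self.encode_patch = encode_patch   # e.g. one transformer block applied per patch
        self.tol = tol                     # change threshold below which a patch is "static"
        self.pixels, self.feats = {}, {}

    def __call__(self, patches):
        # patches: (N, C, p, p), the current frame cut into a fixed grid of patches.
        out = []
        for i, patch in enumerate(patches):
            cached = self.pixels.get(i)
            if cached is not None and torch.norm(patch - cached) < self.tol:
                out.append(self.feats[i])                        # redundant: reuse cached feature
            else:
                f = self.encode_patch(patch.unsqueeze(0)).squeeze(0)
                self.pixels[i], self.feats[i] = patch.clone(), f
                out.append(f)
        return torch.stack(out)
```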
arXiv Detail & Related papers (2024-03-24T14:02:25Z) - Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation [44.92712228326116]
Video outpainting is a challenging task, aiming at generating video content outside the viewport of the input video.
We introduce MOTIA (Mastering Video Outpainting Through Input-Specific Adaptation).
MOTIA comprises two main phases: input-specific adaptation and pattern-aware outpainting.
arXiv Detail & Related papers (2024-03-20T16:53:45Z) - AVID: Any-Length Video Inpainting with Diffusion Model [30.860927136236374]
We introduce Any-Length Video Inpainting with Diffusion Model, dubbed AVID.
Our model is equipped with effective motion modules and adjustable structure guidance for fixed-length video inpainting.
Our experiments show our model can robustly deal with various inpainting types at different video duration ranges, with high quality.
arXiv Detail & Related papers (2023-12-06T18:56:14Z) - Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario [87.72258480670627]
Existing face forgery detection methods based on the frequency domain find that GAN-forged images have obvious grid-like visual artifacts in the frequency spectrum compared to real images.
This paper proposes a Discrete Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
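The frequency-domain clue underlying FCAN-DCT can be visualized with a plain 2D DCT per frame, where the grid-like artifacts of GAN-forged content tend to stand out. The snippet below computes only that baseline clue (log-magnitude DCT spectra, using SciPy's `dctn`); the paper's network learns a spatial-temporal representation on top of such clues, which is not reproduced here.

```python
import numpy as np
from scipy.fft import dctn

def dct_forgery_clue(frames):
    """frames: (T, H, W) grayscale frames as floats -> per-frame and time-averaged DCT spectra."""
    spectra = np.stack([np.log1p(np.abs(dctn(f, norm='ortho'))) for f in frames])
    return spectra, spectra.mean(axis=0)   # the temporal mean gives a coarse video-level clue
```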
arXiv Detail & Related papers (2022-07-05T09:27:53Z) - VRT: A Video Restoration Transformer [126.79589717404863]
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames.
We propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities.
arXiv Detail & Related papers (2022-01-28T17:54:43Z) - Occlusion-Aware Video Object Inpainting [72.38919601150175]
This paper presents occlusion-aware video object inpainting, which recovers both the complete shape and appearance for occluded objects in videos.
Our technical contribution, VOIN, jointly performs video object shape completion and occluded texture generation.
For more realistic results, VOIN is optimized using both T-PatchGAN and a new spatio-temporal attention-based multi-class discriminator.
arXiv Detail & Related papers (2021-08-15T15:46:57Z) - Internal Video Inpainting by Implicit Long-range Propagation [39.89676105875726]
We propose a novel framework for video inpainting by adopting an internal learning strategy.
We show that this can be achieved implicitly by fitting a convolutional neural network to the known region.
We extend the proposed method to another challenging task: learning to remove an object from a 4K video given a single object mask in only one frame.
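In spirit, the internal-learning strategy resembles fitting a small network to the known regions only and letting it hallucinate the hole, as in deep-prior methods. The sketch below is a heavily simplified stand-in: it omits the paper's implicit cross-frame propagation and regularization, and the architecture and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def internal_inpaint(frames, masks, steps=2000, lr=1e-3):
    """frames: (T, 3, H, W) in [0, 1]; masks: (T, 1, H, W), 1 = known pixel."""
    net = nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
    )
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    masked_input = frames * masks                     # the network only sees known pixels
    for _ in range(steps):
        opt.zero_grad()
        pred = net(masked_input)
        loss = ((pred - frames) ** 2 * masks).mean()  # reconstruction loss on the known region only
        loss.backward()
        opt.step()
    with torch.no_grad():
        pred = net(masked_input)
    return frames * masks + pred * (1 - masks)        # keep known pixels, fill the hole
```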
arXiv Detail & Related papers (2021-08-04T08:56:28Z)