Deep Video Inpainting Detection
- URL: http://arxiv.org/abs/2101.11080v1
- Date: Tue, 26 Jan 2021 20:53:49 GMT
- Title: Deep Video Inpainting Detection
- Authors: Peng Zhou, Ning Yu, Zuxuan Wu, Larry S. Davis, Abhinav Shrivastava and Ser-Nam Lim
- Abstract summary: Video inpainting detection localizes an inpainted region in a video both spatially and temporally.
VIDNet, Video Inpainting Detection Network, contains a two-stream encoder-decoder architecture with an attention module.
- Score: 95.36819088529622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies video inpainting detection, which localizes an inpainted
region in a video both spatially and temporally. In particular, we introduce
VIDNet, Video Inpainting Detection Network, which contains a two-stream
encoder-decoder architecture with an attention module. To reveal artifacts encoded
in compression, VIDNet additionally takes in Error Level Analysis frames to
augment RGB frames, producing multimodal features at different levels with an
encoder. To explore spatial and temporal relationships, these features are
further decoded by a Convolutional LSTM to predict masks of inpainted regions.
In addition, when detecting whether a pixel is inpainted or not, we present a
quad-directional local attention module that borrows information from its
surrounding pixels in four directions. Extensive experiments are conducted to
validate our approach. We demonstrate, among other things, that VIDNet not only
outperforms alternative inpainting detection methods by clear margins but also
generalizes well on novel videos that are unseen during training.
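The two key ingredients of the abstract above can be sketched in plain Python. This is an illustrative reconstruction under stated simplifications, not the authors' implementation: uniform quantization stands in for JPEG recompression in the Error Level Analysis map, and scalar per-pixel features with product-similarity softmax weights stand in for the learned quad-directional attention module.

```python
import math

def error_level_analysis(frame, levels=16):
    """Toy ELA map. Real ELA recompresses a frame as JPEG at a fixed
    quality and takes the pixel-wise difference with the original;
    regions edited after the first compression stand out with a
    different error level. Uniform quantization stands in for JPEG
    here (a simplification), so this is illustrative only.
    `frame` is a 2D list of 0-255 intensities."""
    step = 256 // levels
    requantized = [[(p // step) * step for p in row] for row in frame]
    return [[abs(p - q) for p, q in zip(r1, r2)]
            for r1, r2 in zip(frame, requantized)]

def quad_directional_attention(feat, radius=2):
    """Simplified quad-directional local attention: every pixel
    attends to neighbors along the four axial directions within
    `radius`, weighting them by a softmax over feature similarity.
    Scalar features and product similarity replace the paper's
    learned projections. `feat` is a 2D list of floats."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            ctx = 0.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                # gather in-bounds neighbors along this direction
                neigh = [feat[i + di * r][j + dj * r]
                         for r in range(1, radius + 1)
                         if 0 <= i + di * r < h and 0 <= j + dj * r < w]
                if not neigh:
                    continue
                scores = [feat[i][j] * v for v in neigh]
                m = max(scores)
                weights = [math.exp(s - m) for s in scores]
                z = sum(weights)
                ctx += sum(wgt / z * v for wgt, v in zip(weights, neigh))
            out[i][j] = ctx / 4.0
    return out
```

In the full model these per-pixel context values would be multi-channel features fused with the ConvLSTM decoder's state; the sketch only shows the directional gather-and-weight pattern.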
Related papers
- Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z)
- Video Inpainting Localization with Contrastive Learning [2.1210527985139227]
Deep inpainting is typically used as a malicious manipulation to remove important objects and create fake videos.
This letter proposes a simple yet effective scheme for Video Inpainting Localization with contrastive learning (ViLocal).
arXiv Detail & Related papers (2024-06-25T15:15:54Z)
- Trusted Video Inpainting Localization via Deep Attentive Noise Learning [2.1210527985139227]
We present a Trusted Video Inpainting localization network (TruVIL) with excellent robustness and generalization ability.
We design deep attentive noise learning in multiple stages to capture the inpainted traces.
To prepare enough training samples, we also build a frame-level video object segmentation dataset of 2500 videos.
arXiv Detail & Related papers (2024-06-19T14:08:58Z)
- Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection [41.4800103693756]
We introduce a novel Multilateral Temporal-view Pyramid Transformer (MumPy) that flexibly combines spatial-temporal clues.
Our method utilizes a newly designed multilateral temporal-view to extract various collaborations of spatial-temporal clues and introduces a deformable window-based temporal-view interaction module.
By adjusting the contribution strength of spatial and temporal clues, our method can effectively identify inpainted regions.
arXiv Detail & Related papers (2024-04-17T03:56:28Z)
- A Spatial-Temporal Deformable Attention based Framework for Breast Lesion Detection in Videos [107.96514633713034]
We propose a spatial-temporal deformable attention based framework, named STNet.
Our STNet introduces a spatial-temporal deformable attention module to perform local spatial-temporal feature fusion.
Experiments on the public breast lesion ultrasound video dataset show that our STNet achieves state-of-the-art detection performance.
arXiv Detail & Related papers (2023-09-09T07:00:10Z)
- Multimodal Graph Learning for Deepfake Detection [10.077496841634135]
Existing deepfake detectors face several challenges in achieving robustness and generalization.
We propose a novel framework, namely Multimodal Graph Learning (MGL), that leverages information from multiple modalities.
Our proposed method aims to effectively identify and utilize distinguishing features for deepfake detection.
arXiv Detail & Related papers (2022-09-12T17:17:49Z)
- Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario [87.72258480670627]
Existing face forgery detection methods based on the frequency domain find that GAN-forged images show obvious grid-like visual artifacts in the frequency spectrum compared to real images.
This paper proposes a Discrete Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
arXiv Detail & Related papers (2022-07-05T09:27:53Z)
- Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion.
A neural network is built and trained to yield the statistical summaries given the video frames as inputs.
arXiv Detail & Related papers (2020-08-31T08:31:56Z)
- Learning Joint Spatial-Temporal Transformations for Video Inpainting [58.939131620135235]
We propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting.
We simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN by a spatial-temporal adversarial loss.
arXiv Detail & Related papers (2020-07-20T16:35:48Z)
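The self-attention fill described in the STTN entry above can be sketched minimally. This is a generic scaled dot-product self-attention over a flat token list, a heavily simplified stand-in for STTN's transformer (no learned query/key/value projections, no multi-head structure, no patch extraction), included only to show the attend-and-aggregate pattern by which missing regions borrow content from visible ones.

```python
import math

def self_attention(tokens):
    """Minimal scaled dot-product self-attention over a list of
    feature vectors (pure-Python sketch). STTN-style inpainting
    attends across patches of all frames so that missing regions
    are filled from similar visible content; here queries, keys,
    and values are the raw features themselves, which is a
    simplification for illustration."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # similarity of this token to every token, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        m = max(scores)  # subtract max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        # output is the softmax-weighted average of all tokens
        out.append([sum(w[t] / z * tokens[t][j]
                        for t in range(len(tokens)))
                    for j in range(d)])
    return out
```

Each output vector is a convex combination of all input tokens, which is why content from similar visible patches can flow into a hole's position.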
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.