Video Inpainting Localization with Contrastive Learning
- URL: http://arxiv.org/abs/2406.17628v1
- Date: Tue, 25 Jun 2024 15:15:54 GMT
- Title: Video Inpainting Localization with Contrastive Learning
- Authors: Zijie Lou, Gang Cao, Man Lin
- Abstract summary: Deep video inpainting is typically used as a malicious manipulation to remove important objects and create fake videos.
This letter proposes a simple yet effective forensic scheme for Video Inpainting LOcalization with ContrAstive Learning (ViLocal).
- Score: 2.1210527985139227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep video inpainting is typically used as a malicious manipulation to remove important objects and create fake videos, so blind identification of the inpainted regions is important. This letter proposes a simple yet effective forensic scheme for Video Inpainting LOcalization with ContrAstive Learning (ViLocal). Specifically, a 3D Uniformer encoder is applied to the video noise residual to learn effective spatiotemporal forensic features. To enhance the discriminative power, supervised contrastive learning is adopted to capture the local inconsistency of inpainted videos by attracting positive and repelling negative pairs of pristine and forged pixels. A pixel-wise inpainting localization map is produced by a lightweight convolutional decoder trained with a specialized two-stage strategy. To prepare enough training samples, we build a video object segmentation dataset of 2500 videos with pixel-level annotations for every frame. Extensive experimental results validate the superiority of ViLocal over state-of-the-art methods. Code and dataset will be available at https://github.com/multimediaFor/ViLocal.
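To make the attract/repel objective concrete, here is a minimal PyTorch sketch of a pixel-wise supervised contrastive (SupCon-style) loss over pristine and forged pixel embeddings. The per-frame feature shape, the random pixel subsampling, and the temperature value are illustrative assumptions, not ViLocal's published implementation.

```python
import torch
import torch.nn.functional as F

def pixel_supcon_loss(feats, mask, num_samples=512, tau=0.07):
    """feats: (C, H, W) per-frame pixel embeddings; mask: (H, W), 1 = inpainted.
    Shapes, sampling, and tau are illustrative assumptions."""
    C, H, W = feats.shape
    z = F.normalize(feats.reshape(C, -1), dim=0)       # unit-norm embedding per pixel
    labels = mask.reshape(-1).long()                   # 0 = pristine, 1 = forged
    idx = torch.randperm(H * W, device=feats.device)[:num_samples]  # subsample pixels
    z, y = z[:, idx].t(), labels[idx]                  # (N, C), (N,)
    sim = z @ z.t() / tau                              # pairwise cosine similarities
    diag = torch.eye(len(y), dtype=torch.bool, device=y.device)
    pos = (y[:, None] == y[None, :]) & ~diag           # same-label pairs attract
    logits = sim.masked_fill(diag, float('-inf'))      # exclude self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # mean log-likelihood of positives per anchor, averaged over anchors
    return -((log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)).mean()
```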
Related papers
- Trusted Video Inpainting Localization via Deep Attentive Noise Learning [2.1210527985139227]
We present a Trusted Video Inpainting Localization network (TruVIL) with excellent robustness and generalization ability.
We design deep attentive noise learning in multiple stages to capture the inpainted traces.
To prepare enough training samples, we also build a frame-level video object segmentation dataset of 2500 videos.
arXiv Detail & Related papers (2024-06-19T14:08:58Z)
- Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition [84.31749632725929]
In this paper, we focus on one critical challenge of the task, namely scene bias, and accordingly contribute a novel scene-aware video-text alignment method.
Our key idea is to distinguish video representations from scene-encoded text representations, aiming to learn scene-agnostic video representations for recognizing actions across domains.
arXiv Detail & Related papers (2024-03-03T16:48:16Z)
- Learning Transferable Spatiotemporal Representations from Natural Script Knowledge [65.40899722211726]
We introduce a new pretext task, Turning to Video for Transcript Sorting (TVTS), which sorts scripts by attending to learned video representations.
The advantages enable our model to contextualize what is happening like human beings and seamlessly apply to large-scale uncurated video data in the real world.
arXiv Detail & Related papers (2022-09-30T07:39:48Z)
- MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval [43.2299969152561]
Our method outperforms state-of-the-art methods for text-to-video retrieval on four datasets with both zero-shot and fine-tune evaluation protocols.
arXiv Detail & Related papers (2022-04-26T16:06:31Z)
- Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning [74.03651142051656]
We develop LIIR, a locality-aware inter-and intra-video reconstruction framework.
We exploit cross video affinities as extra negative samples within a unified, inter-and intra-video reconstruction scheme.
arXiv Detail & Related papers (2022-03-27T15:46:42Z)
- Attention-guided Temporal Coherent Video Object Matting [78.82835351423383]
We propose a novel deep learning-based object matting method that can achieve temporally coherent matting results.
Its key component is an attention-based temporal aggregation module that maximizes image matting networks' strength.
We show how to effectively solve the trimap generation problem by fine-tuning a state-of-the-art video object segmentation network.
arXiv Detail & Related papers (2021-05-24T17:34:57Z)
- Deep Video Inpainting Detection [95.36819088529622]
Video inpainting detection localizes an inpainted region in a video both spatially and temporally.
VIDNet, a Video Inpainting Detection Network, uses a two-stream encoder-decoder architecture with an attention module.
arXiv Detail & Related papers (2021-01-26T20:53:49Z)
- Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion.
A neural network is built and trained to predict these statistical summaries from the video frames (a label-construction sketch follows this entry).
arXiv Detail & Related papers (2020-08-31T08:31:56Z)
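As an illustration of such pretext labels, below is a minimal NumPy sketch that derives the grid block with the largest motion and its dominant direction from a precomputed optical flow field. The 3x3 grid and 8 angle bins are illustrative assumptions, not necessarily the paper's exact configuration.

```python
import numpy as np

def motion_statistics(flow, grid=3, angle_bins=8):
    """flow: (H, W, 2) optical flow between two consecutive frames.
    Returns (location label, direction label) as classification targets."""
    H, W, _ = flow.shape
    mag = np.linalg.norm(flow, axis=-1)                 # per-pixel motion magnitude
    gh, gw = H // grid, W // grid
    blocks = mag[:gh * grid, :gw * grid].reshape(grid, gh, grid, gw).mean(axis=(1, 3))
    loc = int(np.argmax(blocks))                        # block with the largest motion
    r, c = divmod(loc, grid)
    patch = flow[r * gh:(r + 1) * gh, c * gw:(c + 1) * gw]
    angles = np.arctan2(patch[..., 1], patch[..., 0])   # per-pixel motion angle
    hist, _ = np.histogram(angles, bins=angle_bins, range=(-np.pi, np.pi))
    return loc, int(np.argmax(hist))                    # dominant direction = fullest bin
```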
- DVI: Depth Guided Video Inpainting for Autonomous Driving [35.94330601020169]
We present an automatic video inpainting algorithm that can remove traffic agents from videos.
By building a dense 3D map from stitched point clouds, frames within a video are geometrically correlated.
We are the first to fuse multiple videos for video inpainting.
arXiv Detail & Related papers (2020-07-17T09:29:53Z)
- Visual Descriptor Learning from Monocular Video [25.082587246288995]
We propose a novel way to estimate dense correspondence on an RGB image by training a fully convolutional network.
Our method learns from RGB videos using a contrastive loss, where relative labels are estimated from optical flow (see the pairing sketch after this entry).
Not only does the method perform well on test data with the same background, it also generalizes to situations with a new background.
arXiv Detail & Related papers (2020-04-15T11:19:57Z)
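A minimal NumPy sketch of how such relative labels can be derived: pixels matched through the optical flow serve as positive pairs for the contrastive loss, while descriptors at unmatched locations serve as negatives. The random sampling and rounding here are simplifying assumptions.

```python
import numpy as np

def flow_correspondences(flow, num_pairs=256, seed=0):
    """flow: (H, W, 2) forward optical flow from frame t to frame t+1.
    Returns matched (y, x) coordinates in frame t and frame t+1."""
    rng = np.random.default_rng(seed)
    H, W, _ = flow.shape
    ys = rng.integers(0, H, num_pairs)                  # random anchor pixels in frame t
    xs = rng.integers(0, W, num_pairs)
    xt = np.rint(xs + flow[ys, xs, 0]).astype(int)      # x + u
    yt = np.rint(ys + flow[ys, xs, 1]).astype(int)      # y + v
    valid = (xt >= 0) & (xt < W) & (yt >= 0) & (yt < H) # drop pixels leaving the frame
    return np.stack([ys, xs], 1)[valid], np.stack([yt, xt], 1)[valid]
```

Descriptors sampled at the two returned coordinate sets then form the positive pairs of the contrastive objective.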