Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection
- URL: http://arxiv.org/abs/2404.11054v3
- Date: Thu, 29 Aug 2024 14:43:25 GMT
- Title: Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection
- Authors: Ying Zhang, Yuezun Li, Bo Peng, Jiaran Zhou, Huiyu Zhou, Junyu Dong,
- Abstract summary: We introduce a novel Multilateral Temporal-view Pyramid Transformer (em MumPy) that collaborates spatial-temporal clues flexibly.
Our method utilizes a newly designed multilateral temporal-view to extract various collaborations of spatial-temporal clues and introduces a deformable window-based temporal-view interaction module.
By adjusting the contribution strength of spatial and temporal clues, our method can effectively identify inpainted regions.
- Score: 41.4800103693756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of video inpainting detection is to expose the pixel-level inpainted regions within a video sequence. Existing methods usually focus on leveraging spatial and temporal inconsistencies. However, these methods typically employ fixed operations to combine spatial and temporal clues, limiting their applicability in different scenarios. In this paper, we introduce a novel Multilateral Temporal-view Pyramid Transformer ({\em MumPy}) that collaborates spatial-temporal clues flexibly. Our method utilizes a newly designed multilateral temporal-view encoder to extract various collaborations of spatial-temporal clues and introduces a deformable window-based temporal-view interaction module to enhance the diversity of these collaborations. Subsequently, we develop a multi-pyramid decoder to aggregate the various types of features and generate detection maps. By adjusting the contribution strength of spatial and temporal clues, our method can effectively identify inpainted regions. We validate our method on existing datasets and also introduce a new challenging and large-scale Video Inpainting dataset based on the YouTube-VOS dataset, which employs several more recent inpainting methods. The results demonstrate the superiority of our method in both in-domain and cross-domain evaluation scenarios.
Related papers
- UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for
Temporal Forgery Localization [16.963092523737593]
We propose a novel framework for temporal forgery localization (TFL) that predicts forgery segments with multimodal adaptation.
Our approach achieves state-of-the-art performance on benchmark datasets, including Lav-DF, TVIL, and Psynd.
arXiv Detail & Related papers (2023-08-28T08:20:30Z) - Adaptive Human Matting for Dynamic Videos [62.026375402656754]
Adaptive Matting for Dynamic Videos, termed AdaM, is a framework for simultaneously differentiating foregrounds from backgrounds.
Two interconnected network designs are employed to achieve this goal.
We benchmark and study our methods recently introduced datasets, showing that our matting achieves new best-in-class generalizability.
arXiv Detail & Related papers (2023-04-12T17:55:59Z) - Implicit View-Time Interpolation of Stereo Videos using Multi-Plane
Disparities and Non-Uniform Coordinates [10.445563506186307]
We build upon X-Fields that approximates an interpolatable mapping between the input coordinates and 2D RGB images.
We propose multi-plane disparities to reduce the spatial distance of the objects in the stereo views.
We additionally introduce several simple, but important, improvements over X-Fields.
arXiv Detail & Related papers (2023-03-30T06:32:55Z) - Generic Event Boundary Detection in Video with Pyramid Features [12.896848011230523]
Generic event boundary detection (GEBD) aims to split video into chunks at a broad and diverse set of actions as humans naturally perceive event boundaries.
We present an approach that considers the correlation between neighbor frames with pyramid feature maps in both spatial and temporal dimensions.
arXiv Detail & Related papers (2023-01-11T03:29:27Z) - Detection of Deepfake Videos Using Long Distance Attention [73.6659488380372]
Most existing detection methods treat the problem as a vanilla binary classification problem.
In this paper, the problem is treated as a special fine-grained classification problem since the differences between fake and real faces are very subtle.
A spatial-temporal model is proposed which has two components for capturing spatial and temporal forgery traces in global perspective.
arXiv Detail & Related papers (2021-06-24T08:33:32Z) - DeepMultiCap: Performance Capture of Multiple Characters Using Sparse
Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time varying surface details without the need of using pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z) - Deep Video Matting via Spatio-Temporal Alignment and Aggregation [63.6870051909004]
We propose a deep learning-based video matting framework which employs a novel aggregation feature module (STFAM)
To eliminate frame-by-frame trimap annotations, a lightweight interactive trimap propagation network is also introduced.
Our framework significantly outperforms conventional video matting and deep image matting methods.
arXiv Detail & Related papers (2021-04-22T17:42:08Z) - Coarse-Fine Networks for Temporal Activity Detection in Videos [45.03545172714305]
We introduce 'Co-Fine Networks', a two-stream architecture which benefits from different abstractions of temporal resolution to learn better video representations for long-term motion.
We show that our method can outperform the state-of-the-arts for action detection in public datasets with a significantly reduced compute and memory footprint.
arXiv Detail & Related papers (2021-03-01T20:48:01Z) - Deep Video Inpainting Detection [95.36819088529622]
Video inpainting detection localizes an inpainted region in a video both spatially and temporally.
VIDNet, Video Inpainting Detection Network, contains a two-stream encoder-decoder architecture with attention module.
arXiv Detail & Related papers (2021-01-26T20:53:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.