Frame-rate Up-conversion Detection Based on Convolutional Neural Network
for Learning Spatiotemporal Features
- URL: http://arxiv.org/abs/2103.13674v1
- Date: Thu, 25 Mar 2021 08:47:46 GMT
- Title: Frame-rate Up-conversion Detection Based on Convolutional Neural Network
for Learning Spatiotemporal Features
- Authors: Minseok Yoon, Seung-Hun Nam, In-Jae Yu, Wonhyuk Ahn, Myung-Joon Kwon,
Heung-Kyu Lee
- Abstract summary: This paper proposes a frame-rate conversion detection network (FCDNet) that learns forensic features caused by FRUC in an end-to-end fashion.
FCDNet takes a stack of consecutive frames as input and effectively learns interpolation artifacts through network blocks designed to capture spatiotemporal features.
- Score: 7.895528973776606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advance in user-friendly and powerful video editing tools, anyone
can easily manipulate videos without leaving prominent visual traces.
Frame-rate up-conversion (FRUC), a representative temporal-domain operation,
increases the motion continuity of videos with a lower frame rate and is used
by malicious counterfeiters in video tampering, such as generating fake
high-frame-rate videos without improving their quality or seamlessly mixing
temporally spliced videos. FRUC is based on frame interpolation schemes, and
the subtle artifacts that remain in interpolated frames are often difficult to
distinguish. Hence,
detecting such forgery traces is a critical issue in video forensics. This
paper proposes a frame-rate conversion detection network (FCDNet) that learns
forensic features caused by FRUC in an end-to-end fashion. The proposed network
uses a stack of consecutive frames as the input and effectively learns
interpolation artifacts using network blocks to learn spatiotemporal features.
This study is the first attempt to apply a neural network to the detection of
FRUC. Moreover, it can cover the following three types of frame interpolation
schemes: nearest neighbor interpolation, bilinear interpolation, and
motion-compensated interpolation. In contrast to existing methods that exploit
all frames to verify integrity, the proposed approach achieves a high detection
speed because it observes only six frames to test a video's authenticity. Extensive
experiments were conducted with conventional forensic methods and neural
networks for video forensic tasks to validate our research. The proposed
network achieved state-of-the-art performance in terms of detecting the
interpolated artifacts of FRUC. The experimental results also demonstrate that
our trained model is robust to unseen datasets, unlearned frame rates, and
unlearned quality factors.
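For context on where the artifacts the paper targets come from, two of the three interpolation schemes it covers are simple enough to sketch. The snippet below is not from the paper; all function names are ours. It doubles a frame rate using nearest-neighbor duplication or linear (bilinear-in-time) blending, the two naive FRUC schemes, and notes how motion-compensated interpolation differs.

```python
# Minimal sketch (not the paper's code) of the two simplest FRUC schemes.
# Frames are float arrays in [0, 1] with shape (H, W, 3).
import numpy as np

def nearest_neighbor_interp(prev_frame, next_frame, t=0.5):
    """Duplicate whichever original frame is temporally closer."""
    return prev_frame if t < 0.5 else next_frame

def bilinear_interp(prev_frame, next_frame, t=0.5):
    """Blend the two surrounding frames linearly in time."""
    return (1.0 - t) * prev_frame + t * next_frame

def upconvert_2x(frames, interp=bilinear_interp):
    """Double the frame rate by inserting one frame between each pair."""
    out = []
    for prev_f, next_f in zip(frames[:-1], frames[1:]):
        out.append(prev_f)
        out.append(interp(prev_f, next_f))  # interpolated frame: where FRUC traces live
    out.append(frames[-1])
    return out

# Motion-compensated interpolation (MCI) additionally estimates optical flow
# between prev_frame and next_frame and warps pixels along the flow before
# blending; its artifacts (ghosting, blockiness) are subtler than blending blur.
```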
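The abstract does not include code, so the following is only an illustrative sketch of the input/output contract it describes: a small 3D CNN (all layer sizes are assumptions, not FCDNet's architecture) that classifies a stack of six consecutive grayscale frames as original or up-converted.

```python
# Illustrative sketch only; FCDNet's exact architecture is not reproduced here.
import torch
import torch.nn as nn

class FrucDetectorSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # 3D convolutions mix information across time (6 frames) and space
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),        # global spatiotemporal pooling
        )
        self.classifier = nn.Linear(32, 2)  # original vs. up-converted

    def forward(self, x):                   # x: (batch, 1, 6, H, W)
        return self.classifier(self.features(x).flatten(1))

clip = torch.randn(4, 1, 6, 128, 128)       # four stacks of six frames
logits = FrucDetectorSketch()(clip)         # shape (4, 2)
```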
Related papers
- Video Dynamics Prior: An Internal Learning Approach for Robust Video
Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over a corrupted sequence, leveraging the spatiotemporal coherence and internal statistics of videos.
arXiv Detail & Related papers (2023-12-13T01:57:11Z) - RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually, often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z) - NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition [89.84188594758588]
A novel Non-saliency Suppression Network (NSNet) is proposed to suppress the responses of non-salient frames.
NSNet achieves the state-of-the-art accuracy-efficiency trade-off and presents a significantly faster (2.4-4.3x) practical inference speed than state-of-the-art methods.
arXiv Detail & Related papers (2022-07-21T09:41:22Z) - TTVFI: Learning Trajectory-Aware Transformer for Video Frame
Interpolation [50.49396123016185]
Video frame interpolation (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI)
Our method outperforms other state-of-the-art methods in four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z) - Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in
VIS and NIR Scenario [87.72258480670627]
Existing face forgery detection methods based on the frequency domain find that GAN-forged images have obvious grid-like visual artifacts in the frequency spectrum compared to real images.
This paper proposes a Discrete Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
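FCAN-DCT's full pipeline is not reproduced here; as a minimal sketch of the underlying clue, the snippet below (function name ours) computes a frame's log-scaled DCT spectrum, where periodic grid-like artifacts from GAN upsampling show up as isolated high-frequency peaks rather than the smooth decay of real frames.

```python
# Sketch (not FCAN-DCT itself): inspect the DCT spectrum of a single frame.
import numpy as np
from scipy.fft import dctn

def dct_log_spectrum(gray_frame):
    """2-D DCT magnitude on a log scale; gray_frame is a float (H, W) array."""
    coeffs = dctn(gray_frame, norm="ortho")
    return np.log1p(np.abs(coeffs))

# Grid-like forgery artifacts appear as bright peaks away from the
# low-frequency corner of the spectrum.
spectrum = dct_log_spectrum(np.random.rand(256, 256))
```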
arXiv Detail & Related papers (2022-07-05T09:27:53Z) - Video Shadow Detection via Spatio-Temporal Interpolation Consistency
Training [31.115226660100294]
We propose a framework that feeds unlabeled video frames together with labeled images into the training of an image shadow detection network.
We then derive the spatial and temporal consistency constraints accordingly for enhancing generalization in the pixel-wise classification.
In addition, we design a Scale-Aware Network for multi-scale shadow knowledge learning in images.
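The summary does not spell out the paper's exact losses; the snippet below is only a hedged sketch of the generic interpolation-consistency idea it builds on: the prediction for a blended pair of frames should match the same blend of the per-frame predictions.

```python
# Generic interpolation-consistency sketch (not the paper's exact loss).
import torch
import torch.nn.functional as F

def temporal_consistency_loss(model, frame_a, frame_b, lam=0.5):
    mixed_frame = lam * frame_a + (1 - lam) * frame_b            # interpolated input
    mixed_pred = lam * model(frame_a) + (1 - lam) * model(frame_b)
    # Unsupervised constraint: predict-then-mix should equal mix-then-predict.
    return F.mse_loss(model(mixed_frame), mixed_pred.detach())
```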
arXiv Detail & Related papers (2022-06-17T14:29:51Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - PAT: Pseudo-Adversarial Training For Detecting Adversarial Videos [20.949656274807904]
We propose a novel yet simple algorithm called Pseudo-Adversarial Training (PAT) to detect adversarial frames in a video without requiring knowledge of the attack.
Experimental results on UCF-101 and 20BN-Jester datasets show that PAT can detect the adversarial video frames and videos with a high detection rate.
arXiv Detail & Related papers (2021-09-13T04:05:46Z) - Temporal Early Exits for Efficient Video Object Detection [1.1470070927586016]
We propose temporal early exits to reduce the computational complexity of per-frame video object detection.
Our method significantly reduces the computational complexity and execution time of per-frame video object detection by up to 34x compared to existing methods.
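The summary does not give the exit criterion, so the sketch below only illustrates the general early-exit idea with an assumed frame-difference threshold: skip the heavy detector when consecutive frames barely change and reuse the cached detections.

```python
# Illustrative early-exit sketch (not the paper's method); threshold is assumed.
import numpy as np

def detect_with_early_exit(frames, heavy_detector, diff_threshold=2.0):
    cached_dets, last_processed, results = None, None, []
    for frame in frames:
        if last_processed is not None and \
           np.mean(np.abs(frame - last_processed)) < diff_threshold:
            results.append(cached_dets)          # early exit: reuse detections
        else:
            cached_dets = heavy_detector(frame)  # full per-frame detection
            last_processed = frame
            results.append(cached_dets)
    return results
```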
arXiv Detail & Related papers (2021-06-21T15:49:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.