Texture, Shape and Order Matter: A New Transformer Design for Sequential DeepFake Detection
- URL: http://arxiv.org/abs/2404.13873v3
- Date: Tue, 29 Oct 2024 08:47:57 GMT
- Title: Texture, Shape and Order Matter: A New Transformer Design for Sequential DeepFake Detection
- Authors: Yunfei Li, Yuezun Li, Xin Wang, Baoyuan Wu, Jiaran Zhou, Junyu Dong,
- Abstract summary: Sequential DeepFake detection is an emerging task that predicts the manipulation sequence in order.
This paper describes a new Transformer design, called TSOM, by exploring three perspectives: Texture, Shape, and Order of Manipulations.
- Score: 57.100891917805086
- License:
- Abstract: Sequential DeepFake detection is an emerging task that predicts the manipulation sequence in order. Existing methods typically formulate it as an image-to-sequence problem, employing conventional Transformer architectures. However, these methods lack dedicated design and consequently result in limited performance. As such, this paper describes a new Transformer design, called TSOM, by exploring three perspectives: Texture, Shape, and Order of Manipulations. Our method features four major improvements: \ding{182} we describe a new texture-aware branch that effectively captures subtle manipulation traces with a Diversiform Pixel Difference Attention module. \ding{183} Then we introduce a Multi-source Cross-attention module to seek deep correlations among spatial and sequential features, enabling effective modeling of complex manipulation traces. \ding{184} To further enhance the cross-attention, we describe a Shape-guided Gaussian mapping strategy, providing initial priors of the manipulation shape. \ding{185} Finally, observing that the subsequent manipulation in a sequence may influence traces left in the preceding one, we intriguingly invert the prediction order from forward to backward, leading to notable gains as expected. Extensive experimental results demonstrate that our method outperforms others by a large margin, highlighting the superiority of our method.
Related papers
- Segmentation-guided Layer-wise Image Vectorization with Gradient Fills [6.037332707968933]
We propose a segmentation-guided vectorization framework to convert images into concise vector graphics with gradient fills.
With the guidance of an embedded gradient-aware segmentation, our approach progressively appends gradient-filled B'ezier paths to the output.
arXiv Detail & Related papers (2024-08-28T12:08:25Z) - Enhancing 3D Transformer Segmentation Model for Medical Image with Token-level Representation Learning [9.896550384001348]
This work proposes a token-level representation learning loss that maximizes agreement between token embeddings from different augmented views individually.
We also invent a simple "rotate-and-restore" mechanism, which rotates and flips one augmented view of input volume, and later restores the order of tokens in the feature maps.
We test our pre-training scheme on two public medical segmentation datasets, and the results on the downstream segmentation task show more improvement of our methods than other state-of-the-art pre-trainig methods.
arXiv Detail & Related papers (2024-08-12T01:49:13Z) - DiffusionMat: Alpha Matting as Sequential Refinement Learning [87.76572845943929]
DiffusionMat is an image matting framework that employs a diffusion model for the transition from coarse to refined alpha mattes.
A correction module adjusts the output at each denoising step, ensuring that the final result is consistent with the input image's structures.
We evaluate our model across several image matting benchmarks, and the results indicate that DiffusionMat consistently outperforms existing methods.
arXiv Detail & Related papers (2023-11-22T17:16:44Z) - Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z) - Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods [2.8645507575980074]
We simplify convolutions by viewing them as tensor networks (TNs)
TNs allow reasoning about the underlying tensor multiplications by drawing diagrams, manipulating them to perform function transformations like differentiation, and efficiently evaluating them with einsum.
Our TN implementation accelerates KFAC variant up to 4.5x while removing the standard implementation's memory overhead, and enables new hardware-efficient dropouts for approximate backpropagation.
arXiv Detail & Related papers (2023-07-05T13:19:41Z) - High-resolution Face Swapping via Latent Semantics Disentanglement [50.23624681222619]
We present a novel high-resolution hallucination face swapping method using the inherent prior knowledge of a pre-trained GAN model.
We explicitly disentangle the latent semantics by utilizing the progressive nature of the generator.
We extend our method to video face swapping by enforcing two-temporal constraints on the latent space and the image space.
arXiv Detail & Related papers (2022-03-30T00:33:08Z) - Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD)
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z) - Towards Enhancing Fine-grained Details for Image Matting [40.17208660790402]
We argue that recovering microscopic details relies on low-level but high-definition texture features.
Our model consists of a conventional encoder-decoder Semantic Path and an independent down-sampling-free Textural Compensate Path.
Our method outperforms previous start-of-the-art methods on the Composition-1k dataset.
arXiv Detail & Related papers (2021-01-22T13:20:23Z) - Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence.
We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.