Multi-modal Document Presentation Attack Detection With Forensics Trace Disentanglement
- URL: http://arxiv.org/abs/2404.06663v1
- Date: Wed, 10 Apr 2024 00:11:03 GMT
- Title: Multi-modal Document Presentation Attack Detection With Forensics Trace Disentanglement
- Authors: Changsheng Chen, Yongyi Deng, Liangwei Lin, Zitong Yu, Zhimao Lai, et al.
- Abstract summary: Document Presentation Attack Detection (DPAD) is an important measure in protecting the authenticity of a document image.
Recent DPAD methods demand additional resources, such as manual effort in collecting additional data or knowing the parameters of acquisition devices.
This work proposes a DPAD method based on multi-modal disentangled traces (MMDT) without the above drawbacks.
- Score: 22.751498009362795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document Presentation Attack Detection (DPAD) is an important measure in protecting the authenticity of a document image. However, recent DPAD methods demand additional resources, such as manual effort in collecting additional data or knowing the parameters of acquisition devices. This work proposes a DPAD method based on multi-modal disentangled traces (MMDT) without the above drawbacks. We first disentangle the recaptured traces by a self-supervised disentanglement and synthesis network to enhance the generalization capacity in document images with different contents and layouts. Then, unlike the existing DPAD approaches that rely only on data in the RGB domain, we propose to explicitly employ the disentangled recaptured traces as new modalities in the transformer backbone through adaptive multi-modal adapters to fuse RGB/trace features efficiently. Visualization of the disentangled traces confirms the effectiveness of the proposed method in different document contents. Extensive experiments on three benchmark datasets demonstrate the superiority of our MMDT method in representing forensic traces of recapturing distortion.
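The abstract describes the adapters only at a high level. The toy sketch below (pure Python; all weights, dimensions, and the fusion weight `alpha` are hypothetical, not taken from the paper) illustrates the general pattern of bottleneck adapters with weighted multi-modal fusion: each modality's feature vector is down-projected, passed through a nonlinearity, up-projected, and added back as a residual, after which the adapted RGB and trace features are combined.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, v) for v in x]

def adapter(x, W_down, W_up):
    """Bottleneck adapter: down-project, ReLU, up-project, residual add."""
    h = relu(matvec(W_down, x))
    out = matvec(W_up, h)
    return [xi + oi for xi, oi in zip(x, out)]

def fuse(rgb_feat, trace_feat, alpha):
    """Weighted fusion of two modality features (alpha in [0, 1])."""
    return [alpha * r + (1 - alpha) * t for r, t in zip(rgb_feat, trace_feat)]

# Toy 4-dim features with a 2-dim bottleneck; weights are illustrative only.
W_down = [[0.5, 0.0, 0.5, 0.0],
          [0.0, 0.5, 0.0, 0.5]]
W_up   = [[0.1, 0.0],
          [0.0, 0.1],
          [0.1, 0.0],
          [0.0, 0.1]]

rgb   = adapter([1.0, 2.0, 3.0, 4.0], W_down, W_up)  # adapted RGB features
trace = adapter([0.5, 0.5, 0.5, 0.5], W_down, W_up)  # adapted trace features
fused = fuse(rgb, trace, alpha=0.7)
print(fused)
```

In the actual method the adapters sit inside a transformer backbone and the fusion weights are learned adaptively; this sketch only conveys the residual-bottleneck-plus-fusion structure.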
Related papers
- DECDM: Document Enhancement using Cycle-Consistent Diffusion Models [3.3813766129849845]
We propose DECDM, an end-to-end document-level image translation method inspired by recent advances in diffusion models.
Our method overcomes the limitations of paired training by independently training the source (noisy input) and target (clean output) models.
We also introduce simple data augmentation strategies to improve character-glyph conservation during translation.
arXiv Detail & Related papers (2023-11-16T07:16:02Z)
- Image Generation and Learning Strategy for Deep Document Forgery Detection [7.585489507445007]
Recent advancements in deep neural network (DNN) methods for generative tasks may amplify the threat of document forgery.
We construct a training dataset of document forgery images, named FD-VIED, by emulating possible attacks.
In our experiments, we demonstrate that our approach enhances detection performance.
arXiv Detail & Related papers (2023-11-07T01:40:00Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential Deepfake Detection [81.59191603867586]
Sequential deepfake detection aims to identify forged facial regions with the correct sequence for recovery.
The recovery of forged images requires knowledge of the manipulation model to implement inverse transformations.
We propose Multi-Collaboration and Multi-Supervision Network (MMNet) that handles various spatial scales and sequential permutations in forged face images.
arXiv Detail & Related papers (2023-07-06T02:32:08Z)
- Boundary Guided Learning-Free Semantic Control with Diffusion Models [44.37803942479853]
We present our BoundaryDiffusion method for efficient, effective and light-weight semantic control with frozen pre-trained DDMs.
We conduct extensive experiments on DPM architectures (DDPM, iDDPM) and datasets (CelebA, CelebA-HQ, LSUN-church, LSUN-bedroom, AFHQ-dog) at different resolutions (64, 256).
arXiv Detail & Related papers (2023-02-16T15:21:46Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
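The high-frequency-plus-RGB embedding idea can be sketched in miniature (pure Python, 1-D toy signal; the filter, patch size, and all values are illustrative assumptions, not ObjectFormer's actual implementation): a low-pass moving average is subtracted from the signal to isolate its high-frequency residual, and each patch of the original values is concatenated with the corresponding residual patch to form a multimodal embedding.

```python
def low_pass(signal, k=3):
    """Simple moving-average low-pass filter (edges clamped)."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[max(0, min(n - 1, j))] for j in range(i - k // 2, i + k // 2 + 1)]
        out.append(sum(window) / len(window))
    return out

def high_freq(signal, k=3):
    """High-frequency residual: original minus its low-pass version."""
    return [s - l for s, l in zip(signal, low_pass(signal, k))]

def patch_embeddings(values, patch=2):
    """Concatenate each patch of raw values with its high-frequency counterpart."""
    hf = high_freq(values)
    return [values[i:i + patch] + hf[i:i + patch]
            for i in range(0, len(values), patch)]

row = [10.0, 10.0, 50.0, 10.0]  # a sharp spike mimicking a manipulation edge
print(patch_embeddings(row))
```

Note how the patch containing the spike carries a large high-frequency component, which is the kind of cue a manipulation detector can attend to alongside the raw values.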
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
- M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
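Capturing artifacts "at different scales" amounts to re-partitioning the same input at several granularities. The minimal sketch below (pure Python over a 1-D token list; the scale set and helper names are hypothetical, not M2TR's architecture) shows the partitioning step only:

```python
def patches(tokens, size):
    """Non-overlapping patches of a given size (last patch may be shorter)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def multi_scale(tokens, sizes=(1, 2, 4)):
    """Partition the same input at several scales; fine scales expose
    local artifacts, coarse scales expose global inconsistencies."""
    return {s: patches(tokens, s) for s in sizes}

feats = multi_scale(list(range(8)))
print(feats[4])
```

A multi-scale transformer would then process each partition with its own attention branch and merge the results; that part is omitted here.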
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
- MEG: Multi-Evidence GNN for Multimodal Semantic Forensics [28.12652559292884]
Fake news often involves semantic manipulations across modalities such as image, text, and location.
Recent research has centered the problem around images, calling it image repurposing.
We introduce a novel graph neural network based model for multimodal semantic forensics.
arXiv Detail & Related papers (2020-11-23T09:01:28Z)
- Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition [86.31412529187243]
Few-shot video recognition aims at learning new actions with only very few labeled samples.
We propose a depth-guided Adaptive Meta-Fusion Network for few-shot video recognition, termed AMeFu-Net.
arXiv Detail & Related papers (2020-10-20T03:06:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.