Discrepancy-Guided Reconstruction Learning for Image Forgery Detection
- URL: http://arxiv.org/abs/2304.13349v2
- Date: Wed, 3 May 2023 12:50:10 GMT
- Title: Discrepancy-Guided Reconstruction Learning for Image Forgery Detection
- Authors: Zenan Shi, Haipeng Chen, Long Chen and Dong Zhang
- Abstract summary: We first propose a Discrepancy-Guided Encoder (DisGE) to extract forgery-sensitive visual patterns.
We then introduce a Double-Head Reconstruction (DouHR) module to enhance genuine compact visual patterns in different granular spaces.
Under DouHR, we further introduce a Discrepancy-Aggregation Detector (DisAD) to aggregate these genuine compact visual patterns.
- Score: 10.221066530624373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel image forgery detection paradigm for
boosting the model learning capacity on both forgery-sensitive and genuine
compact visual patterns. Compared to existing methods that focus only on
discrepancy-specific patterns (e.g., noises, textures, and frequencies), our
method generalizes better. Specifically, we first propose a
Discrepancy-Guided Encoder (DisGE) to extract forgery-sensitive visual
patterns. DisGE consists of two branches, where the mainstream backbone branch
is used to extract general semantic features, and the accessorial discrepant
external attention branch is used to extract explicit forgery cues. Besides, a
Double-Head Reconstruction (DouHR) module is proposed to enhance genuine
compact visual patterns in different granular spaces. Under DouHR, we further
introduce a Discrepancy-Aggregation Detector (DisAD) to aggregate these genuine
compact visual patterns, such that the forgery detection capability on unknown
patterns can be improved. Extensive experimental results on four challenging
datasets validate the effectiveness of our proposed method against
state-of-the-art competitors.
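As a rough illustration of the paradigm, below is a minimal PyTorch sketch of how the three described components might be wired together. Every module internal here is an assumption: the abstract specifies only the role of each component (DisGE as a two-branch encoder, DouHR as two reconstruction heads in different granular spaces, DisAD as an aggregating detector), not its design.

```python
# Minimal sketch; all internals are assumptions based on the stated roles.
import torch
import torch.nn as nn

class DisGE(nn.Module):
    """Discrepancy-Guided Encoder: a backbone branch for general semantics
    plus an accessorial external-attention branch for explicit forgery cues."""
    def __init__(self, dim=256):
        super().__init__()
        self.backbone = nn.Sequential(      # stand-in for a real backbone
            nn.Conv2d(3, dim, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU())
        self.ext_attn = nn.Conv2d(3, dim, 7, stride=8, padding=3)  # placeholder cue branch

    def forward(self, x):
        sem = self.backbone(x)
        cues = torch.sigmoid(self.ext_attn(x))
        return sem + sem * cues             # cue-modulated fusion (assumed)

class DouHR(nn.Module):
    """Double-Head Reconstruction: two heads enhance genuine patterns in
    different granular spaces (here: fine and coarse)."""
    def __init__(self, dim=256):
        super().__init__()
        self.fine = nn.Conv2d(dim, dim, 1)
        self.coarse = nn.Sequential(nn.AvgPool2d(2), nn.Conv2d(dim, dim, 1))

    def forward(self, f):
        return self.fine(f), self.coarse(f)

class DisAD(nn.Module):
    """Discrepancy-Aggregation Detector: pools and fuses the two granular
    pattern maps into a single real/fake logit."""
    def __init__(self, dim=256):
        super().__init__()
        self.cls = nn.Linear(2 * dim, 1)

    def forward(self, fine, coarse):
        v = torch.cat([fine.mean(dim=(2, 3)), coarse.mean(dim=(2, 3))], dim=1)
        return self.cls(v)

x = torch.randn(2, 3, 256, 256)
enc, rec, det = DisGE(), DouHR(), DisAD()
logit = det(*rec(enc(x)))                   # shape: (2, 1)
```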
Related papers
- Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection [58.228940066769596]
We introduce a Dual-Image Enhanced CLIP approach, leveraging a joint vision-language scoring system.
Our method processes pairs of images, using each as a visual reference for the other, thereby enriching the inference process with visual context.
Our approach exploits the potential of joint vision-language anomaly detection and performs comparably to current SOTA methods across various datasets.
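A minimal sketch of the dual-image idea using the Hugging Face CLIP API: each image in a pair is scored against text prompts and against the other image as a visual reference. The prompts and the fusion of the two scores below are assumptions, not the paper's exact rule.

```python
# Hedged sketch of dual-image scoring with CLIP (Hugging Face transformers).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
prompts = ["a photo of a flawless object", "a photo of a damaged object"]  # assumed prompts

def dual_image_anomaly_score(img_a: Image.Image, img_b: Image.Image) -> float:
    """Score img_a for anomaly, using img_b as its visual reference."""
    inputs = processor(text=prompts, images=[img_a, img_b],
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    text_score = (img @ txt.T).softmax(dim=-1)[0, 1]   # P("damaged") for img_a
    visual_gap = 1.0 - (img[0] @ img[1])               # dissimilarity to the reference
    return (text_score + visual_gap).item() / 2        # assumed fusion rule
```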
arXiv Detail & Related papers (2024-05-08T03:13:20Z)
- Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forged images produced by diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Tuned on 4-class ProGAN data, our approach attains an average accuracy of 98% on unseen GANs and, surprisingly, generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z)
- Produce Once, Utilize Twice for Anomaly Detection [6.501323305130114]
We derive POUTA, which improves both accuracy and efficiency by reusing the discriminative information already present in the reconstructive network.
POUTA achieves better performance than the state-of-the-art few-shot anomaly detection methods without any special design.
arXiv Detail & Related papers (2023-12-20T10:49:49Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, using digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as annotations.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- SeeABLE: Soft Discrepancies and Bounded Contrastive Learning for Exposing Deepfakes [7.553507857251396]
We propose a novel deepfake detector, called SeeABLE, that formalizes the detection problem as a (one-class) out-of-distribution detection task.
SeeABLE pushes perturbed faces towards predefined prototypes using a novel regression-based bounded contrastive loss.
We show that our model convincingly outperforms competing state-of-the-art detectors, while exhibiting highly encouraging generalization capabilities.
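The summary suggests a regression-to-prototype objective with a bound; a generic sketch of such a loss follows. SeeABLE's actual bounded contrastive loss is defined in the paper, so treat this only as an illustration of the idea.

```python
# Illustrative sketch: regress perturbed-face embeddings toward predefined
# prototypes while bounding the penalty (not SeeABLE's exact loss).
import torch
import torch.nn.functional as F

def bounded_prototype_loss(z, proto_idx, prototypes, bound=2.0):
    """z: (B, D) embeddings of locally perturbed faces.
    proto_idx: (B,) index of the prototype assigned to each perturbation.
    prototypes: (K, D) predefined, fixed prototype vectors."""
    z = F.normalize(z, dim=-1)
    p = F.normalize(prototypes[proto_idx], dim=-1)
    dist = (z - p).pow(2).sum(dim=-1)    # regression toward the prototype
    return dist.clamp(max=bound).mean()  # bound keeps outliers from dominating

# toy usage
z = torch.randn(8, 128, requires_grad=True)
prototypes = torch.randn(4, 128)
loss = bounded_prototype_loss(z, torch.randint(0, 4, (8,)), prototypes)
loss.backward()
```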
arXiv Detail & Related papers (2022-11-21T09:38:30Z)
- Towards Effective Image Manipulation Detection with Proposal Contrastive Learning [61.5469708038966]
We propose Proposal Contrastive Learning (PCL) for effective image manipulation detection.
Our PCL uses a two-stream architecture that extracts two types of global features from the RGB and noise views, respectively.
Our PCL can be easily adapted to unlabeled data in practice, which can reduce manual labeling costs and promote more generalizable features.
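A minimal sketch of the two-stream setup: a noise view produced by a fixed high-pass filter, paired with the RGB view under an InfoNCE-style loss. PCL's actual filters, proposal-level sampling, and loss details are in the paper; the kernel and encoders below are placeholders.

```python
# Hedged sketch of an RGB stream plus a high-pass "noise view" stream.
import torch
import torch.nn as nn
import torch.nn.functional as F

# 3x3 high-pass kernel applied per channel to expose manipulation noise
hp = torch.tensor([[-1., -1., -1.], [-1., 8., -1.], [-1., -1., -1.]]) / 8.0
hp_kernel = hp.view(1, 1, 3, 3).repeat(3, 1, 1, 1)

def noise_view(x):                       # x: (B, 3, H, W)
    return F.conv2d(x, hp_kernel, padding=1, groups=3)

rgb_enc = nn.Sequential(nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
noise_enc = nn.Sequential(nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten())

def contrastive_loss(x, tau=0.1):
    """InfoNCE between matching RGB/noise features of the same image."""
    a = F.normalize(rgb_enc(x), dim=-1)
    b = F.normalize(noise_enc(noise_view(x)), dim=-1)
    logits = a @ b.T / tau               # positives sit on the diagonal
    return F.cross_entropy(logits, torch.arange(len(x)))

x = torch.rand(4, 3, 128, 128)
loss = contrastive_loss(x)
```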
arXiv Detail & Related papers (2022-10-16T13:30:13Z)
- MC-LCR: Multi-modal contrastive classification by locally correlated representations for effective face forgery detection [11.124150983521158]
We propose a novel framework named Multi-modal Contrastive Classification by Locally Correlated Representations.
Our MC-LCR aims to amplify implicit local discrepancies between authentic and forged faces in both the spatial and frequency domains.
We achieve state-of-the-art performance and demonstrate the robustness and generalization of our method.
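A small sketch of how a frequency-domain view could be produced to complement the spatial one, via the 2-D FFT; MC-LCR's locally correlated representations and contrastive heads are not reproduced here.

```python
# Sketch: derive a frequency-domain view to pair with the spatial view.
import torch

def frequency_view(x):                    # x: (B, C, H, W), values in [0, 1]
    spec = torch.fft.fft2(x, norm="ortho")
    amp = torch.log1p(spec.abs())         # log-amplitude spectrum
    phase = torch.angle(spec)             # phase spectrum
    return torch.cat([amp, phase], dim=1) # (B, 2C, H, W) frequency features

x = torch.rand(2, 3, 128, 128)
freq = frequency_view(x)                  # input to a frequency-domain branch
```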
arXiv Detail & Related papers (2021-10-07T09:24:12Z)
- Generalizing Face Forgery Detection with High-frequency Features [63.33397573649408]
Current CNN-based detectors tend to overfit to method-specific color textures and thus fail to generalize.
We propose to utilize high-frequency noises for face forgery detection, via two complementary modules.
The first is a multi-scale high-frequency feature extraction module that extracts high-frequency noise at multiple scales.
The second is the residual-guided spatial attention module that guides the low-level RGB feature extractor to concentrate more on forgery traces from a new perspective.
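A rough sketch of both described modules: multi-scale high-frequency residuals via blur subtraction (a Laplacian-pyramid-style stand-in for the paper's filters), and a residual-guided spatial attention map in an assumed form.

```python
# Hedged sketch; the paper's actual filters and attention design may differ.
import torch
import torch.nn.functional as F

def high_freq_residuals(x, scales=(1, 2, 4)):
    """Return high-frequency residuals of x at several scales."""
    outs = []
    for s in scales:
        xs = F.avg_pool2d(x, s) if s > 1 else x
        blur = F.avg_pool2d(xs, 3, stride=1, padding=1)  # cheap low-pass
        outs.append(xs - blur)                           # high-frequency residual
    return outs

def residual_attention(feat, residual):
    """Residual-guided spatial attention: reweight RGB features where the
    high-frequency residual energy is large (assumed form)."""
    attn = torch.sigmoid(residual.abs().mean(dim=1, keepdim=True))
    return feat * F.interpolate(attn, size=feat.shape[-2:])

x = torch.rand(2, 3, 256, 256)
feat = torch.rand(2, 64, 64, 64)          # stand-in low-level RGB feature map
out = residual_attention(feat, high_freq_residuals(x)[1])
```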
arXiv Detail & Related papers (2021-03-23T08:19:21Z)
- Gait Recognition using Multi-Scale Partial Representation Transformation with Capsules [22.99694601595627]
We propose a novel deep network that learns to transfer multi-scale partial gait representations using capsules.
Our network first obtains multi-scale partial representations using a state-of-the-art deep partial feature extractor.
It then recurrently learns the correlations and co-occurrences of the patterns among the partial features in forward and backward directions.
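A sketch of the recurrent step only: a bidirectional GRU over per-part features to model forward and backward correlations among parts. The partial feature extractor and the capsule layers from the paper are omitted.

```python
# Sketch: bidirectional recurrence over per-part gait features.
import torch
import torch.nn as nn

parts, dim = 16, 128                       # e.g., 16 horizontal body strips (assumed)
bigru = nn.GRU(dim, dim, batch_first=True, bidirectional=True)

part_feats = torch.randn(4, parts, dim)    # (batch, parts, dim) from the extractor
corr, _ = bigru(part_feats)                # (batch, parts, 2*dim) correlations
gait_repr = corr.mean(dim=1)               # simple pooling into one descriptor
```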
arXiv Detail & Related papers (2020-10-18T19:47:38Z)
- Attention Model Enhanced Network for Classification of Breast Cancer Image [54.83246945407568]
AMEN is formulated in a multi-branch fashion with a pixel-wise attention model and a classification submodule.
To focus on subtle detail information, the sample image is enhanced by the pixel-wise attention map generated by the former branch.
Experiments conducted on three benchmark datasets demonstrate the superiority of the proposed method under various scenarios.
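A sketch of the enhancement step: a pixel-wise attention map produced by one branch reweights the input image before the classification branch sees it. The fusion rule below is an assumption.

```python
# Sketch: pixel-wise attention map enhancing the input image.
import torch
import torch.nn as nn

attn_branch = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(8, 1, 1), nn.Sigmoid())

def enhance(x):                      # x: (B, 3, H, W) histology image
    attn = attn_branch(x)            # (B, 1, H, W) pixel-wise attention map
    return x * (1.0 + attn)          # amplify subtle detail regions (assumed fusion)

x = torch.rand(2, 3, 224, 224)
x_enhanced = enhance(x)              # input to the classification branch
```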
arXiv Detail & Related papers (2020-10-07T08:44:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.