DeCLIP: Decoding CLIP representations for deepfake localization
- URL: http://arxiv.org/abs/2409.08849v1
- Date: Thu, 12 Sep 2024 17:59:08 GMT
- Title: DeCLIP: Decoding CLIP representations for deepfake localization
- Authors: Stefan Smeu, Elisabeta Oneata, Dan Oneata
- Abstract summary: We introduce DeCLIP, a first attempt to leverage large pretrained features for detecting local manipulations.
We show that, when combined with a reasonably large convolutional decoder, pretrained self-supervised representations are able to perform localization.
Unlike previous work, our approach is able to perform localization on the challenging case of latent diffusion models.
- Score: 4.04729645587678
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Generative models can create entirely new images, but they can also partially modify real images in ways that are undetectable to the human eye. In this paper, we address the challenge of automatically detecting such local manipulations. One of the most pressing problems in deepfake detection remains the ability of models to generalize to different classes of generators. In the case of fully manipulated images, representations extracted from large self-supervised models (such as CLIP) provide a promising direction towards more robust detectors. Here, we introduce DeCLIP, a first attempt to leverage such large pretrained features for detecting local manipulations. We show that, when combined with a reasonably large convolutional decoder, pretrained self-supervised representations are able to perform localization and improve generalization capabilities over existing methods. Unlike previous work, our approach is able to perform localization on the challenging case of latent diffusion models, where the entire image is affected by the fingerprint of the generator. Moreover, we observe that this type of data, which combines local semantic information with a global fingerprint, provides more stable generalization than other categories of generative methods.
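The abstract describes pairing frozen, pretrained CLIP features with a reasonably large convolutional decoder that predicts a localization map. The snippet below is a minimal sketch of that general idea, not the authors' implementation: it assumes patch-token features from a ViT-B/16-style CLIP image encoder (replaced here by random tensors) and a hypothetical `ConvDecoder` that upsamples them to a per-pixel manipulation mask.

```python
# Minimal sketch (not the DeCLIP code): decode frozen CLIP-like patch features
# into a per-pixel manipulation mask with a convolutional decoder.
import torch
import torch.nn as nn

class ConvDecoder(nn.Module):
    """Upsamples a (B, D, H/16, W/16) feature map to a (B, 1, H, W) mask."""
    def __init__(self, in_dim=768, width=256):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(4):  # four x2 upsamplings: 1/16 -> full resolution
            layers += [
                nn.Conv2d(dim, width, kernel_size=3, padding=1),
                nn.BatchNorm2d(width),
                nn.ReLU(inplace=True),
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            ]
            dim = width
        layers.append(nn.Conv2d(width, 1, kernel_size=1))  # per-pixel logit
        self.net = nn.Sequential(*layers)

    def forward(self, feats):
        return self.net(feats)

# Stand-in for frozen CLIP patch tokens: in practice these would be the spatial
# (token) outputs of a pretrained ViT-B/16 CLIP image encoder.
B, D, H, W = 2, 768, 224, 224
patch_tokens = torch.randn(B, (H // 16) * (W // 16), D)              # (B, N, D)
feature_map = patch_tokens.transpose(1, 2).reshape(B, D, H // 16, W // 16)

decoder = ConvDecoder(in_dim=D)
mask_logits = decoder(feature_map)                                   # (B, 1, 224, 224)
# Training would compare mask_logits against a ground-truth manipulation mask,
# e.g. with a binary cross-entropy loss (zeros used here only as a placeholder).
loss = nn.functional.binary_cross_entropy_with_logits(
    mask_logits, torch.zeros_like(mask_logits))
```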
Related papers
- MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection [64.29452783056253]
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia.
Although existing approaches mainly capture face forgery patterns using the image modality, other modalities such as fine-grained noise and text are not fully explored.
We propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image-noise modalities.
arXiv Detail & Related papers (2024-09-15T13:08:59Z)
- Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection.
CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
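As an illustration of contrastive training with an added global-local term, the sketch below combines a standard InfoNCE loss on global embeddings with an auxiliary cosine-similarity term on patch embeddings. This is a hedged approximation of the idea, not the CoDE objective; the function names and the `alpha` weight are assumptions.

```python
# Illustrative sketch (not the CoDE implementation): contrastive loss over global
# embeddings plus an auxiliary term aligning local (patch) embeddings.
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.07):
    """Standard InfoNCE: the positive for each anchor is the same-index row of `positives`."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

def global_local_loss(g1, g2, l1, l2, alpha=0.5):
    """g*: (B, D) global embeddings of two views; l*: (B, N, D) patch embeddings."""
    loss_global = info_nce(g1, g2)
    # Local term: pull corresponding patch embeddings of the two views together.
    loss_local = 1.0 - F.cosine_similarity(l1, l2, dim=-1).mean()
    return loss_global + alpha * loss_local

# Dummy usage with random tensors standing in for encoder outputs.
B, N, D = 8, 49, 256
loss = global_local_loss(torch.randn(B, D), torch.randn(B, D),
                         torch.randn(B, N, D), torch.randn(B, N, D))
```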
arXiv Detail & Related papers (2024-07-29T18:00:10Z)
- Weakly-supervised deepfake localization in diffusion-generated images [4.548755617115687]
We address a weakly-supervised localization problem, using the Xception network as the backbone architecture.
We show that the best performing detection method (based on local scores) is less sensitive to the looser supervision than to the mismatch in terms of dataset or generator.
arXiv Detail & Related papers (2023-11-08T10:27:36Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
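As a toy illustration of what a pixel-inconsistency cue can look like (a simplification, not the paper's model), one can predict each pixel from its neighbours and inspect the residual map; manipulated regions often show residual statistics that deviate from the rest of the image.

```python
# Toy illustration (not the paper's method): a pixel-inconsistency cue obtained by
# predicting each pixel from its 4-neighbours and inspecting the residual map.
import numpy as np
from scipy.ndimage import convolve

def residual_map(gray_image):
    """Difference between each pixel and the average of its four neighbours."""
    kernel = np.array([[0.0, 0.25, 0.0],
                       [0.25, 0.0, 0.25],
                       [0.0, 0.25, 0.0]])
    predicted = convolve(gray_image.astype(np.float64), kernel, mode="reflect")
    return gray_image - predicted

rng = np.random.default_rng(0)
img = rng.random((128, 128))            # stand-in for a grayscale image in [0, 1]
res = residual_map(img)
# Local statistics of `res` (e.g. variance in sliding windows) can highlight
# regions whose pixel correlations differ from the rest of the image; a learned
# model would refine this simple cue.
```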
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- Securing Deep Generative Models with Universal Adversarial Signature [69.51685424016055]
Deep generative models pose threats to society due to their potential misuse.
In this paper, we propose to inject a universal adversarial signature into an arbitrary pre-trained generative model.
The proposed method is validated on the FFHQ and ImageNet datasets with various state-of-the-art generative models.
arXiv Detail & Related papers (2023-05-25T17:59:01Z)
- TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization [17.270110456445806]
TruFor is a forensic framework that can be applied to a large variety of image manipulation methods.
We rely on the extraction of both high-level and low-level traces through a transformer-based fusion architecture.
Our method is able to reliably detect and localize both cheapfake and deepfake manipulations, outperforming state-of-the-art works.
arXiv Detail & Related papers (2022-12-21T11:49:43Z)
- Towards Effective Image Manipulation Detection with Proposal Contrastive Learning [61.5469708038966]
We propose Proposal Contrastive Learning (PCL) for effective image manipulation detection.
Our PCL uses a two-stream architecture that extracts two types of global features from the RGB and noise views, respectively.
Our PCL can be easily adapted to unlabeled data in practice, which can reduce manual labeling costs and promote more generalizable features.
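A hedged sketch of the two input views described above, not the PCL architecture: the noise view is produced here with a fixed high-pass filter as a stand-in (SRM-style filter banks are a common choice in this line of work); each view would then feed its own encoder stream.

```python
# Minimal sketch (not the PCL code): build the two input views, RGB and a noise
# residual obtained with a fixed high-pass filter, for a two-stream encoder.
import torch
import torch.nn.functional as F

def noise_view(rgb):
    """Per-channel high-pass residual; SRM-style filter banks are a common alternative."""
    hp = torch.tensor([[-1.,  2., -1.],
                       [ 2., -4.,  2.],
                       [-1.,  2., -1.]]).view(1, 1, 3, 3) / 4.0
    c = rgb.shape[1]
    return F.conv2d(rgb, hp.repeat(c, 1, 1, 1), padding=1, groups=c)

rgb = torch.rand(2, 3, 256, 256)
noise = noise_view(rgb)              # second-stream input, same spatial size as rgb
# Each view goes through its own encoder; global features from both streams are
# combined, and region proposals are contrasted during training.
```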
arXiv Detail & Related papers (2022-10-16T13:30:13Z)
- DeepFake Detection by Analyzing Convolutional Traces [0.0]
We focus on the analysis of Deepfakes of human faces with the objective of creating a new detection method.
The proposed technique uses an Expectation-Maximization (EM) algorithm to extract a set of local features specifically designed to model the underlying convolutional generative process.
Results demonstrate the effectiveness of the technique in distinguishing different architectures and their corresponding generation processes.
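A heavily simplified sketch of the idea behind such convolutional-trace features: estimate, per image, the weights that predict each pixel from its neighbours and use them as a local feature vector. The paper does this with an Expectation-Maximization procedure that also models pixels not following the convolutional model; the plain least-squares fit below omits that refinement and is only illustrative.

```python
# Heavily simplified sketch (not the paper's EM algorithm): estimate per-image
# weights that predict each pixel from its neighbours; the weight vector then
# serves as a feature describing convolutional traces left by a generator.
import numpy as np

def neighbour_weights(gray, k=3):
    """Fit w in: center pixel ~ w . (k*k neighbourhood without the center)."""
    h, w_ = gray.shape
    r = k // 2
    rows, targets = [], []
    for i in range(r, h - r):
        for j in range(r, w_ - r):
            patch = gray[i - r:i + r + 1, j - r:j + r + 1].flatten()
            rows.append(np.delete(patch, k * k // 2))   # neighbours only
            targets.append(patch[k * k // 2])           # center pixel
    A, b = np.asarray(rows), np.asarray(targets)
    weights, *_ = np.linalg.lstsq(A, b, rcond=None)
    return weights                                      # (k*k - 1,) feature vector

rng = np.random.default_rng(0)
feat = neighbour_weights(rng.random((64, 64)))
# A classifier over such feature vectors can separate images produced by
# different generator architectures.
```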
arXiv Detail & Related papers (2020-04-22T09:02:55Z)
- Decoupling Global and Local Representations via Invertible Generative Flows [47.366299240738094]
Experimental results on standard image benchmarks demonstrate the effectiveness of our model in terms of density estimation, image generation and unsupervised representation learning.
This work demonstrates that a generative model with a likelihood-based objective is capable of learning decoupled representations, requiring no explicit supervision.
arXiv Detail & Related papers (2020-04-12T03:18:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.