Text Modality Oriented Image Feature Extraction for Detecting Diffusion-based DeepFake
- URL: http://arxiv.org/abs/2405.18071v1
- Date: Tue, 28 May 2024 11:29:30 GMT
- Title: Text Modality Oriented Image Feature Extraction for Detecting Diffusion-based DeepFake
- Authors: Di Yang, Yihao Huang, Qing Guo, Felix Juefei-Xu, Xiaojun Jia, Run Wang, Geguang Pu, Yang Liu,
- Abstract summary: diffusion-based DeepFakes pose significant risks to the integrity and safety of online information.
We propose a text modality-oriented feature extraction method, termed TOFE.
Experiments conducted across ten diffusion types demonstrate the efficacy of our proposed method.
- Score: 32.237169711785896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The widespread use of diffusion methods enables the creation of highly realistic images on demand, thereby posing significant risks to the integrity and safety of online information and highlighting the necessity of DeepFake detection. Our analysis of features extracted by traditional image encoders reveals that both low-level and high-level features offer distinct advantages in identifying DeepFake images produced by various diffusion methods. Inspired by this finding, we aim to develop an effective representation that captures both low-level and high-level features to detect diffusion-based DeepFakes. To address the problem, we propose a text modality-oriented feature extraction method, termed TOFE. Specifically, for a given target image, the representation we discovered is a corresponding text embedding that can guide the generation of the target image with a specific text-to-image model. Experiments conducted across ten diffusion types demonstrate the efficacy of our proposed method.
Related papers
- DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion [94.46904504076124]
Deepfake technology has made face swapping highly realistic, raising concerns about the malicious use of fabricated facial content.
Existing methods often struggle to generalize to unseen domains due to the diverse nature of facial manipulations.
We introduce DiffusionFake, a novel framework that reverses the generative process of face forgeries to enhance the generalization of detection models.
arXiv Detail & Related papers (2024-10-06T06:22:43Z) - Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions [30.148969711689773]
We present a novel approach designed to address the complexities posed by challenging, out-of-distribution data in the single-image depth estimation task.
We systematically generate new, user-defined scenes with a comprehensive set of challenges and associated depth information.
This is achieved by leveraging cutting-edge text-to-image diffusion models with depth-aware control.
arXiv Detail & Related papers (2024-07-23T17:59:59Z) - Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior [63.64088590653005]
We propose Diff-Mosaic, a data augmentation method based on the diffusion model.
We introduce an enhancement network called Pixel-Prior, which generates highly coordinated and realistic Mosaic images.
In the second stage, we propose an image enhancement strategy named Diff-Prior. This strategy utilizes diffusion priors to model images in the real-world scene.
arXiv Detail & Related papers (2024-06-02T06:23:05Z) - Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [120.49126407479717]
This paper explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR)
We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos.
arXiv Detail & Related papers (2024-03-12T00:02:03Z) - Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS)
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z) - Diffusion Facial Forgery Detection [56.69763252655695]
This paper introduces DiFF, a comprehensive dataset dedicated to face-focused diffusion-generated images.
We conduct extensive experiments on the DiFF dataset via a human test and several representative forgery detection methods.
The results demonstrate that the binary detection accuracy of both human observers and automated detectors often falls below 30%.
arXiv Detail & Related papers (2024-01-29T03:20:19Z) - Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on deepfake detection generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z) - DE-FAKE: Detection and Attribution of Fake Images Generated by
Text-to-Image Diffusion Models [12.310393737912412]
We pioneer a systematic study of the authenticity of fake images generated by text-to-image diffusion models.
For visual modality, we propose universal detection that demonstrates fake images of these text-to-image diffusion models share common cues.
For linguistic modality, we analyze the impacts of text captions on the image authenticity of text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-13T13:08:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.