DE-FAKE: Detection and Attribution of Fake Images Generated by
Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2210.06998v1
- Date: Thu, 13 Oct 2022 13:08:54 GMT
- Title: DE-FAKE: Detection and Attribution of Fake Images Generated by
Text-to-Image Diffusion Models
- Authors: Zeyang Sha and Zheng Li and Ning Yu and Yang Zhang
- Abstract summary: We pioneer a systematic study of the authenticity of fake images generated by text-to-image diffusion models.
For the visual modality, we propose universal detection, which demonstrates that fake images from these text-to-image diffusion models share common cues.
For the linguistic modality, we analyze the impact of text captions on the authenticity of images generated by text-to-image diffusion models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have emerged to establish the new state of the art
in visual generation. In particular, text-to-image diffusion models, which
generate images from caption descriptions, have attracted increasing attention
thanks to their user controllability. Despite their encouraging performance,
they heighten concerns about fake image misuse and put new pressure on fake
image detection.
In this work, we pioneer a systematic study of the authenticity of fake images
generated by text-to-image diffusion models. In particular, we conduct
comprehensive studies from two perspectives unique to text-to-image models,
namely, the visual modality and the linguistic modality. For the visual
modality, we propose universal detection, which demonstrates that fake images
from these text-to-image diffusion models share common cues that enable us to
distinguish them from real images. We then propose source attribution, which
reveals the unique fingerprint held by each diffusion model and can be used to
attribute a fake image to its model source. A variety of ablation and analysis
studies further interpret the improvements brought by each of our proposed
methods. For the linguistic modality, we delve deeper to comprehensively
analyze the impact of text captions (referred to as prompt analysis) on the
authenticity of images generated by text-to-image diffusion models, and reason
about how these impacts affect the detection and attribution of fake images.
All findings contribute to the community's insight into the natural properties
of text-to-image diffusion models, and we call for the community's
consideration of countermeasures, like ours, against rapidly evolving fake
image generators.
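To make the visual-modality idea concrete, below is a minimal sketch of how a universal detector of this kind could be assembled: frozen CLIP image embeddings fed to a simple classifier. The checkpoint name, file paths, and the logistic-regression head are illustrative assumptions, not the authors' exact pipeline; replacing the binary real/fake labels with per-generator labels turns the same recipe into source attribution.
```python
# Minimal sketch (illustrative, not the authors' exact pipeline):
# a "universal" fake-image detector on top of frozen CLIP image embeddings.
import torch
from PIL import Image
from sklearn.linear_model import LogisticRegression
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


@torch.no_grad()
def embed_images(paths):
    """Return L2-normalized CLIP image embeddings for a list of image paths."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt").to(device)
    feats = clip.get_image_features(**inputs)  # shape: (len(paths), 512)
    return torch.nn.functional.normalize(feats, dim=-1).cpu().numpy()


def train_detector(real_paths, fake_paths):
    """Fit a detector: label 0 = real photo, label 1 = diffusion-generated image.

    Using per-generator labels (e.g. 0=real, 1=Stable Diffusion, 2=GLIDE, ...)
    instead of binary labels turns this into a source-attribution classifier.
    """
    X = embed_images(real_paths + fake_paths)
    y = [0] * len(real_paths) + [1] * len(fake_paths)
    return LogisticRegression(max_iter=1000).fit(X, y)


# Hypothetical usage:
# detector = train_detector(["real/cat.png"], ["fake/sd_cat.png"])
# print(detector.predict_proba(embed_images(["query.png"])))
```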
Related papers
- Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models [65.82564074712836]
We introduce DIFfusionHOI, a new HOI detector shedding light on text-to-image diffusion models.
We first devise an inversion-based strategy to learn the expression of relation patterns between humans and objects in embedding space.
These learned relation embeddings then serve as textual prompts that steer the diffusion model to generate images depicting specific interactions.
arXiv Detail & Related papers (2024-10-26T12:00:33Z)
- DiffusionPID: Interpreting Diffusion via Partial Information Decomposition [24.83767778658948]
We apply information-theoretic principles to decompose the input text prompt into its elementary components.
We analyze how individual tokens and their interactions shape the generated image.
We show that PID is a potent tool for evaluating and diagnosing text-to-image diffusion models.
arXiv Detail & Related papers (2024-06-07T18:17:17Z)
- ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale [20.12991230544801]
Generative image models have emerged as a promising technology to produce realistic images.
There is growing demand to empower users to effectively discern and comprehend patterns of AI-generated images.
We develop ASAP, an interactive visualization system that automatically extracts distinct patterns of AI-generated images.
arXiv Detail & Related papers (2024-04-03T18:20:41Z)
- Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images [34.02058539403381]
We leverage human semantic knowledge to investigate whether it can be incorporated into fake image detection frameworks.
A preliminary statistical analysis is conducted to explore the distinctive patterns in how humans perceive genuine and altered images.
arXiv Detail & Related papers (2024-03-13T19:56:30Z)
- Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [120.49126407479717]
This paper explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR).
We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos.
arXiv Detail & Related papers (2024-03-12T00:02:03Z)
- Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering [118.53208190209517]
We propose a framework to learn the proper textual descriptions for diffusion models through prompt learning.
Our method can effectively learn prompts that improve the match between the input text and the generated images.
arXiv Detail & Related papers (2024-01-12T03:46:29Z)
- Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners [88.07317175639226]
We propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners.
Our approach mainly uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information.
arXiv Detail & Related papers (2023-05-18T05:41:36Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study of the detection of deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models [103.61066310897928]
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.
While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt.
We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt.
We introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness of the generated images.
arXiv Detail & Related papers (2023-01-31T18:10:38Z)