Discovering Universal Semantic Triggers for Text-to-Image Synthesis
- URL: http://arxiv.org/abs/2402.07562v1
- Date: Mon, 12 Feb 2024 10:56:09 GMT
- Title: Discovering Universal Semantic Triggers for Text-to-Image Synthesis
- Authors: Shengfang Zhai, Weilong Wang, Jiajun Li, Yinpeng Dong, Hang Su and
Qingni Shen
- Abstract summary: We introduce Universal Semantic Trigger, a token sequence that can be added at any location within the input text yet can induce generated images towards a preset semantic target.
Our work contributes to a further understanding of text-to-image synthesis and helps users automatically audit their models before deployment.
- Score: 29.43615017915006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently text-to-image models have gained widespread attention in the
community due to their controllable and high-quality generation ability.
However, the robustness of such models and their potential ethical issues have
not been fully explored. In this paper, we introduce Universal Semantic
Trigger, a meaningless token sequence that can be added at any location within
the input text yet can induce generated images towards a preset semantic
target. To thoroughly investigate it, we propose the Semantic Gradient-based Search
(SGS) framework. SGS automatically discovers the potential universal semantic
triggers based on the given semantic targets. Furthermore, we design evaluation
metrics to comprehensively evaluate the semantic shift of images caused by these
triggers. Our empirical analyses reveal that mainstream open-source
text-to-image models are vulnerable to our triggers, which could pose
significant ethical threats. Our work contributes to a further understanding of
text-to-image synthesis and helps users automatically audit their models
before deployment.
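The abstract does not spell out how SGS optimizes the trigger, so the following is only a minimal sketch of one plausible search procedure: greedily substituting trigger tokens so that carrier prompts move toward a target embedding under a CLIP text encoder (the conditioning module of Stable Diffusion). The model name, the random candidate pool standing in for SGS's gradient-based candidate ranking, and all hyperparameters are assumptions, not the authors' method.

```python
# Hypothetical sketch only: greedy trigger search against a CLIP text encoder.
# The authors' SGS framework ranks candidate tokens with gradient information;
# here a brute-force scan over a random candidate pool stands in for that step.
import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

MODEL_ID = "openai/clip-vit-base-patch32"  # assumed text encoder
tokenizer = CLIPTokenizer.from_pretrained(MODEL_ID)
encoder = CLIPTextModelWithProjection.from_pretrained(MODEL_ID).eval()


@torch.no_grad()
def embed(token_ids: torch.Tensor) -> torch.Tensor:
    """Projected (pooled) CLIP text embedding for a batch of token-id rows."""
    return encoder(input_ids=token_ids).text_embeds


@torch.no_grad()
def search_trigger(carrier_prompts, target_text, trigger_len=3, rounds=5, pool=128):
    """Greedily pick trigger tokens that pull the carrier prompts' embeddings
    toward the embedding of the semantic target."""
    target = embed(tokenizer(target_text, return_tensors="pt").input_ids)
    carriers = [tokenizer(p, return_tensors="pt").input_ids[0] for p in carrier_prompts]
    trigger = torch.randint(1000, 20000, (trigger_len,))  # random initial tokens

    def score(trig):
        sims = []
        for ids in carriers:
            # Splice the trigger right after the BOS token; per the paper, the
            # insertion position can in principle be arbitrary.
            full = torch.cat([ids[:1], trig, ids[1:]]).unsqueeze(0)
            sims.append(torch.cosine_similarity(embed(full), target).item())
        return sum(sims) / len(sims)

    best = score(trigger)
    for _ in range(rounds):
        for pos in range(trigger_len):
            for tok in torch.randint(0, encoder.config.vocab_size, (pool,)):
                trial = trigger.clone()
                trial[pos] = tok
                s = score(trial)
                if s > best:
                    trigger, best = trial, s
    return tokenizer.decode(trigger), best


# Usage sketch: push ordinary prompts toward an arbitrary target concept.
trigger_text, similarity = search_trigger(
    ["a photo of a dog in a park", "a painting of a city at night"],
    target_text="a portrait of a cat",
)
print(trigger_text, similarity)
```

In a full audit, the discovered trigger string would be spliced into benign prompts, the images regenerated, and the outputs scored against the semantic target (for example, with CLIP image-text similarity), which is roughly the kind of semantic-shift evaluation the abstract alludes to.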
Related papers
- SynMind: Reducing Semantic Hallucination in fMRI-Based Image Reconstruction [52.34513874272676]
We argue that existing methods rely too heavily on entangled visual embeddings over explicit semantic identity. We parse fMRI signals into rich, sentence-level semantic descriptions that mirror the hierarchical and compositional nature of human visual understanding. We propose SynMind, a framework that integrates these explicit semantic encodings with visual priors to condition a pretrained diffusion model.
arXiv Detail & Related papers (2026-01-25T14:31:23Z)
- When Semantics Regulate: Rethinking Patch Shuffle and Internal Bias for Generated Image Detection with CLIP [13.360123625878733]
CLIP-based detectors often rely on semantic cues rather than generator artifacts, leading to brittle performance under distribution shifts. We show that Patch Shuffle, which disrupts global semantic continuity, provides an unusually strong benefit for CLIP. We propose SemAnti, a semantic-antagonistic fine-tuning paradigm that freezes the semantic subspace and adapts only artifact-sensitive layers.
arXiv Detail & Related papers (2025-11-24T13:54:00Z)
- Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images [96.43608872116347]
AnomReason is a large-scale benchmark with structured annotations in the form of quadruples. AnomReason and AnomAgent serve as a foundation for measuring and improving the semantic plausibility of AI-generated images.
arXiv Detail & Related papers (2025-10-11T14:09:24Z)
- Semantic-Aware Reconstruction Error for Detecting AI-Generated Images [22.83053631078616]
We propose a novel representation, namely Semantic-Aware Reconstruction Error (SARE), that measures the semantic difference between an image and its caption-guided reconstruction. SARE provides a robust and discriminative feature for detecting fake images across diverse generative models. We also introduce a fusion module that integrates SARE into the backbone detector via a cross-attention mechanism.
arXiv Detail & Related papers (2025-08-13T04:37:36Z)
- ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning [76.2503352325492]
ControlThinker is a novel framework that employs a "comprehend-then-generate" paradigm. Latent semantics from control images are mined to enrich text prompts. This enriched semantic understanding then seamlessly aids in image generation without the need for additional complex modifications.
arXiv Detail & Related papers (2025-06-04T05:56:19Z)
- On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [49.60774626839712]
Multimodal generative models have sparked critical discussions on their fairness, reliability, and potential for misuse.
We propose an evaluation framework designed to assess model reliability through their responses to perturbations in the embedding space.
Our method lays the groundwork for detecting unreliable, bias-injected models and for retrieving bias provenance.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
- KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
- SteerDiff: Steering towards Safe Text-to-Image Diffusion Models [5.781285400461636]
Text-to-image (T2I) diffusion models can be misused to produce inappropriate content.
We introduce SteerDiff, a lightweight adaptor module designed to act as an intermediary between user input and the diffusion model.
We conduct extensive experiments across various concept unlearning tasks to evaluate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-03T17:34:55Z)
- Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models [20.19571676239579]
We introduce a novel diffusion-based framework to enhance the alignment of generated images with their corresponding descriptions.
Our framework is built upon a comprehensive analysis of inconsistency phenomena, categorizing them based on their manifestation in the image.
We then integrate a state-of-the-art controllable image generation model with a visual text generation module to generate an image that is consistent with the original prompt.
arXiv Detail & Related papers (2024-06-24T06:12:16Z)
- Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks [4.1942958779358674]
This paper utilizes recent vision-language models to produce diverse and realistic synthetic echocardiography image data.
We show that the rich contextual information present in the synthesized data potentially enhances the accuracy and interpretability of downstream tasks.
arXiv Detail & Related papers (2024-03-28T23:26:45Z)
- Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment [48.835298314274254]
We propose to evaluate text-to-image generation performance by directly estimating the likelihood of the generated images.
A higher likelihood indicates better perceptual quality and better text-image alignment.
It can successfully assess the generation ability of these models with as few as a hundred samples.
arXiv Detail & Related papers (2023-08-16T17:26:47Z)
- Generalizable Synthetic Image Detection via Language-guided Contrastive Learning [22.4158195581231]
Malevolent use of synthetic images, such as the dissemination of fake news or the creation of fake profiles, raises significant concerns regarding the authenticity of images.
We propose a simple yet very effective synthetic image detection method via a language-guided contrastive learning and a new formulation of the detection problem.
It is shown that our proposed LanguAge-guided SynThEsis Detection (LASTED) model achieves much improved generalizability to unseen image generation models.
arXiv Detail & Related papers (2023-05-23T08:13:27Z)
- Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models [103.61066310897928]
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.
While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt.
We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt.
We introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness of the generated images.
arXiv Detail & Related papers (2023-01-31T18:10:38Z)
- Semantic Image Synthesis via Diffusion Models [174.24523061460704]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches.
We propose a novel framework based on DDPM for semantic image synthesis.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- More Control for Free! Image Synthesis with Semantic Diffusion Guidance [79.88929906247695]
Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from an example image.
We introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both.
We conduct experiments on FFHQ and LSUN datasets, and show results on fine-grained text-guided image synthesis.
arXiv Detail & Related papers (2021-12-10T18:55:50Z)
- Cycle-Consistent Inverse GAN for Text-to-Image Synthesis [101.97397967958722]
We propose a novel unified framework of Cycle-consistent Inverse GAN for both text-to-image generation and text-guided image manipulation tasks.
We learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image.
In the text-guided optimization module, we generate images with the desired semantic attributes by optimizing the inverted latent codes.
arXiv Detail & Related papers (2021-08-03T08:38:16Z)
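The last entry above describes a two-stage pipeline: invert an image into the GAN latent space, then optimize the inverted latent code so that the regenerated image carries the desired textual semantics. Below is a deliberately toy sketch of that second stage using a CLIP text-image similarity loss and a stand-in generator; the actual system uses a pretrained GAN, a learned inversion encoder, and its own cycle-consistency objectives, so every module, name, and hyperparameter here is an assumption for illustration only.

```python
# Toy sketch of text-guided optimization of an inverted latent code.
# The generator is a stand-in nn.Module, NOT the GAN used in the paper;
# the loss is plain CLIP text-image cosine distance, assumed for illustration.
import torch
from torch import nn
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
clip.requires_grad_(False)  # frozen scorer; gradients still flow to the latent
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


class ToyGenerator(nn.Module):
    """Stand-in for a pretrained GAN generator: latent (512,) -> image (3, 224, 224)."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 3 * 224 * 224)

    def forward(self, z):
        return torch.sigmoid(self.fc(z)).view(-1, 3, 224, 224)


generator = ToyGenerator().eval()
generator.requires_grad_(False)


def edit_latent(z_inverted, prompt, steps=100, lr=0.05):
    """Optimize an inverted latent code so the rendered image matches the prompt."""
    z = z_inverted.clone().requires_grad_(True)
    text_inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    text_emb = clip.get_text_features(**text_inputs).detach()
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        image = generator(z)  # differentiable render from the latent code
        # Simplification: raw [0, 1] pixels are fed to CLIP; a real pipeline
        # applies CLIP's resize and mean/std normalization first.
        img_emb = clip.get_image_features(pixel_values=image)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        loss = 1 - (img_emb * text_emb).sum(dim=-1).mean()  # cosine distance
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()


# Usage sketch: start from an inverted latent (random here) and steer it by text.
z0 = torch.randn(1, 512)
z_edited = edit_latent(z0, "a smiling face with blond hair")
```

Swapping the stand-in generator for a pretrained StyleGAN-style generator and its matching inversion encoder would turn this loop into the kind of text-guided manipulation the entry refers to.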