Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis
- URL: http://arxiv.org/abs/2209.08891v3
- Date: Tue, 9 Jan 2024 06:35:40 GMT
- Title: Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis
- Authors: Lukas Struppek, Dominik Hintersdorf, Felix Friedrich, Manuel Brack,
Patrick Schramowski, Kristian Kersting
- Abstract summary: Models for text-to-image synthesis have recently drawn a lot of interest from academia and the general public.
We show that by simply inserting a single non-Latin character into a textual description, common models reflect cultural stereotypes and biases in their generated images.
We propose a novel homoglyph unlearning method to fine-tune a text encoder, making it robust against homoglyph manipulations.
- Score: 33.080261792998826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Models for text-to-image synthesis, such as DALL-E 2 and Stable Diffusion,
have recently drawn a lot of interest from academia and the general public.
These models are capable of producing high-quality images that depict a variety
of concepts and styles when conditioned on textual descriptions. However, these
models adopt cultural characteristics associated with specific Unicode scripts
from their vast amount of training data, which may not be immediately apparent.
We show that by simply inserting a single non-Latin character into a textual
description, common models reflect cultural stereotypes and biases in their
generated images. We analyze this behavior both qualitatively and
quantitatively, and identify a model's text encoder as the root cause of the
phenomenon. Additionally, malicious users or service providers may try to
intentionally bias the image generation to create racist stereotypes by
replacing Latin characters with similar-looking characters from non-Latin
scripts, so-called homoglyphs. To mitigate such unnoticed script attacks, we
propose a novel homoglyph unlearning method to fine-tune a text encoder, making
it robust against homoglyph manipulations.
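
To make the attack surface concrete, the sketch below shows how a single Latin character in a prompt can be swapped for a visually near-identical character from another Unicode script before the prompt reaches the text encoder. The prompt and the specific Cyrillic code points are illustrative choices, not examples taken from the paper.

```python
# Minimal illustration of a homoglyph manipulation: replace one Latin
# character in a prompt with a visually similar character from another
# Unicode script. The prompt and the mappings below are illustrative only.

# A few Latin -> non-Latin homoglyph pairs (Cyrillic look-alikes).
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
}

def inject_homoglyph(prompt: str, target: str) -> str:
    """Replace the first occurrence of `target` with its homoglyph."""
    return prompt.replace(target, HOMOGLYPHS[target], 1)

clean = "a photo of an actress"
manipulated = inject_homoglyph(clean, "o")

print(clean, "->", manipulated)
# Both strings render almost identically, but their code points differ:
print([hex(ord(c)) for c in clean[:10]])
print([hex(ord(c)) for c in manipulated[:10]])
```

Because the manipulated prompt is tokenized differently, a text encoder trained on multilingual web data can map it to a different region of its embedding space, which is what shifts the cultural characteristics of the generated images.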
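One plausible way to implement a homoglyph unlearning fine-tune, not necessarily the authors' exact formulation, is a teacher-student setup: a frozen copy of the original text encoder provides target embeddings for clean prompts, and the trainable encoder is pushed to produce those same embeddings for homoglyph-manipulated versions of the prompts. The model name, loss choice, and toy training loop below are assumptions for illustration.

```python
# Hedged sketch of a homoglyph-unlearning fine-tune for a CLIP text encoder.
# Assumptions (not from the paper): model name, MSE loss, single-pair toy loop.
import copy
import torch
from transformers import CLIPTokenizer, CLIPTextModel

name = "openai/clip-vit-large-patch14"   # text encoder used by Stable Diffusion
tokenizer = CLIPTokenizer.from_pretrained(name)
student = CLIPTextModel.from_pretrained(name)   # this copy gets fine-tuned
teacher = copy.deepcopy(student).eval()         # frozen reference encoder
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-6)

def embed(model, prompts):
    tokens = tokenizer(prompts, padding="max_length", truncation=True,
                       return_tensors="pt")
    return model(**tokens).last_hidden_state

clean = ["a photo of an actress"]
manipulated = ["a ph\u043eto of an actress"]  # Cyrillic 'o' injected

for _ in range(10):  # toy loop; a real run would iterate over a prompt corpus
    with torch.no_grad():
        target = embed(teacher, clean)     # embedding of the clean prompt
    pred = embed(student, manipulated)     # embedding of the homoglyph prompt
    loss = torch.nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A fuller version would likely add a second term keeping the student's embeddings of clean prompts close to the teacher's, so that benign generation behavior is preserved while the homoglyph sensitivity is unlearned.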
Related papers
- Skeleton and Font Generation Network for Zero-shot Chinese Character Generation [53.08596064763731]
We propose a novel Skeleton and Font Generation Network (SFGN) to achieve a more robust Chinese character font generation.
We conduct experiments on misspelled characters, a substantial portion of which slightly differs from the common ones.
Our approach visually demonstrates the efficacy of generated images and outperforms current state-of-the-art font generation methods.
arXiv Detail & Related papers (2025-01-14T12:15:49Z)
- Conditional Text-to-Image Generation with Reference Guidance [81.99538302576302]
This paper explores using additional conditions of an image that provides visual guidance of the particular subjects for diffusion models to generate.
We develop several small-scale expert plugins that efficiently endow a Stable Diffusion model with the capability to take different references.
Our expert plugins demonstrate superior results to existing methods on all tasks, each containing only 28.55M trainable parameters.
arXiv Detail & Related papers (2024-11-22T21:38:51Z)
- Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training [68.41837295318152]
Diffusion-based text-to-image models have demonstrated impressive achievements in diversity and aesthetics but struggle to generate images with visual texts.
Existing backbone models have limitations such as misspelling, failing to generate texts, and lack of support for Chinese text.
We propose a series of methods, aiming to empower backbone models to generate visual texts in English and Chinese.
arXiv Detail & Related papers (2024-10-06T10:25:39Z)
- Learning to Generate Text in Arbitrary Writing Styles [6.7308816341849695]
It is desirable for language models to produce text in an author-specific style on the basis of a potentially small writing sample.
We propose to guide a language model to generate text in a target style using contrastively-trained representations that capture stylometric features.
arXiv Detail & Related papers (2023-12-28T18:58:52Z)
- Word-Level Explanations for Analyzing Bias in Text-to-Image Models [72.71184730702086]
Text-to-image (T2I) models can generate images that underrepresent minorities based on race and sex.
This paper investigates which word in the input prompt is responsible for bias in generated images.
arXiv Detail & Related papers (2023-06-03T21:39:07Z)
- GlyphDiffusion: Text Generation as Image Generation [100.98428068214736]
We propose GlyphDiffusion, a novel diffusion approach for text generation via text-guided image generation.
Our key idea is to render the target text as a glyph image containing visual language content.
Our model also makes significant improvements compared to recent diffusion models.
arXiv Detail & Related papers (2023-04-25T02:14:44Z)
- GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation [18.396131717250793]
We introduce GlyphDraw, a general learning framework aiming to endow image generation models with the capacity to generate images coherently embedded with text for any specific language.
Our method not only produces accurate characters as specified in the prompts, but also seamlessly blends the generated text into the background.
arXiv Detail & Related papers (2023-03-31T08:06:33Z)
- Handwritten Text Generation from Visual Archetypes [25.951540903019467]
We devise a Transformer-based model for Few-Shot styled handwritten text generation.
We obtain a robust representation of unseen writers' calligraphy by exploiting specific pre-training on a large synthetic dataset.
arXiv Detail & Related papers (2023-03-27T14:58:20Z)
- Character-Aware Models Improve Visual Text Rendering [57.19915686282047]
Current image generation models struggle to reliably produce well-formed visual text.
Character-aware models provide large gains on a novel spelling task.
Our models set a much higher state-of-the-art on visual spelling, with 30+ point accuracy gains over competitors on rare words.
arXiv Detail & Related papers (2022-12-20T18:59:23Z)
- Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis [16.421253324649555]
We introduce backdoor attacks against text-guided generative models.
Our attacks only slightly alter an encoder so that no suspicious model behavior is apparent for image generations with clean prompts.
arXiv Detail & Related papers (2022-11-04T12:36:36Z)
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.