Related papers: Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinment

Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinment

URL: http://arxiv.org/abs/2502.16902v1
Date: Mon, 24 Feb 2025 06:56:56 GMT
Title: Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinment
Authors: Suchae Jeong, Inseong Choi, Youngsik Yun, Jihie Kim,
Abstract summary: We propose a novel approach, Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement (Culture-TRIP)<n>Our approach retrieves cultural contexts and visual details related to the culture nouns in the prompt.<n>It iteratively refines and evaluates the prompt based on a set of cultural criteria and large language models.
Score: 2.089922606370409
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-Image models, including Stable Diffusion, have significantly improved in generating images that are highly semantically aligned with the given prompts. However, existing models may fail to produce appropriate images for the cultural concepts or objects that are not well known or underrepresented in western cultures, such as `hangari' (Korean utensil). In this paper, we propose a novel approach, Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement (Culture-TRIP), which refines the prompt in order to improve the alignment of the image with such culture nouns in text-to-image models. Our approach (1) retrieves cultural contexts and visual details related to the culture nouns in the prompt and (2) iteratively refines and evaluates the prompt based on a set of cultural criteria and large language models. The refinement process utilizes the information retrieved from Wikipedia and the Web. Our user survey, conducted with 66 participants from eight different countries demonstrates that our proposed approach enhances the alignment between the images and the prompts. In particular, C-TRIP demonstrates improved alignment between the generated images and underrepresented culture nouns. Resource can be found at https://shane3606.github.io/Culture-TRIP.

Related papers

Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model [69.09404597939744]
Seedream 2.0 is a native Chinese-English bilingual image generation foundation model. It adeptly manages text prompt in both Chinese and English, supporting bilingual image generation and text rendering. It is integrated with a self-developed bilingual large language model as a text encoder, allowing it to learn native knowledge directly from massive data.
arXiv Detail & Related papers (2025-03-10T17:58:33Z)
Diffusion Models Through a Global Lens: Are They Culturally Inclusive? [15.991121392458748]
We introduce CultDiff benchmark, evaluating state-of-the-art diffusion models. We show that these models often fail to generate cultural artifacts in architecture, clothing, and food, especially for underrepresented country regions. We develop a neural-based image-image similarity metric, namely, CultDiff-S, to predict human judgment on real and generated images with cultural artifacts.
arXiv Detail & Related papers (2025-02-13T03:05:42Z)
Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting [71.29100512700064]
We present T-Prompter, a training-free method for theme-specific image generation.<n>T-Prompter integrates reference images into generative models, allowing users to seamlessly specify the target theme.<n>Our approach enables consistent story generation, character design, realistic character generation, and style-guided image generation.
arXiv Detail & Related papers (2025-01-26T19:01:19Z)
An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance [53.974497865647336]
We take a first step towards translating images to make them culturally relevant. We build three pipelines comprising state-of-the-art generative models to do the task. We conduct a human evaluation of translated images to assess for cultural relevance and meaning preservation.
arXiv Detail & Related papers (2024-04-01T17:08:50Z)
CIC: A Framework for Culturally-Aware Image Captioning [2.565964707090901]
We propose a new framework, Culturally-aware Image Captioning (CIC), that generates captions and describes cultural elements extracted from cultural visual elements in images representing cultures.<n>Inspired by methods combining visual modality and Large Language Models (LLMs), our framework generates questions based on cultural categories from images.<n>Our human evaluation conducted on 45 participants from 4 different cultural groups with a high understanding of the corresponding culture shows that our proposed framework generates more culturally descriptive captions.
arXiv Detail & Related papers (2024-02-08T03:12:25Z)
Prompt Expansion for Adaptive Text-to-Image Generation [51.67811570987088]
This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort. The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts. We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods.
arXiv Detail & Related papers (2023-12-27T21:12:21Z)
ITI-GEN: Inclusive Text-to-Image Generation [56.72212367905351]
This study investigates inclusive text-to-image generative models that generate images based on human-written prompts. We show that, for some attributes, images can represent concepts more expressively than text. We propose a novel approach, ITI-GEN, that leverages readily available reference images for Inclusive Text-to-Image GENeration.
arXiv Detail & Related papers (2023-09-11T15:54:30Z)
On the Cultural Gap in Text-to-Image Generation [75.69755281031951]
One challenge in text-to-image (T2I) generation is the inadvertent reflection of culture gaps present in the training data. There is no benchmark to systematically evaluate a T2I model's ability to generate cross-cultural images. We propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well-suited a model is to a target culture.
arXiv Detail & Related papers (2023-07-06T13:17:55Z)
Towards Equitable Representation in Text-to-Image Synthesis Models with the Cross-Cultural Understanding Benchmark (CCUB) Dataset [8.006068032606182]
We propose a culturally-aware priming approach for text-to-image synthesis using a small but culturally curated dataset. Our experiments indicate that priming using both text and image is effective in improving the cultural relevance and decreasing the offensiveness of generated images.
arXiv Detail & Related papers (2023-01-28T03:10:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.