Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
- URL: http://arxiv.org/abs/2502.08914v1
- Date: Thu, 13 Feb 2025 03:05:42 GMT
- Title: Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
- Authors: Zahra Bayramli, Ayhan Suleymanzade, Na Min An, Huzama Ahmad, Eunsu Kim, Junyeong Park, James Thorne, Alice Oh
- Abstract summary: We introduce the CultDiff benchmark, which evaluates whether state-of-the-art diffusion models can generate culturally specific images.
We show that these models often fail to generate cultural artifacts in architecture, clothing, and food, especially for underrepresented countries and regions.
We develop CultDiff-S, a neural image-image similarity metric that predicts human judgments of real and generated images containing cultural artifacts.
- Score: 15.991121392458748
- License:
- Abstract: Text-to-image diffusion models have recently enabled the creation of visually compelling, detailed images from textual prompts. However, their ability to accurately represent diverse cultural nuances remains an open question. In this work, we introduce the CultDiff benchmark, which evaluates whether state-of-the-art diffusion models can generate culturally specific images spanning ten countries. Through a fine-grained analysis of different similarity aspects, we show that these models often fail to generate cultural artifacts in architecture, clothing, and food, especially for underrepresented countries and regions, revealing significant disparities in cultural relevance, description fidelity, and realism compared to real-world reference images. Using the collected human evaluations, we develop CultDiff-S, a neural image-image similarity metric that predicts human judgments of real and generated images containing cultural artifacts. Our work highlights the need for more inclusive generative AI systems and equitable dataset representation across a wide range of cultures.
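The abstract does not specify how CultDiff-S is implemented, only that it is a neural image-image similarity metric calibrated against human judgments. As a minimal sketch of that general recipe, the following assumes image feature vectors from some pretrained encoder (hypothetical inputs, not the paper's actual model) and fits a linear map from raw embedding similarity to human ratings:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two image feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def fit_human_calibration(raw_scores, human_scores):
    """Least-squares linear map from raw similarity scores to human ratings."""
    raw_scores = np.asarray(raw_scores, dtype=float)
    X = np.stack([raw_scores, np.ones_like(raw_scores)], axis=1)
    w, *_ = np.linalg.lstsq(X, np.asarray(human_scores, dtype=float), rcond=None)
    return w  # (slope, intercept)

def predict_human_score(raw_similarity, w):
    """Predicted human similarity rating for one real/generated image pair."""
    return float(w[0] * raw_similarity + w[1])
```

A learned metric like CultDiff-S would train the mapping (and possibly the encoder) end-to-end on the collected annotations; this linear calibration only illustrates the embedding-similarity-to-human-judgment pipeline.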
Related papers
- CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries [63.00147630084146]
Vision-language models (VLMs) have advanced human-AI interaction but struggle with cultural understanding.
CultureVerse is a large-scale multimodal benchmark covering 19,682 cultural concepts, 188 countries/regions, 15 cultural topics, and 3 question types.
We propose CultureVLM, a series of VLMs fine-tuned on our dataset to achieve significant performance improvement in cultural understanding.
arXiv Detail & Related papers (2025-01-02T14:42:37Z) - Risks of Cultural Erasure in Large Language Models [4.613949381428196]
We argue for the need for metricizable evaluations of language technologies that interrogate and account for historical power inequities.
We probe representations that a language model produces about different places around the world when asked to describe these contexts.
We analyze the cultures represented in the travel recommendations produced by a set of language model applications.
arXiv Detail & Related papers (2025-01-02T04:57:50Z) - From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models [10.121734731147376]
Vision-language models' performance remains suboptimal on images from non-western cultures.
Various benchmarks have been proposed to test models' cultural inclusivity, but they have limited coverage of cultures.
We introduce the GlobalRG benchmark, comprising two challenging tasks: retrieval across universals and cultural visual grounding.
arXiv Detail & Related papers (2024-06-28T23:28:28Z) - Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks.
We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts.
We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z) - How Culturally Aware are Vision-Language Models? [0.8437187555622164]
Images from folklore genres, such as mythology, folk dance, cultural signs, and symbols, are vital to every culture.
Our research compares the performance of four popular vision-language models in identifying culturally specific information in such images.
We propose a new evaluation metric, the Cultural Awareness Score (CAS), which measures the degree of cultural awareness in image captions.
arXiv Detail & Related papers (2024-05-24T04:45:14Z) - Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z) - From Pampas to Pixels: Fine-Tuning Diffusion Models for Gaúcho Heritage [0.0]
This paper addresses the potential of Latent Diffusion Models (LDMs) in representing local cultural concepts, historical figures, and endangered species.
Our objective is to contribute to the broader understanding of how generative models can help to capture and preserve the cultural and historical identity of regions.
arXiv Detail & Related papers (2024-01-10T19:34:52Z) - Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z) - DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
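The summary above says DALL-Eval measures the gender/skin tone distribution of generated images but not how that distribution is scored. As a generic, hypothetical illustration (not the paper's actual procedure), one can tabulate detected attribute labels for a batch of images and report the total-variation distance from a uniform distribution:

```python
from collections import Counter

def attribute_distribution(labels):
    """Empirical distribution over attribute values (e.g., perceived
    gender or skin-tone bins) detected in a batch of generated images."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def parity_gap(distribution):
    """Total-variation distance from a uniform distribution over the
    observed attribute values; 0.0 means a perfectly balanced batch."""
    k = len(distribution)
    return 0.5 * sum(abs(p - 1.0 / k) for p in distribution.values())
```

For example, a batch labelled three-quarters with one attribute value yields a parity gap of 0.25, while a perfectly balanced batch yields 0.0.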
arXiv Detail & Related papers (2022-02-08T18:36:52Z) - Culture-to-Culture Image Translation and User Evaluation [0.0]
The article introduces the concept of image "culturization," which we define as the process of altering the "brushstroke of cultural features" so that an object is perceived as belonging to a target culture.
We define a pipeline for translating object images from a source to a target cultural domain based on state-of-the-art Generative Adversarial Networks.
We gathered data through an online questionnaire to test four hypotheses concerning the impact of images belonging to different cultural domains on Italian participants.
arXiv Detail & Related papers (2022-01-05T12:10:42Z) - From Culture to Clothing: Discovering the World Events Behind A Century of Fashion Images [100.20851232528925]
We propose a data-driven approach to identify specific cultural factors affecting the clothes people wear.
Our work is a first step towards a computational, scalable, and easily refreshable approach to link culture to clothing.
arXiv Detail & Related papers (2021-02-02T18:58:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.