Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of
Text-To-Image Models
- URL: http://arxiv.org/abs/2310.01929v2
- Date: Wed, 29 Nov 2023 15:11:02 GMT
- Title: Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of
Text-To-Image Models
- Authors: Mor Ventura and Eyal Ben-David and Anna Korhonen and Roi Reichart
- Abstract summary: We explore the cultural perception embedded in Text-To-Image models by characterizing culture across three tiers: cultural dimensions, cultural domains, and cultural concepts.
We propose a comprehensive suite of evaluation techniques, including intrinsic evaluations using the CLIP space, extrinsic evaluations with a Visual-Question-Answer (VQA) model and human assessments.
Our experiments provide insights regarding Do, What, Which and How research questions about the nature of cultural encoding in TTI models, paving the way for cross-cultural applications.
- Score: 36.04866429768613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-To-Image (TTI) models, such as DALL-E and StableDiffusion, have
demonstrated remarkable prompt-based image generation capabilities.
Multilingual encoders may have a substantial impact on the cultural agency of
these models, as language is a conduit of culture. In this study, we explore
the cultural perception embedded in TTI models by characterizing culture across
three hierarchical tiers: cultural dimensions, cultural domains, and cultural
concepts. Based on this ontology, we derive prompt templates to unlock the
cultural knowledge in TTI models, and propose a comprehensive suite of
evaluation techniques, including intrinsic evaluations using the CLIP space,
extrinsic evaluations with a Visual-Question-Answer (VQA) model and human
assessments, to evaluate the cultural content of TTI-generated images. To
bolster our research, we introduce the CulText2I dataset, derived from four
diverse TTI models and spanning ten languages. Our experiments provide insights
regarding Do, What, Which and How research questions about the nature of
cultural encoding in TTI models, paving the way for cross-cultural applications
of these models.
Related papers
- Beyond Aesthetics: Cultural Competence in Text-to-Image Models [34.98692829036475]
CUBE is a first-of-its-kind benchmark to evaluate cultural competence of Text-to-Image models.
CUBE covers cultural artifacts associated with 8 countries across different geo-cultural regions.
CUBE-CSpace is a larger dataset of cultural artifacts that serves as grounding to evaluate cultural diversity.
arXiv Detail & Related papers (2024-07-09T13:50:43Z) - From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models [10.121734731147376]
Vision-language models' performance remains suboptimal on images from non-western cultures.
Various benchmarks have been proposed to test models' cultural inclusivity, but they have limited coverage of cultures.
We introduce the GlobalRG benchmark, comprising two challenging tasks: retrieval across universals and cultural visual grounding.
arXiv Detail & Related papers (2024-06-28T23:28:28Z) - Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks.
We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts.
We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z) - CulturePark: Boosting Cross-cultural Understanding in Large Language Models [63.452948673344395]
This paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection.
It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs.
We evaluate these models across three downstream tasks: content moderation, cultural alignment, and cultural education.
arXiv Detail & Related papers (2024-05-24T01:49:02Z) - CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models [59.22460740026037]
"CIVICS: Culturally-Informed & Values-Inclusive Corpus for Societal impacts" dataset is designed to evaluate the social and cultural variation of Large Language Models (LLMs)
We create a hand-crafted, multilingual dataset of value-laden prompts which address specific socially sensitive topics, including LGBTQI rights, social welfare, immigration, disability rights, and surrogacy.
arXiv Detail & Related papers (2024-05-22T20:19:10Z) - Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z) - Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede's Cultural Dimensions [10.415002561977655]
This research proposes a Cultural Alignment Test (Hoftede's CAT) to quantify cultural alignment using Hofstede's cultural dimension framework.
We quantitatively evaluate large language models (LLMs) against the cultural dimensions of regions like the United States, China, and Arab countries.
Our results quantify the cultural alignment of LLMs and reveal the difference between LLMs in explanatory cultural dimensions.
arXiv Detail & Related papers (2023-08-25T14:50:13Z) - On the Cultural Gap in Text-to-Image Generation [75.69755281031951]
One challenge in text-to-image (T2I) generation is the inadvertent reflection of culture gaps present in the training data.
There is no benchmark to systematically evaluate a T2I model's ability to generate cross-cultural images.
We propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well-suited a model is to a target culture.
arXiv Detail & Related papers (2023-07-06T13:17:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.