Related papers: RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation

URL: http://arxiv.org/abs/2502.07455v1
Date: Tue, 11 Feb 2025 10:57:12 GMT
Title: RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation
Authors: Viacheslav Vasilev, Julia Agafonova, Nikolai Gerasimenko, Alexander Kapitanov, Polina Mikhailova, Evelina Mironova, Denis Dimitrov
Abstract summary: We propose a RusCode benchmark for evaluating the quality of text-to-image generation containing elements of the Russian cultural code. Our final dataset consists of 1250 text prompts in Russian and their translations into English. We present the results of a human evaluation of the side-by-side comparison of Russian visual concepts representations using popular generative models.
Score: 37.970098758333044
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-image generation models have gained popularity among users around the world. However, many of these models exhibit a strong bias toward English-speaking cultures, ignoring or misrepresenting the unique characteristics of other language groups, countries, and nationalities. The lack of cultural awareness can reduce the generation quality and lead to undesirable consequences such as unintentional insult and the spread of prejudice. In contrast to the field of natural language processing, cultural awareness in computer vision has not been explored as extensively. In this paper, we strive to reduce this gap. We propose a RusCode benchmark for evaluating the quality of text-to-image generation containing elements of the Russian cultural code. To do this, we form a list of 19 categories that best represent the features of Russian visual culture. Our final dataset consists of 1250 text prompts in Russian and their translations into English. The prompts cover a wide range of topics, including complex concepts from art, popular culture, folk traditions, famous people's names, natural objects, scientific achievements, etc. We present the results of a human evaluation of the side-by-side comparison of Russian visual concepts representations using popular generative models.

Related papers

CURVE: A Benchmark for Cultural and Multilingual Long Video Reasoning [58.73855961335903]
CURVE (Cultural Understanding and Reasoning in Video Evaluation) is a challenging benchmark for multicultural and multilingual video reasoning. It comprises high-quality, entirely human-generated annotations from diverse, region-specific cultural videos across 18 global locales. Our evaluations reveal that SoTA Video-LLMs struggle significantly, performing substantially below human-level accuracy.
arXiv Detail & Related papers (2026-01-15T18:15:06Z)
Culture in Action: Evaluating Text-to-Image Models through Social Activities [40.874302288116304]
Text-to-image (T2I) models achieve impressive photorealism by training on large-scale web data, but models inherit cultural biases and fail to depict underrepresented regions faithfully. We introduce CULTIVate, a benchmark for evaluating T2I models on cross-cultural activities. We propose four metrics to measure cultural alignment, hallucination, exaggerated elements, and diversity.
arXiv Detail & Related papers (2025-11-07T19:51:11Z)
CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation [61.130639734982395]
We introduce CAIRe, a novel evaluation metric that assesses the degree of cultural relevance of an image. Our framework grounds entities and concepts in the image to a knowledge base and uses factual information to give independent graded judgments for each culture label.
arXiv Detail & Related papers (2025-06-10T17:16:23Z)
CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation [3.566419648777424]
We examine the concept of cultural code and recognize the critical importance of its understanding by modern image generation models. We propose the methodology for collecting and processing the data necessary to form a dataset based on the cultural code. Human evaluation results demonstrate an increase in the level of awareness of Russian culture in the model.
arXiv Detail & Related papers (2025-05-07T23:29:28Z)
Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinment [2.089922606370409]
We propose a novel approach, Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement (Culture-TRIP). Our approach retrieves cultural contexts and visual details related to the culture nouns in the prompt. It iteratively refines and evaluates the prompt based on a set of cultural criteria and large language models.
arXiv Detail & Related papers (2025-02-24T06:56:56Z)
Diffusion Models Through a Global Lens: Are They Culturally Inclusive? [15.991121392458748]
We introduce CultDiff benchmark, evaluating state-of-the-art diffusion models. We show that these models often fail to generate cultural artifacts in architecture, clothing, and food, especially for underrepresented country regions. We develop a neural-based image-image similarity metric, namely, CultDiff-S, to predict human judgment on real and generated images with cultural artifacts.
arXiv Detail & Related papers (2025-02-13T03:05:42Z)
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding [60.70852566256668]
Vision-language models (VLMs) can respond to queries about images in many languages. We characterize the Western bias of VLMs in image understanding and investigate the role that language plays in this disparity.
arXiv Detail & Related papers (2024-06-17T15:49:51Z)
Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks. We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts. We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z)
An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance [53.974497865647336]
We take a first step towards translating images to make them culturally relevant. We build three pipelines comprising state-of-the-art generative models to do the task. We conduct a human evaluation of translated images to assess for cultural relevance and meaning preservation.
arXiv Detail & Related papers (2024-04-01T17:08:50Z)
CIC: A Framework for Culturally-Aware Image Captioning [2.565964707090901]
We propose a new framework, Culturally-aware Image Captioning (CIC), that generates captions and describes cultural elements extracted from cultural visual elements in images representing cultures. Inspired by methods combining visual modality and Large Language Models (LLMs), our framework generates questions based on cultural categories from images. Our human evaluation conducted on 45 participants from 4 different cultural groups with a high understanding of the corresponding culture shows that our proposed framework generates more culturally descriptive captions.
arXiv Detail & Related papers (2024-02-08T03:12:25Z)
How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions? [67.97752431429865]
We study the effect on the diversity of the generated images when adding ethical intervention. Preliminary studies indicate that a large change in the model predictions is triggered by certain phrases such as 'irrespective of gender'.
arXiv Detail & Related papers (2022-10-27T07:32:39Z)
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models. First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding. Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z)
Visually Grounded Reasoning across Languages and Cultures [27.31020761908739]
We develop a new protocol to construct an ImageNet-style hierarchy representative of more languages and cultures. We focus on a typologically diverse set of languages, namely, Indonesian, Mandarin Chinese, Swahili, Tamil, and Turkish. We create a multilingual dataset for Multicultural Reasoning over Vision and Language (MaRVL) by eliciting statements from native speaker annotators about pairs of images.
arXiv Detail & Related papers (2021-09-28T16:51:38Z)
Deception detection in text and its relation to the cultural dimension of individualism/collectivism [6.17866386107486]
We investigate if differences in the usage of specific linguistic features of deception across cultures can be confirmed and attributed to norms in respect to the individualism/collectivism divide. We create culture/language-aware classifiers by experimenting with a wide range of n-gram features based on phonology, morphology and syntax. We conducted our experiments over 11 datasets from 5 languages, i.e., English, Dutch, Russian, Spanish and Romanian, from six countries (US, Belgium, India, Russia, Mexico and Romania).
arXiv Detail & Related papers (2021-05-26T13:09:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.