CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation
- URL: http://arxiv.org/abs/2505.04851v1
- Date: Wed, 07 May 2025 23:29:28 GMT
- Title: CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation
- Authors: Viacheslav Vasilev, Vladimir Arkhipkin, Julia Agafonova, Tatiana Nikulina, Evelina Mironova, Alisa Shichanina, Nikolai Gerasimenko, Mikhail Shoytov, Denis Dimitrov
- Abstract summary: We examine the concept of cultural code and recognize the critical importance of its understanding by modern image generation models. We propose a methodology for collecting and processing the data necessary to form a dataset based on the cultural code. Human evaluation results demonstrate an increased awareness of Russian culture in the model.
- Score: 3.566419648777424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although popular text-to-image generation models cope well with international and general cultural queries, they have a significant knowledge gap regarding individual cultures. This is because existing large training datasets collected from the Internet are predominantly based on Western European or American popular culture. Meanwhile, the lack of cultural adaptation in a model can lead to incorrect results, lower generation quality, and the spread of stereotypes and offensive content. To address this issue, we examine the concept of cultural code and recognize the critical importance of its understanding by modern image generation models, an issue that has not been sufficiently addressed in the research community to date. We propose a methodology for collecting and processing the data necessary to form a dataset based on the cultural code, in particular the Russian one. We explore how the collected data affect the quality of generations in the national domain and analyze the effectiveness of our approach using the Kandinsky 3.1 text-to-image model. Human evaluation results demonstrate an increased awareness of Russian culture in the model.
Related papers
- CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation [61.130639734982395]
We introduce CAIRe, a novel evaluation metric that assesses the degree of cultural relevance of an image. Our framework grounds entities and concepts in the image to a knowledge base and uses factual information to give independent graded judgments for each culture label.
arXiv Detail & Related papers (2025-06-10T17:16:23Z)
- Diffusion Models Through a Global Lens: Are They Culturally Inclusive? [15.991121392458748]
We introduce the CultDiff benchmark, evaluating state-of-the-art diffusion models. We show that these models often fail to generate cultural artifacts in architecture, clothing, and food, especially for underrepresented country regions. We develop a neural-based image-image similarity metric, namely CultDiff-S, to predict human judgment on real and generated images with cultural artifacts.
arXiv Detail & Related papers (2025-02-13T03:05:42Z)
- RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation [37.970098758333044]
We propose the RusCode benchmark for evaluating the quality of text-to-image generation containing elements of the Russian cultural code. Our final dataset consists of 1250 text prompts in Russian and their translations into English. We present the results of a human side-by-side comparison of representations of Russian visual concepts produced by popular generative models.
arXiv Detail & Related papers (2025-02-11T10:57:12Z)
- CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries [63.00147630084146]
Vision-language models (VLMs) have advanced human-AI interaction but struggle with cultural understanding. CultureVerse is a large-scale multimodal benchmark covering 19,682 cultural concepts, 188 countries/regions, 15 cultural topics, and 3 question types. We propose CultureVLM, a series of VLMs fine-tuned on our dataset to achieve significant performance improvements in cultural understanding.
arXiv Detail & Related papers (2025-01-02T14:42:37Z)
- Risks of Cultural Erasure in Large Language Models [4.613949381428196]
We argue for the need for metricizable evaluations of language technologies that interrogate and account for historical power inequities. We probe the representations a language model produces about different places around the world when asked to describe these contexts. We analyze the cultures represented in the travel recommendations produced by a set of language model applications.
arXiv Detail & Related papers (2025-01-02T04:57:50Z)
- Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks.
We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts.
We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z)
- CulturePark: Boosting Cross-cultural Understanding in Large Language Models [63.452948673344395]
This paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection.
It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs.
We evaluate these models across three downstream tasks: content moderation, cultural alignment, and cultural education.
arXiv Detail & Related papers (2024-05-24T01:49:02Z)
- Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
- On the Cultural Gap in Text-to-Image Generation [75.69755281031951]
One challenge in text-to-image (T2I) generation is the inadvertent reflection of culture gaps present in the training data.
There is no benchmark to systematically evaluate a T2I model's ability to generate cross-cultural images.
We propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well-suited a model is to a target culture.
arXiv Detail & Related papers (2023-07-06T13:17:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.