How Culturally Aware are Vision-Language Models?
- URL: http://arxiv.org/abs/2405.17475v2
- Date: Sat, 08 Feb 2025 18:49:00 GMT
- Title: How Culturally Aware are Vision-Language Models?
- Authors: Olena Burda-Lassen, Aman Chadha, Shashank Goswami, Vinija Jain
- Abstract summary: Images from folklore genres, such as mythology, folk dance, cultural signs, and symbols, are vital to every culture.
Our research compares the performance of four popular vision-language models in identifying culturally specific information in such images.
We propose a new evaluation metric, the Cultural Awareness Score (CAS), which measures the degree of cultural awareness in image captions.
- Abstract: An image is often considered worth a thousand words, and certain images can tell rich and insightful stories. Can these stories be told via image captioning? Images from folklore genres, such as mythology, folk dance, cultural signs, and symbols, are vital to every culture. Our research compares the performance of four popular vision-language models (GPT-4V, Gemini Pro Vision, LLaVA, and OpenFlamingo) in identifying culturally specific information in such images and creating accurate and culturally sensitive image captions. We also propose a new evaluation metric, the Cultural Awareness Score (CAS), which measures the degree of cultural awareness in image captions. We provide a dataset MOSAIC-1.5k labeled with ground truth for images containing cultural background and context and a labeled dataset with assigned Cultural Awareness Scores that can be used with unseen data. Creating culturally appropriate image captions is valuable for scientific research and can be beneficial for many practical applications. We envision our work will promote a deeper integration of cultural sensitivity in AI applications worldwide. By making the dataset and Cultural Awareness Score available to the public, we aim to facilitate further research in this area, encouraging the development of more culturally aware AI systems that respect and celebrate global diversity.
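The abstract describes the Cultural Awareness Score only at a high level and does not give its formula. As a purely illustrative sketch (the keyword-matching approach, the function name, and the example terms are all assumptions, not the authors' actual metric), one way to approximate a bounded, caption-level cultural coverage score is:

```python
def cultural_awareness_score(caption: str, cultural_terms: list[str]) -> float:
    """Hypothetical CAS-style score: the fraction of expected culturally
    specific terms (e.g. ground-truth labels) mentioned in a caption.
    Illustration only; not the metric defined in the paper."""
    if not cultural_terms:
        return 0.0
    caption_lower = caption.lower()
    hits = sum(1 for term in cultural_terms if term.lower() in caption_lower)
    return hits / len(cultural_terms)

caption = "Dancers in traditional costume perform Bharatanatyam, a classical Indian dance."
terms = ["Bharatanatyam", "classical", "India"]
print(cultural_awareness_score(caption, terms))  # 1.0: all three terms appear
```

A real CAS would more plausibly rely on human or model-based judgments of cultural accuracy and sensitivity rather than surface keyword matching; this sketch only conveys the shape of a per-caption score in [0, 1].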
Related papers
- Diffusion Models Through a Global Lens: Are They Culturally Inclusive? [15.991121392458748]
We introduce CultDiff benchmark, evaluating state-of-the-art diffusion models.
We show that these models often fail to generate cultural artifacts in architecture, clothing, and food, especially for underrepresented country regions.
We develop a neural-based image-image similarity metric, namely, CultDiff-S, to predict human judgment on real and generated images with cultural artifacts.
arXiv Detail & Related papers (2025-02-13T03:05:42Z)
- CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries [63.00147630084146]
Vision-language models (VLMs) have advanced human-AI interaction but struggle with cultural understanding.
CultureVerse is a large-scale multimodal benchmark covering 19,682 cultural concepts, 188 countries/regions, 15 cultural topics, and 3 question types.
We propose CultureVLM, a series of VLMs fine-tuned on our dataset to achieve significant performance improvement in cultural understanding.
arXiv Detail & Related papers (2025-01-02T14:42:37Z)
- From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models [10.121734731147376]
Vision-language models' performance remains suboptimal on images from non-Western cultures.
Various benchmarks have been proposed to test models' cultural inclusivity, but they have limited coverage of cultures.
We introduce the GlobalRG benchmark, comprising two challenging tasks: retrieval across universals and cultural visual grounding.
arXiv Detail & Related papers (2024-06-28T23:28:28Z)
- See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding [78.88461026069862]
Vision-language models (VLMs) can respond to queries about images in many languages.
We present a novel investigation that demonstrates and localizes Western bias in image understanding.
arXiv Detail & Related papers (2024-06-17T15:49:51Z)
- Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks.
We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts.
We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z)
- CulturePark: Boosting Cross-cultural Understanding in Large Language Models [63.452948673344395]
This paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection.
It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs.
We evaluate these models across three downstream tasks: content moderation, cultural alignment, and cultural education.
arXiv Detail & Related papers (2024-05-24T01:49:02Z)
- CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting [73.94059188347582]
We uncover culture perceptions of three SOTA models on 110 countries and regions on 8 culture-related topics through culture-conditioned generations.
We discover that culture-conditioned generations consist of linguistic "markers" that distinguish marginalized cultures from default cultures.
arXiv Detail & Related papers (2024-04-16T00:50:43Z)
- Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
- CIC: A Framework for Culturally-Aware Image Captioning [2.565964707090901]
We propose a new framework, Culturally-aware Image Captioning (CIC), that generates captions describing cultural elements extracted from images representing cultures.
Inspired by methods combining visual modality and Large Language Models (LLMs), our framework generates questions based on cultural categories from images.
Our human evaluation, conducted with 45 participants from 4 cultural groups who have a strong understanding of the corresponding cultures, shows that our proposed framework generates more culturally descriptive captions.
arXiv Detail & Related papers (2024-02-08T03:12:25Z)
- Culture-to-Culture Image Translation and User Evaluation [0.0]
The article introduces the concept of image "culturization," which we define as the process of altering the brushstroke of cultural features.
We defined a pipeline for translating objects' images from a source to a target cultural domain based on state-of-the-art Generative Adversarial Networks.
We gathered data through an online questionnaire to test four hypotheses concerning the impact of images belonging to different cultural domains on Italian participants.
arXiv Detail & Related papers (2022-01-05T12:10:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.