From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models
- URL: http://arxiv.org/abs/2407.00263v1
- Date: Fri, 28 Jun 2024 23:28:28 GMT
- Title: From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models
- Authors: Mehar Bhatia, Sahithya Ravi, Aditya Chinchure, Eunjeong Hwang, Vered Shwartz,
- Abstract summary: Vision-language models' performance remains suboptimal on images from non-western cultures.
Various benchmarks have been proposed to test models' cultural inclusivity, but they have limited coverage of cultures.
We introduce the GlobalRG benchmark, comprising two challenging tasks: retrieval across universals and cultural visual grounding.
- Score: 10.121734731147376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite recent advancements in vision-language models, their performance remains suboptimal on images from non-western cultures due to underrepresentation in training datasets. Various benchmarks have been proposed to test models' cultural inclusivity, but they have limited coverage of cultures and do not adequately assess cultural diversity across universal as well as culture-specific local concepts. To address these limitations, we introduce the GlobalRG benchmark, comprising two challenging tasks: retrieval across universals and cultural visual grounding. The former task entails retrieving culturally diverse images for universal concepts from 50 countries, while the latter aims at grounding culture-specific concepts within images from 15 countries. Our evaluation across a wide range of models reveals that the performance varies significantly across cultures -- underscoring the necessity for enhancing multicultural understanding in vision-language models.
Related papers
- CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts [45.77570690529597]
We introduce CROPE, a visual question answering benchmark designed to probe the knowledge of culture-specific concepts.
Our evaluation of several state-of-the-art open Vision and Language models shows large performance disparities between culture-specific and common concepts.
Experiments with contextual knowledge indicate that models struggle to effectively utilize multimodal information and bind culture-specific concepts to their depictions.
arXiv Detail & Related papers (2024-10-20T17:31:19Z) - Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models [4.771099208181585]
LLMs are increasingly deployed in global applications, ensuring users from diverse backgrounds feel respected and understood.
Cultural harm can arise when these models fail to align with specific cultural norms, resulting in misrepresentations or violations of cultural values.
We present two key contributions: A cultural harm test dataset, created to assess model outputs across different cultural contexts through scenarios that expose potential cultural insensitivities, and a culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning based on feedback from diverse annotators.
arXiv Detail & Related papers (2024-10-15T18:13:10Z) - How Well Do LLMs Identify Cultural Unity in Diversity? [12.982460687543952]
We introduce a benchmark dataset for evaluating decoder-only large language models (LLMs) in understanding the cultural unity of concepts.
CUNIT consists of 1,425 evaluation examples building upon 285 traditional cultural-specific concepts across 10 countries.
We design a contrastive matching task to evaluate the LLMs' capability to identify highly associated cross-cultural concept pairs.
arXiv Detail & Related papers (2024-08-09T14:45:22Z) - Benchmarking Vision Language Models for Cultural Understanding [31.898921287065242]
This study introduces CulturalVQA, a visual question-answering benchmark aimed at assessing Vision Language Models (VLMs)
We curate a collection of 2,378 image-question pairs with 1-5 answers per question representing cultures from 11 countries across 5 continents.
The questions probe understanding of various facets of culture such as clothing, food, drinks, rituals, and traditions.
arXiv Detail & Related papers (2024-07-15T17:21:41Z) - Beyond Aesthetics: Cultural Competence in Text-to-Image Models [34.98692829036475]
CUBE is a first-of-its-kind benchmark to evaluate cultural competence of Text-to-Image models.
CUBE covers cultural artifacts associated with 8 countries across different geo-cultural regions.
CUBE-CSpace is a larger dataset of cultural artifacts that serves as grounding to evaluate cultural diversity.
arXiv Detail & Related papers (2024-07-09T13:50:43Z) - Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks.
We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts.
We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z) - CulturePark: Boosting Cross-cultural Understanding in Large Language Models [63.452948673344395]
This paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection.
It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs.
We evaluate these models across three downstream tasks: content moderation, cultural alignment, and cultural education.
arXiv Detail & Related papers (2024-05-24T01:49:02Z) - CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting [73.94059188347582]
We uncover culture perceptions of three SOTA models on 110 countries and regions on 8 culture-related topics through culture-conditioned generations.
We discover that culture-conditioned generation consist of linguistic "markers" that distinguish marginalized cultures apart from default cultures.
arXiv Detail & Related papers (2024-04-16T00:50:43Z) - Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z) - Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in
Large Language Models [89.94270049334479]
This paper identifies a cultural dominance issue within large language models (LLMs)
LLMs often provide inappropriate English-culture-related answers that are not relevant to the expected culture when users ask in non-English languages.
arXiv Detail & Related papers (2023-10-19T05:38:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.