When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts
- URL: http://arxiv.org/abs/2503.16826v1
- Date: Fri, 21 Mar 2025 03:50:05 GMT
- Title: When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts
- Authors: Jun Seong Kim, Kyaw Ye Thu, Javad Ismayilzada, Junyeong Park, Eunsu Kim, Huzama Ahmad, Na Min An, James Thorne, Alice Oh
- Abstract summary: We introduce MixCuBe, a cross-cultural bias benchmark, and study elements from five countries and four ethnicities. Our findings reveal that MLLMs achieve both higher accuracy and lower sensitivity to such perturbations for high-resource cultures. GPT-4o, the best-performing model overall, shows up to a 58% difference in accuracy between the original and perturbed cultural settings in low-resource cultures.
- Score: 15.78054683369659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In a highly globalized world, it is important for multimodal large language models (MLLMs) to recognize and respond correctly to mixed-cultural inputs. For example, a model should correctly identify kimchi (Korean food) in an image both when an Asian woman is eating it and when an African man is eating it. However, current MLLMs over-rely on the visual features of the person, leading to misclassification of the entities. To examine the robustness of MLLMs across different ethnicities, we introduce MixCuBe, a cross-cultural bias benchmark, and study elements from five countries and four ethnicities. Our findings reveal that MLLMs achieve both higher accuracy and lower sensitivity to such perturbations for high-resource cultures, but not for low-resource cultures. GPT-4o, the best-performing model overall, shows up to a 58% difference in accuracy between the original and perturbed cultural settings in low-resource cultures. Our dataset is publicly available at: https://huggingface.co/datasets/kyawyethu/MixCuBe.
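The abstract's headline number is an accuracy gap between images in their original cultural setting and ethnicity-perturbed variants. Below is a minimal sketch of how such a gap could be computed over the released dataset; only the dataset ID comes from the abstract, while the split name, the field names (`image`, `label`, `setting`), and the `identify_food` model call are assumptions for illustration.

```python
# Minimal sketch of the original-vs-perturbed accuracy gap described above.
# Only the dataset ID ("kyawyethu/MixCuBe") is from the abstract; the split
# name, field names ("image", "label", "setting"), and identify_food()
# are hypothetical and would need to match the dataset's actual schema.
from collections import Counter
from datasets import load_dataset

def identify_food(image) -> str:
    """Placeholder for an MLLM query, e.g. GPT-4o asked to name the pictured food."""
    raise NotImplementedError

ds = load_dataset("kyawyethu/MixCuBe", split="test")  # split name assumed

correct, total = Counter(), Counter()
for example in ds:
    setting = example["setting"]  # assumed: "original" vs. an ethnicity-perturbed variant
    prediction = identify_food(example["image"])
    total[setting] += 1
    correct[setting] += prediction.strip().lower() == example["label"].lower()

for setting in total:
    print(f"{setting}: {correct[setting] / total[setting]:.1%}")
```

Under this reading, the 58% figure reported for GPT-4o is the difference between the per-setting accuracies printed above for a low-resource culture.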
Related papers
- CARE: Aligning Language Models for Regional Cultural Awareness [28.676469530858924]
Existing language models (LMs) often exhibit a Western-centric bias and struggle to represent diverse cultural knowledge.
Previous attempts to address this rely on synthetic data and express cultural knowledge only in English.
We first introduce CARE, a multilingual resource of 24.1k responses with human preferences on 2,580 questions about Chinese and Arab cultures.
arXiv Detail & Related papers (2025-04-07T14:57:06Z) - Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs [2.5212698425008377]
Large Language Models (LLMs) are becoming increasingly capable across global languages. However, the ability to communicate across languages does not necessarily translate to appropriate cultural representations. We compare two families of models: Google's Gemma models and OpenAI's turbo-series. We find no consistent relationships between language capabilities and cultural alignment.
arXiv Detail & Related papers (2025-02-23T11:02:41Z) - CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs [75.82306181299153]
We introduce CulturalBench: a set of 1,227 human-written and human-verified questions for assessing cultural knowledge.
We evaluate models on two setups, CulturalBench-Easy and CulturalBench-Hard, which share the same questions but pose them differently.
Compared to human performance (92.6% accuracy), CulturalBench-Hard is more challenging for frontier LLMs, with the best-performing model (GPT-4o) at only 61.5% and the worst (Llama3-8b) at 21.4%.
arXiv Detail & Related papers (2024-10-03T17:04:31Z) - See It from My Perspective: How Language Affects Cultural Bias in Image Understanding [60.70852566256668]
Vision-language models (VLMs) can respond to queries about images in many languages.
We characterize the Western bias of VLMs in image understanding and investigate the role that language plays in this disparity.
arXiv Detail & Related papers (2024-06-17T15:49:51Z) - CulturePark: Boosting Cross-cultural Understanding in Large Language Models [63.452948673344395]
This paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection.
It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs.
We evaluate these models across three downstream tasks: content moderation, cultural alignment, and cultural education.
arXiv Detail & Related papers (2024-05-24T01:49:02Z) - CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting [73.94059188347582]
We uncover the culture perceptions of three SOTA models across 110 countries and regions on 8 culture-related topics through culture-conditioned generations.
We discover that culture-conditioned generations consist of linguistic "markers" that distinguish marginalized cultures from default cultures.
arXiv Detail & Related papers (2024-04-16T00:50:43Z) - Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge [47.57055368312541]
We introduce FmLAMA, a multilingual dataset centered on food-related cultural facts and variations in food practices. We analyze LLMs across various architectures and configurations, evaluating their performance in both monolingual and multilingual settings.
arXiv Detail & Related papers (2024-04-10T08:49:27Z) - CultureLLM: Incorporating Cultural Differences into Large Language Models [36.66184989869121]
CultureLLM is a cost-effective solution to incorporate cultural differences into large language models. We fine-tune culture-specific LLMs and one unified model (CultureLLM-One) for 9 cultures covering rich and low-resource languages. Our human study shows that the generated samples are semantically equivalent to the original samples.
arXiv Detail & Related papers (2024-02-09T04:02:43Z) - Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models [89.94270049334479]
This paper identifies a cultural dominance issue within large language models (LLMs).
When users ask in non-English languages, LLMs often provide inappropriate English-culture-centric answers that are irrelevant to the expected culture.
arXiv Detail & Related papers (2023-10-19T05:38:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.