WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models
- URL: http://arxiv.org/abs/2505.09595v1
- Date: Wed, 14 May 2025 17:43:40 GMT
- Title: WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models
- Authors: Abdullah Mushtaq, Imran Taj, Rafay Naeem, Ibrahim Ghaznavi, Junaid Qadir
- Abstract summary: Large Language Models (LLMs) are predominantly trained and aligned in ways that reinforce Western-centric epistemologies and socio-cultural norms. We introduce WorldView-Bench, a benchmark designed to evaluate Global Cultural Inclusivity (GCI) in LLMs by analyzing their ability to accommodate diverse worldviews.
- Score: 1.094065133109559
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are predominantly trained and aligned in ways that reinforce Western-centric epistemologies and socio-cultural norms, leading to cultural homogenization and limiting their ability to reflect global civilizational plurality. Existing benchmarking frameworks fail to adequately capture this bias, as they rely on rigid, closed-form assessments that overlook the complexity of cultural inclusivity. To address this, we introduce WorldView-Bench, a benchmark designed to evaluate Global Cultural Inclusivity (GCI) in LLMs by analyzing their ability to accommodate diverse worldviews. Our approach is grounded in the Multiplex Worldview proposed by Senturk et al., which distinguishes between Uniplex models, reinforcing cultural homogenization, and Multiplex models, which integrate diverse perspectives. WorldView-Bench measures Cultural Polarization, the exclusion of alternative perspectives, through free-form generative evaluation rather than conventional categorical benchmarks. We implement applied multiplexity through two intervention strategies: (1) Contextually-Implemented Multiplex LLMs, where system prompts embed multiplexity principles, and (2) Multi-Agent System (MAS)-Implemented Multiplex LLMs, where multiple LLM agents representing distinct cultural perspectives collaboratively generate responses. Our results demonstrate a significant increase in Perspectives Distribution Score (PDS) entropy from 13% at baseline to 94% with MAS-Implemented Multiplex LLMs, alongside a shift toward positive sentiment (67.7%) and enhanced cultural balance. These findings highlight the potential of multiplex-aware AI evaluation in mitigating cultural bias in LLMs, paving the way for more inclusive and ethically aligned AI systems.
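The abstract reports a jump in Perspectives Distribution Score (PDS) entropy from 13% at baseline to 94% with MAS-Implemented Multiplex LLMs, but does not spell out the computation here. Below is a minimal sketch of one plausible reading, assuming PDS entropy is Shannon entropy over the worldview categories cited in a response, normalized by the maximum attainable entropy; the category list and the labeling step (e.g., an LLM judge tagging each claim with a worldview) are illustrative assumptions, not the authors' exact protocol.

```python
import math
from collections import Counter

# Hypothetical worldview taxonomy for illustration; the paper's exact categories may differ.
WORLDVIEWS = ["Western", "Islamic", "Confucian", "Hindu",
              "Buddhist", "African", "Latin American"]

def pds_entropy(perspective_labels):
    """Normalized Shannon entropy of the perspectives referenced in a response.

    perspective_labels: list of worldview labels attributed (e.g., by an LLM judge)
    to the claims of a generated answer. Returns a value in [0, 1], where 0 means
    a single dominant perspective and 1 means a perfectly balanced distribution
    over all worldview categories.
    """
    counts = Counter(perspective_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(WORLDVIEWS))  # uniform coverage of every category
    return entropy / max_entropy

# Example: a Western-centric answer vs. a multiplex answer.
baseline = ["Western"] * 9 + ["Islamic"]                         # heavily skewed
multiplex = ["Western", "Islamic", "Confucian", "Hindu",
             "Buddhist", "African", "Latin American"] * 2        # near-uniform
print(f"baseline PDS entropy:  {pds_entropy(baseline):.2f}")     # low, ~0.17
print(f"multiplex PDS entropy: {pds_entropy(multiplex):.2f}")    # ~1.00
```

Under this reading, a response dominated by a single worldview scores near 0%, while one that draws evenly on all categories approaches 100%, which matches the direction of the reported baseline-to-MAS improvement.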
Related papers
- A Game-Theoretic Negotiation Framework for Cross-Cultural Consensus in LLMs [10.655783463895325]
Large language models (LLMs) exhibit a pronounced WEIRD (Western, Educated, Industrialized, Rich, Democratic) cultural bias. This monocultural perspective may reinforce dominant values and marginalize diverse cultural viewpoints. We introduce a systematic framework designed to boost fair and robust cross-cultural consensus.
arXiv Detail & Related papers (2025-06-16T08:42:39Z) - CAReDiO: Cultural Alignment of LLM via Representativeness and Distinctiveness Guided Data Optimization [50.90288681622152]
Large Language Models (LLMs) are integrating more deeply into human life across various regions. Existing approaches develop culturally aligned LLMs through fine-tuning with culture-specific corpora. We introduce CAReDiO, a novel cultural data construction framework.
arXiv Detail & Related papers (2025-04-09T13:40:13Z) - Cultural Learning-Based Culture Adaptation of Language Models [70.1063219524999]
Adapting large language models (LLMs) to diverse cultural values is a challenging task. We present CLCA, a novel framework for enhancing LLM alignment with cultural values based on cultural learning.
arXiv Detail & Related papers (2025-04-03T18:16:26Z) - Toward Inclusive Educational AI: Auditing Frontier LLMs through a Multiplexity Lens [1.094065133109559]
This paper proposes a framework to assess and mitigate cultural bias within large language models (LLMs). Our analysis reveals that LLMs frequently exhibit cultural polarization, with biases appearing in both overt and subtle contextual cues. We propose two strategies: Contextually-Implemented Multiplex LLMs, which embed multiplex principles directly into the system prompt, and Multi-Agent System (MAS)-Implemented Multiplex LLMs, where multiple LLM agents, each representing distinct cultural viewpoints, collaboratively generate a balanced, synthesized response.
arXiv Detail & Related papers (2025-01-02T11:27:08Z) - All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages [73.93600813999306]
ALM-bench is the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. It challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages. The benchmark offers a robust and nuanced evaluation framework featuring various question formats, including true/false, multiple choice, and open-ended questions.
arXiv Detail & Related papers (2024-11-25T15:44:42Z) - LLM-GLOBE: A Benchmark Evaluating the Cultural Values Embedded in LLM Output [8.435090588116973]
We propose the LLM-GLOBE benchmark for evaluating the cultural value systems of LLMs.
We then leverage the benchmark to compare the values of Chinese and US LLMs.
Our methodology includes a novel "LLMs-as-a-Jury" pipeline which automates the evaluation of open-ended content.
arXiv Detail & Related papers (2024-11-09T01:38:55Z) - Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense [98.09670425244462]
Large language models (LLMs) have demonstrated substantial commonsense understanding.
This paper examines the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks.
arXiv Detail & Related papers (2024-05-07T20:28:34Z) - CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge [69.82940934994333]
We introduce CulturalTeaming, an interactive red-teaming system that leverages human-AI collaboration to build a challenging evaluation dataset.
Our study reveals that CulturalTeaming's various modes of AI assistance support annotators in creating cultural questions.
CULTURALBENCH-V0.1 is a compact yet high-quality evaluation dataset built from users' red-teaming attempts.
arXiv Detail & Related papers (2024-04-10T00:25:09Z) - CDEval: A Benchmark for Measuring the Cultural Dimensions of Large Language Models [41.885600036131045]
CDEval is a benchmark aimed at evaluating the cultural dimensions of Large Language Models.
It is constructed by incorporating both GPT-4's automated generation and human verification, covering six cultural dimensions across seven domains.
arXiv Detail & Related papers (2023-11-28T02:01:25Z) - ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models [49.48109472893714]
Multimodal Large Language Models (MLLMs) have shown impressive abilities in interacting with visual content with myriad potential downstream tasks.
We present the first Comprehensive Evaluation Framework (ChEF) that can holistically profile each MLLM and fairly compare different MLLMs.
We will publicly release all the detailed implementations for further analysis, as well as an easy-to-use modular toolkit for the integration of new recipes and models.
arXiv Detail & Related papers (2023-11-05T16:01:40Z)