Towards Measuring and Modeling "Culture" in LLMs: A Survey
- URL: http://arxiv.org/abs/2403.15412v5
- Date: Wed, 4 Sep 2024 05:12:54 GMT
- Title: Towards Measuring and Modeling "Culture" in LLMs: A Survey
- Authors: Muhammad Farid Adilazuarda, Sagnik Mukherjee, Pradhyumna Lavania, Siddhant Singh, Alham Fikri Aji, Jacki O'Neill, Ashutosh Modi, Monojit Choudhury
- Abstract summary: We present a survey of more than 90 recent papers that aim to study cultural representation and inclusion in large language models (LLMs).
We observe that none of the studies explicitly define "culture".
We call these aspects the proxies of culture, and organize them across two dimensions of demographic and semantic proxies.
- Score: 21.94407169332458
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present a survey of more than 90 recent papers that aim to study cultural representation and inclusion in large language models (LLMs). We observe that none of the studies explicitly define "culture", which is a complex, multifaceted concept; instead, they probe the models on some specially designed datasets which represent certain aspects of "culture". We call these aspects the proxies of culture, and organize them across two dimensions of demographic and semantic proxies. We also categorize the probing methods employed. Our analysis indicates that only certain aspects of "culture", such as values and objectives, have been studied, leaving several other interesting and important facets, especially the multitude of semantic domains (Thompson et al., 2020) and aboutness (Hershcovich et al., 2022), unexplored. Two other crucial gaps are the lack of robustness of probing techniques and situated studies on the impact of cultural mis- and under-representation in LLM-based applications.
Related papers
- Survey of Cultural Awareness in Language Models: Text and Beyond [39.77033652289063]
Large-scale deployment of large language models (LLMs) in various applications requires LLMs to be culturally sensitive to the user to ensure inclusivity.
Culture has been widely studied in psychology and anthropology, and there has been a recent surge in research on making LLMs more culturally inclusive.
arXiv Detail & Related papers (2024-10-30T16:37:50Z)
- How Well Do LLMs Identify Cultural Unity in Diversity? [12.982460687543952]
We introduce a benchmark dataset for evaluating decoder-only large language models (LLMs) in understanding the cultural unity of concepts.
CUNIT consists of 1,425 evaluation examples building upon 285 traditional cultural-specific concepts across 10 countries.
We design a contrastive matching task to evaluate the LLMs' capability to identify highly associated cross-cultural concept pairs.
arXiv Detail & Related papers (2024-08-09T14:45:22Z)
- Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks.
We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts.
We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z)
- CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models [59.22460740026037]
The "CIVICS: Culturally-Informed & Values-Inclusive Corpus for Societal impacts" dataset is designed to evaluate the social and cultural variation of Large Language Models (LLMs).
We create a hand-crafted, multilingual dataset of value-laden prompts which address specific socially sensitive topics, including LGBTQI rights, social welfare, immigration, disability rights, and surrogacy.
arXiv Detail & Related papers (2024-05-22T20:19:10Z)
- Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense [98.09670425244462]
Large language models (LLMs) have demonstrated substantial commonsense understanding.
This paper examines the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks.
arXiv Detail & Related papers (2024-05-07T20:28:34Z)
- What You Use is What You Get: Unforced Errors in Studying Cultural Aspects in Agile Software Development [2.9418191027447906]
Investigating the influence of cultural characteristics is challenging due to the multi-faceted concept of culture.
Cultural and social aspects are of high importance for the successful use of agile methods in practice.
arXiv Detail & Related papers (2024-04-25T20:08:37Z)
- CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting [73.94059188347582]
We uncover culture perceptions of three SOTA models on 110 countries and regions on 8 culture-related topics through culture-conditioned generations.
We discover that culture-conditioned generations contain linguistic "markers" that set marginalized cultures apart from default cultures.
arXiv Detail & Related papers (2024-04-16T00:50:43Z)
- Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
- Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models [89.94270049334479]
This paper identifies a cultural dominance issue within large language models (LLMs).
LLMs often provide inappropriate English-culture-related answers that are not relevant to the expected culture when users ask in non-English languages.
arXiv Detail & Related papers (2023-10-19T05:38:23Z)
- Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede's Cultural Dimensions [10.415002561977655]
This research proposes a Cultural Alignment Test (Hofstede's CAT) to quantify cultural alignment using Hofstede's cultural dimension framework.
We quantitatively evaluate large language models (LLMs) against the cultural dimensions of regions like the United States, China, and Arab countries.
Our results quantify the cultural alignment of LLMs and reveal the difference between LLMs in explanatory cultural dimensions.
arXiv Detail & Related papers (2023-08-25T14:50:13Z)
- Probing Pre-Trained Language Models for Cross-Cultural Differences in Values [42.45033681054207]
We introduce probes to study which values across cultures are embedded in pre-trained language models (PTLMs).
We find that PTLMs capture differences in values across cultures, but those only weakly align with established value surveys.
arXiv Detail & Related papers (2022-03-25T15:45:49Z)