Building Socio-culturally Inclusive Stereotype Resources with Community Engagement
- URL: http://arxiv.org/abs/2307.10514v1
- Date: Thu, 20 Jul 2023 01:26:34 GMT
- Title: Building Socio-culturally Inclusive Stereotype Resources with Community Engagement
- Authors: Sunipa Dev, Jaya Goyal, Dinesh Tewari, Shachi Dave, Vinodkumar
Prabhakaran
- Abstract summary: We demonstrate a socio-culturally aware expansion of evaluation resources in the Indian societal context, specifically for the harm of stereotyping.
The resultant resource increases the number of stereotypes known for and in the Indian context by over 1000, across many unique identities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With rapid development and deployment of generative language models in global
settings, there is an urgent need to also scale our measurements of harm, not
just in the number and types of harms covered, but also how well they account
for local cultural contexts, including marginalized identities and the social
biases experienced by them. Current evaluation paradigms are limited in their
abilities to address this, as they are not representative of diverse, locally
situated but global, socio-cultural perspectives. It is imperative that our
evaluation resources are enhanced and calibrated by including people and
experiences from different cultures and societies worldwide, in order to
prevent gross underestimations or skews in measurements of harm. In this work,
we demonstrate a socio-culturally aware expansion of evaluation resources in
the Indian societal context, specifically for the harm of stereotyping. We
devise a community engaged effort to build a resource which contains
stereotypes for axes of disparity that are uniquely present in India. The
resultant resource increases the number of stereotypes known for and in the
Indian context by over 1000 stereotypes across many unique identities. We also
demonstrate the utility and effectiveness of such expanded resources for
evaluations of language models. CONTENT WARNING: This paper contains examples
of stereotypes that may be offensive.
Related papers
- Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks.
We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts.
We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z)
- CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting [73.94059188347582]
We uncover culture perceptions of three SOTA models on 110 countries and regions on 8 culture-related topics through culture-conditioned generations.
We discover that culture-conditioned generations consist of linguistic "markers" that distinguish marginalized cultures from default cultures.
arXiv Detail & Related papers (2024-04-16T00:50:43Z)
- SeeGULL Multilingual: a Dataset of Geo-Culturally Situated Stereotypes [18.991295993710224]
SeeGULL Multilingual is a global-scale multilingual dataset of social stereotypes, spanning 20 languages with human annotations across 23 regions; we demonstrate its utility in identifying gaps in model evaluations.
arXiv Detail & Related papers (2024-03-08T22:09:58Z)
- Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
- Global Voices, Local Biases: Socio-Cultural Prejudices across Languages [22.92083941222383]
Human biases are ubiquitous but not uniform; disparities exist across linguistic, cultural, and societal borders.
In this work, we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies.
To encompass more widely prevalent societal biases, we examine new bias dimensions across toxicity, ableism, and more.
arXiv Detail & Related papers (2023-10-26T17:07:50Z)
- Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models [89.94270049334479]
This paper identifies a cultural dominance issue within large language models (LLMs).
LLMs often provide inappropriate English-culture-related answers that are not relevant to the expected culture when users ask in non-English languages.
arXiv Detail & Related papers (2023-10-19T05:38:23Z)
- SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models [15.145145928670827]
SeeGULL is a broad-coverage stereotype dataset in English.
It contains stereotypes about identity groups spanning 178 countries across 8 geo-political regions on 6 continents.
We also include fine-grained offensiveness scores for different stereotypes and demonstrate their global disparities.
arXiv Detail & Related papers (2023-05-19T17:30:19Z)
- Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale [61.555788332182395]
We investigate the potential for machine learning models to amplify dangerous and complex stereotypes.
We find a broad range of ordinary prompts produce stereotypes, including prompts simply mentioning traits, descriptors, occupations, or objects.
arXiv Detail & Related papers (2022-11-07T18:31:07Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.