Fairness in Language Models Beyond English: Gaps and Challenges
- URL: http://arxiv.org/abs/2302.12578v2
- Date: Tue, 28 Feb 2023 08:08:29 GMT
- Title: Fairness in Language Models Beyond English: Gaps and Challenges
- Authors: Krithika Ramesh, Sunayana Sitaram, Monojit Choudhury
- Abstract summary: This paper presents a survey of fairness in multilingual and non-English contexts.
It highlights the shortcomings of current research and the difficulties faced by methods designed for English.
- Score: 11.62418844341466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With language models becoming increasingly ubiquitous, it has become
essential to address their inequitable treatment of diverse demographic groups
and factors. Most research on evaluating and mitigating fairness harms has been
concentrated on English, while multilingual models and non-English languages
have received comparatively little attention. This paper presents a survey of
fairness in multilingual and non-English contexts, highlighting the
shortcomings of current research and the difficulties faced by methods designed
for English. We contend that the multitude of diverse cultures and languages
across the world makes it infeasible to achieve comprehensive coverage in terms
of constructing fairness datasets. Thus, the measurement and mitigation of
biases must evolve beyond the current dataset-driven practices that are
narrowly focused on specific dimensions and types of biases and, therefore,
impossible to scale across languages and cultures.
Related papers
- The Shrinking Landscape of Linguistic Diversity in the Age of Large Language Models [7.811355338367627]
We show that the widespread adoption of large language models (LLMs) as writing assistants is linked to notable declines in linguistic diversity.
We show that while the core content of texts is retained when LLMs polish and rewrite texts, not only do they homogenize writing styles, but they also alter stylistic elements in a way that selectively amplifies certain dominant characteristics or biases while suppressing others.
arXiv Detail & Related papers (2025-02-16T20:51:07Z)
- From No to Know: Taxonomy, Challenges, and Opportunities for Negation Understanding in Multimodal Foundation Models [48.68342037881584]
Negation, a linguistic construct conveying absence, denial, or contradiction, poses significant challenges for multilingual multimodal foundation models.
We propose a comprehensive taxonomy of negation constructs, illustrating how structural, semantic, and cultural factors influence multimodal foundation models.
We advocate for specialized benchmarks, language-specific tokenization, fine-grained attention mechanisms, and advanced multimodal architectures.
arXiv Detail & Related papers (2025-02-10T16:55:13Z)
- Scaling for Fairness? Analyzing Model Size, Data Composition, and Multilinguality in Vision-Language Bias [14.632649933582648]
We investigate how dataset composition, model size, and multilingual training affect gender and racial bias in a popular VLM, CLIP, and its open source variants.
To assess social perception bias, we measure zero-shot classification performance on face images using prompts that contain socially charged terms.
arXiv Detail & Related papers (2025-01-22T21:08:30Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work lays the foundation for furthering the field of dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- On Evaluating and Mitigating Gender Biases in Multilingual Settings [5.248564173595024]
We investigate some of the challenges with evaluating and mitigating biases in multilingual settings.
We first create a benchmark for evaluating gender biases in pre-trained masked language models (a minimal fill-mask probing sketch in this spirit appears at the end of this page).
We extend various debiasing methods to work beyond English and evaluate their effectiveness for SOTA massively multilingual models.
arXiv Detail & Related papers (2023-07-04T06:23:04Z)
- Multi-lingual and Multi-cultural Figurative Language Understanding [69.47641938200817]
Figurative language permeates human communication, but is relatively understudied in NLP.
We create a dataset for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba.
Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region.
Models perform significantly worse in all of these languages than in English, with variation in performance reflecting the availability of pre-training and fine-tuning data.
arXiv Detail & Related papers (2023-05-25T15:30:31Z)
- Cross-lingual Lifelong Learning [53.06904052325966]
We present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm.
We provide insights into what makes multilingual sequential learning particularly challenging.
The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata.
arXiv Detail & Related papers (2022-05-23T09:25:43Z)
- Language Contamination Explains the Cross-lingual Capabilities of English Pretrained Models [79.38278330678965]
We find that common English pretraining corpora contain significant amounts of non-English text.
This leads to hundreds of millions of foreign language tokens in large-scale datasets.
We then demonstrate that even these small percentages of non-English data facilitate cross-lingual transfer for models trained on them.
arXiv Detail & Related papers (2022-04-17T23:56:54Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
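As a concrete illustration of the kind of evaluation described in "On Evaluating and Mitigating Gender Biases in Multilingual Settings" above, the following is a minimal sketch of probing a multilingual masked language model for gendered associations with the Hugging Face fill-mask pipeline. It is not the paper's actual benchmark; the model choice, templates, and target pronouns are illustrative assumptions.

```python
from transformers import pipeline

# Minimal sketch: the model, templates, and pronoun targets below are
# illustrative assumptions, not the benchmark from the paper cited above.
fill = pipeline("fill-mask", model="bert-base-multilingual-cased")

# Occupation templates with a masked subject slot.
templates = [
    "[MASK] works as a nurse.",
    "[MASK] works as an engineer.",
]

for template in templates:
    # `targets` restricts scoring to the two pronouns being compared;
    # a large gap in their scores indicates a gendered association.
    for pred in fill(template, targets=["he", "she"]):
        print(f"{template:32} {pred['token_str']:>4} {pred['score']:.4f}")
```

Extending this kind of probe beyond English requires language-specific templates and gendered word lists for each language and culture, which is precisely the scaling difficulty the survey highlights.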