Diversity and Language Technology: How Techno-Linguistic Bias Can Cause Epistemic Injustice
- URL: http://arxiv.org/abs/2307.13714v1
- Date: Tue, 25 Jul 2023 16:08:27 GMT
- Title: Diversity and Language Technology: How Techno-Linguistic Bias Can Cause Epistemic Injustice
- Authors: Paula Helm, Gábor Bella, Gertraud Koch, Fausto Giunchiglia
- Abstract summary: We show that many attempts produce flawed solutions that adhere to a hard-wired representational preference for certain languages.
As we show throughout the paper, techno-linguistic bias can result in systems that can only express concepts that are part of the language and culture of dominant powers.
We argue that at the root of this problem lies a systematic tendency of technology developer communities to apply a simplistic understanding of diversity.
- Score: 4.234367850767171
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: It is well known that AI-based language technology -- large language models,
machine translation systems, multilingual dictionaries, and corpora -- is
currently limited to 2 to 3 percent of the world's most widely spoken and/or
financially and politically best supported languages. In response, recent
research efforts have sought to extend the reach of AI technology to
"underserved languages." In this paper, we show that many of these attempts
produce flawed solutions that adhere to a hard-wired representational
preference for certain languages, which we call techno-linguistic bias.
Techno-linguistic bias is distinct from the well-established phenomenon of
linguistic bias as it does not concern the languages represented but rather the
design of the technologies. As we show throughout the paper, techno-linguistic
bias can result in systems that can only express concepts that are part of the
language and culture of dominant powers, unable to correctly represent concepts
from other communities. We argue that at the root of this problem lies a
systematic tendency of technology developer communities to apply a simplistic
understanding of diversity which does not do justice to the more profound
differences that languages, and ultimately the communities that speak them,
embody. Drawing on the concept of epistemic injustice, we point to the broader
sociopolitical consequences of the bias we identify and show how it can lead
not only to a disregard for valuable aspects of diversity but also to an
under-representation of the needs and diverse worldviews of marginalized
language communities.
Related papers
- A Capabilities Approach to Studying Bias and Harm in Language Technologies [4.135516576952934]
We consider fairness, bias, and inclusion in Language Technologies through the lens of the Capabilities Approach.
The Capabilities Approach centers on what people are capable of achieving, given their intersectional social, political, and economic contexts.
We detail the Capabilities Approach, its relationship to multilingual and multicultural evaluation, and how the framework affords meaningful collaboration with community members in defining and measuring the harms of Language Technologies.
arXiv Detail & Related papers (2024-11-06T22:46:13Z)
- Diversidade linguística e inclusão digital: desafios para uma ia brasileira (Linguistic Diversity and Digital Inclusion: Challenges for a Brazilian AI) [0.0]
Linguistic diversity is a human attribute that, with the advance of generative AI, is coming under threat.
This paper examines the consequences of the variety-selection bias imposed by technological applications and the vicious circle whereby the preserved variety becomes dominant and standardized.
arXiv Detail & Related papers (2024-11-02T14:17:33Z)
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
arXiv Detail & Related papers (2024-02-19T09:15:28Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
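As a toy illustration of what "behaving differently across languages" can mean in the entry above, the hedged sketch below computes a per-language gender skew from counts of perceived gender in generated images. The counts, the prompt, and the skew metric are invented here for illustration and are not MAGBIG's actual data or scoring.

```python
# Hypothetical illustration of per-language gender skew in text-to-image
# outputs. All numbers below are made up; MAGBIG's real metrics may differ.

# Perceived gender of images generated from the same neutral prompt
# ("a photo of an engineer"), translated into each language.
counts = {
    "English": {"female": 12, "male": 88},
    "German":  {"female": 7,  "male": 93},
    "Italian": {"female": 21, "male": 79},
}

for lang, c in counts.items():
    total = c["female"] + c["male"]
    p_female = c["female"] / total
    # Skew of 0.0 means gender parity; 1.0 means a single gender only.
    skew = abs(p_female - 0.5) * 2
    print(f"{lang}: P(female)={p_female:.2f}, skew={skew:.2f}")
```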
- Task-Agnostic Low-Rank Adapters for Unseen English Dialects [52.88554155235167]
Large Language Models (LLMs) are trained on corpora disproportionately weighted in favor of Standard American English.
By disentangling dialect-specific and cross-dialectal information, the proposed HyperLoRA adapters improve generalization to unseen dialects in a task-agnostic fashion.
arXiv Detail & Related papers (2023-11-02T01:17:29Z)
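The entry above builds on low-rank adaptation (LoRA). As a hedged sketch of just the underlying adapter idea, the PyTorch snippet below wraps a frozen linear layer with a trainable low-rank update; HyperLoRA itself goes further by producing such adapters for unseen dialects, and the class name, rank, and scaling here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (LoRA)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # Low-rank factors: only rank * (in + out) extra trainable parameters.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen full-rank path plus a scaled low-rank correction;
        # lora_b starts at zero, so training begins from the base model.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(4, 768)).shape)  # torch.Size([4, 768])
```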
- Lexical Diversity in Kinship Across Languages and Dialects [6.80465507148218]
We introduce a method to enrich computational lexicons with content relating to linguistic diversity.
The method is verified through two large-scale case studies on kinship terminology.
arXiv Detail & Related papers (2023-08-24T19:49:30Z)
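To make diversity-aware lexicon content concrete, here is a toy sketch under assumptions of my own: a concept-to-language mapping in which missing single-word terms are recorded as explicit lexical gaps. English, for instance, has no single word for "younger brother", while Japanese does (otōto). The schema and function are invented for illustration; the actual method is far richer.

```python
# Toy diversity-aware lexicon. The terms are real but simplified;
# the data structure and helper are hypothetical.
GAP = None  # the language has no dedicated single-word term for the concept

kinship = {
    # Japanese is commonly cited as lacking an age-neutral everyday
    # term for "brother", while English lacks age-specific ones.
    "male sibling":         {"en": "brother", "ja": GAP},
    "younger male sibling": {"en": GAP,       "ja": "otōto"},
    "older male sibling":   {"en": GAP,       "ja": "ani"},
}

def lexicalized_in(concept: str) -> list[str]:
    """Languages that have a dedicated single word for the concept."""
    return [lang for lang, term in kinship[concept].items() if term is not GAP]

print(lexicalized_in("younger male sibling"))  # ['ja']
```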
- Towards Bridging the Digital Language Divide [4.234367850767171]
Multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden representational preference towards certain languages.
We show that biased technology is often the result of research and development methodologies that do not do justice to the complexity of the languages being represented.
We present a new initiative that aims at reducing linguistic bias through both technological design and methodology.
arXiv Detail & Related papers (2023-07-25T10:53:20Z)
- Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression, such as favoritism toward groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
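A minimal sketch of how such template-based probing works, assuming an off-the-shelf English sentiment classifier; the templates, the nationality terms, and the scoring below are illustrative stand-ins, not the paper's actual materials.

```python
# Template-based sentiment bias probing (illustrative sketch).
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # default English model

templates = ["My neighbor is a {} person.",
             "I had dinner with a {} family yesterday."]
groups = ["French", "Nigerian", "Chinese", "American"]  # nationality attribute

for group in groups:
    scores = []
    for t in templates:
        result = classifier(t.format(group))[0]
        # Signed score: positive sentiment counts up, negative counts down.
        signed = result["score"] if result["label"] == "POSITIVE" else -result["score"]
        scores.append(signed)
    # Systematic differences between groups on identical templates indicate bias.
    print(f"{group}: mean sentiment = {sum(scores)/len(scores):+.3f}")
```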
- Evaluating the Diversity, Equity and Inclusion of NLP Technology: A Case Study for Indian Languages [35.86100962711644]
In order for NLP technology to be widely applicable, fair, and useful, it needs to serve a diverse set of speakers across the world's languages.
We propose an evaluation paradigm that assesses NLP technologies across all three dimensions: diversity, equity, and inclusion.
arXiv Detail & Related papers (2022-05-25T11:38:04Z)
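To give a flavor of what evaluating along these dimensions could look like, here is a hedged sketch that scores a hypothetical system on a few Indian languages with a population-weighted average (how well the average speaker is served) and a Gini coefficient over per-language accuracy (equity). The numbers and metric choices are assumptions for illustration, not the paper's own definitions.

```python
# Illustrative inclusion/equity scoring; all figures are hypothetical.
langs = {  # accuracy of some NLP system, speakers in millions (rough stand-ins)
    "Hindi":   {"acc": 0.81, "speakers": 600},
    "Bengali": {"acc": 0.74, "speakers": 270},
    "Marathi": {"acc": 0.65, "speakers": 95},
    "Santali": {"acc": 0.31, "speakers": 7},
}

# Population-weighted average: performance experienced by the average speaker.
total = sum(v["speakers"] for v in langs.values())
weighted = sum(v["acc"] * v["speakers"] for v in langs.values()) / total

# Gini coefficient over per-language accuracy: 0 = perfectly equitable.
accs = sorted(v["acc"] for v in langs.values())
n = len(accs)
gini = sum((2 * i - n + 1) * a for i, a in enumerate(accs)) / (n * sum(accs))

print(f"population-weighted accuracy: {weighted:.3f}")
print(f"equity (Gini over accuracies): {gini:.3f}")
```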
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
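The key "token-free" idea behind the entry above, a fixed universal byte-level input instead of a learned (and language-biased) vocabulary, can be shown in a few lines. Charformer additionally learns to downsample such byte sequences, which this sketch deliberately omits.

```python
# Vocabulary-free input: every language maps onto the same 256 byte IDs,
# so no language-specific vocabulary has to be built or maintained.
def to_byte_ids(text: str) -> list[int]:
    """Encode text as UTF-8 byte IDs (0-255)."""
    return list(text.encode("utf-8"))

for s in ["hello", "héllo", "こんにちは"]:
    ids = to_byte_ids(s)
    print(f"{s!r}: {len(ids)} byte ids, e.g. {ids[:8]}")
```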
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
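As one example of the kind of quantification the entry above refers to, the sketch below measures gender association by projecting word vectors onto a "he"-minus-"she" direction, a standard single-language technique; the random vectors are stand-ins for real embeddings, and the paper's multilingual measures differ in detail.

```python
import numpy as np

# Random stand-ins for real word embeddings (illustration only).
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=300) for w in
       ["he", "she", "doctor", "nurse", "engineer", "teacher"]}

def unit(v):
    return v / np.linalg.norm(v)

gender_direction = unit(emb["he"] - emb["she"])

for word in ["doctor", "nurse", "engineer", "teacher"]:
    # Positive = closer to "he", negative = closer to "she"; with real
    # embeddings, systematic skews across occupations indicate bias.
    score = float(unit(emb[word]) @ gender_direction)
    print(f"{word}: {score:+.3f}")
```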
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.