Ethical Considerations for Machine Translation of Indigenous Languages:
Giving a Voice to the Speakers
- URL: http://arxiv.org/abs/2305.19474v1
- Date: Wed, 31 May 2023 01:04:20 GMT
- Authors: Manuel Mager, Elisabeth Mager, Katharina Kann, Ngoc Thang Vu
- Abstract summary: Machine translation has become very successful for high-resource language pairs.
This has sparked new interest in research on the automatic translation of low-resource languages, including Indigenous languages.
- Score: 40.84344504873471
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years machine translation has become very successful for
high-resource language pairs. This has also sparked new interest in research on
the automatic translation of low-resource languages, including Indigenous
languages. However, the latter are deeply related to the ethnic and cultural
groups that speak (or used to speak) them. The collection of data, modeling, and
deployment of machine translation systems thus raise new ethical questions that
must be addressed. Motivated by this, we first survey the existing literature
on ethical considerations for the documentation, translation, and general
natural language processing for Indigenous languages. Afterward, we conduct and
analyze an interview study to shed light on the positions of community leaders,
teachers, and language activists regarding ethical concerns for the automatic
translation of their languages. Our results show that the inclusion, at
different degrees, of native speakers and community members is vital to
performing better and more ethical research on Indigenous languages.
Related papers
- A global AI community requires language-diverse publishing [1.4579344926652844]
We argue that the requirement for English language publishing upholds and reinforces broader regimes of extraction in AI.
We propose alternative futures for a healthier publishing culture, organized around three themes.
(2024-08-27T04:20:10Z)
- Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences [31.62071644137294]
We discuss the decreasing diversity of languages in the world and how working with Indigenous languages poses unique ethical challenges for AI and NLP.
We report encouraging results in the development of high-quality machine learning translators for Indigenous languages.
We present prototypes we have built in projects done in 2023 and 2024 with Indigenous communities in Brazil, aimed at facilitating writing.
(2024-07-17T14:46:37Z)
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
(2024-02-19T09:15:28Z)
- "It's how you do things that matters": Attending to Process to Better Serve Indigenous Communities with Language Technologies [2.821682550792172]
This position paper explores ethical considerations in building NLP technologies for Indigenous languages.
We report on interviews with 17 researchers working in or with Aboriginal and/or Torres Strait Islander communities.
We recommend practices for NLP researchers to increase attention to the process of engagements with Indigenous communities.
(2024-02-04T23:23:51Z)
- Towards Bridging the Digital Language Divide [4.234367850767171]
Multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden, representational preference towards certain languages.
We show that biased technology is often the result of research and development methodologies that do not do justice to the complexity of the languages being represented.
We present a new initiative that aims at reducing linguistic bias through both technological design and methodology.
(2023-07-25T10:53:20Z)
- Neural Machine Translation for the Indigenous Languages of the Americas: An Introduction [102.13536517783837]
Most languages from the Americas are low-resource, with a limited amount of parallel and monolingual data, if any.
We discuss recent advances, findings, and open questions arising from the NLP community's increased interest in these languages.
(2023-06-11T23:27:47Z)
- No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506]
We create datasets and models aimed at narrowing the performance gap between low and high-resource languages.
We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art.
(2022-07-11T07:33:36Z)
- Not always about you: Prioritizing community needs when developing endangered language technology [5.670857685983896]
We discuss the unique technological, cultural, practical, and ethical challenges that researchers and indigenous speech community members face.
We report the perspectives of language teachers, Master Speakers and elders from indigenous communities, as well as the point of view of academics.
(2022-04-12T05:59:39Z)
- Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions? [62.74872383104381]
We investigate the effectiveness of natural language interventions for reading-comprehension systems.
We propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior.
(2021-06-02T20:57:58Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
(2020-04-30T16:25:39Z)