Systematic Inequalities in Language Technology Performance across the World's Languages
- URL: http://arxiv.org/abs/2110.06733v1
- Date: Wed, 13 Oct 2021 14:03:07 GMT
- Title: Systematic Inequalities in Language Technology Performance across the World's Languages
- Authors: Damián Blasi, Antonios Anastasopoulos, Graham Neubig
- Abstract summary: We introduce a framework for estimating the global utility of language technologies.
Our analyses involve the field at large, but also more in-depth studies on both user-facing technologies and more linguistic NLP tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language processing (NLP) systems have become a central technology in
communication, education, medicine, artificial intelligence, and many other
domains of research and development. While the performance of NLP methods has
grown enormously over the last decade, this progress has been restricted to a
minuscule subset of the world's 6,500 languages. We introduce a framework for
estimating the global utility of language technologies as revealed in a
comprehensive snapshot of recent publications in NLP. Our analyses involve the
field at large, but also more in-depth studies on both user-facing technologies
(machine translation, language understanding, question answering,
text-to-speech synthesis) as well as more linguistic NLP tasks (dependency
parsing, morphological inflection). In the process, we (1) quantify disparities
in the current state of NLP research, (2) explore some of its associated
societal and academic factors, and (3) produce tailored recommendations for
evidence-based policy making aimed at promoting more global and equitable
language technologies.
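The abstract does not say how "global utility" is computed. As a minimal sketch, assume each language has a normalized performance score in [0, 1] (0 where no system exists) and that global utility is a weighted mean of those scores, with weights proportional to speaker population raised to a power `tau`. Setting `tau = 1` weights languages by their speakers, and `tau = 0` counts every language equally. The function name `global_utility`, the `tau` parameter, and the toy numbers are illustrative assumptions. The paper's exact formulation may differ.

```python
from typing import Mapping


def global_utility(
    utility: Mapping[str, float],
    speakers: Mapping[str, float],
    tau: float = 1.0,
) -> float:
    """Weighted mean of per-language utilities (hypothetical helper).

    utility:  language -> normalized task performance in [0, 1];
              languages missing from this mapping count as 0
    speakers: language -> number of speakers (defines the language set)
    tau:      1.0 weights languages by population,
              0.0 weights every language equally
    """
    # Weight each language by its speaker count raised to tau.
    weights = {lang: n ** tau for lang, n in speakers.items()}
    total = sum(weights.values())
    # Languages with no reported system contribute zero utility.
    return sum(w * utility.get(lang, 0.0) for lang, w in weights.items()) / total


# Toy data (invented for illustration): three languages, only two with systems.
speakers = {"A": 900.0, "B": 90.0, "C": 10.0}
utility = {"A": 0.9, "B": 0.5}

print(global_utility(utility, speakers, tau=1.0))  # population-weighted view
print(global_utility(utility, speakers, tau=0.0))  # every language counts equally
```

On this toy data, the population-weighted score is about 0.86 and the equal-weight score is about 0.47. The gap shows how strong performance on a few widely spoken languages can hide the absence of technology for smaller ones. This is the kind of disparity such a framework is meant to expose.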