Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go
- URL: http://arxiv.org/abs/2203.08351v2
- Date: Thu, 17 Mar 2022 20:36:13 GMT
- Title: Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go
- Authors: Ife Adebara and Muhammad Abdul-Mageed
- Abstract summary: Situating African languages in a typological framework, we discuss how the particulars of these languages can be harnessed.
Our main objective is to motivate and advocate for an Afrocentric approach to technology development.
- Score: 7.893831644671974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aligning with the ACL 2022 special theme on "Language Diversity: from Low
Resource to Endangered Languages", we discuss the major linguistic and
sociopolitical challenges facing development of NLP technologies for African
languages. Situating African languages in a typological framework, we discuss
how the particulars of these languages can be harnessed. To facilitate future
research, we also highlight current efforts, communities, venues, datasets, and
tools. Our main objective is to motivate and advocate for an Afrocentric
approach to technology development. With this in mind, we recommend
what technologies to build and how to build, evaluate, and
deploy them based on the needs of local African communities.
Related papers
- Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects [72.18753241750964]
Yorùbá is an African language with roughly 47 million speakers.
Recent efforts to develop NLP technologies for African languages have focused on their standard dialects.
We take steps towards bridging this gap by introducing a new high-quality parallel text and speech corpus.
(2024-06-27T22:38:04Z)
- The Ghanaian NLP Landscape: A First Look [9.17372840572907]
Ghanaian languages, in particular, face an alarming decline, with documented extinction and several at risk.
This study pioneers a comprehensive survey of Natural Language Processing (NLP) research focused on Ghanaian languages.
(2024-05-10T21:39:09Z)
- MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition [55.95128479289923]
African languages are spoken by over a billion people, but are underrepresented in NLP research and development.
We create the largest human-annotated NER dataset for 20 African languages.
We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points.
(2022-10-22T08:53:14Z)
- Building African Voices [125.92214914982753]
This paper focuses on speech synthesis for low-resourced African languages.
We create a set of general-purpose instructions on building speech synthesis systems with minimum technological resources.
We release the speech data, code, and trained voices for 12 African languages to support researchers and developers.
(2022-07-01T23:28:16Z)
- NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages [100.59889279607432]
We focus on developing resources for languages in Indonesia.
Most languages in Indonesia are categorized as endangered and some are even extinct.
We develop the first-ever parallel resource for 10 low-resource languages in Indonesia.
(2022-05-31T17:03:50Z)
- How can NLP Help Revitalize Endangered Languages? A Case Study and Roadmap for the Cherokee Language [91.79339725967073]
More than 43% of the languages spoken in the world are endangered.
In this work, we focus on discussing how NLP can help revitalize endangered languages.
We take Cherokee, a severely-endangered Native American language, as a case study.
(2022-04-25T18:25:57Z)
- MasakhaNER: Named Entity Recognition for African Languages [48.34339599387944]
We create the first large publicly available high-quality dataset for named entity recognition in ten African languages.
We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER.
(2021-03-22T13:12:44Z)
- Lanfrica: A Participatory Approach to Documenting Machine Translation Research on African Languages [0.012691047660244334]
Africa has the highest language diversity, with 1500-2000 documented languages and many more undocumented or extinct languages.
This makes it hard to keep track of the MT research, models, and datasets that have been developed for some of them.
Online platforms can help make research, benchmarks, and datasets in these African languages accessible.
(2020-08-03T18:14:04Z)
- AI4D -- African Language Dataset Challenge [1.4922337373437886]
This work details the organisation of the AI4D - African Language Dataset Challenge.
It is an effort to incentivize the creation, organization and discovery of African language datasets.
We particularly encouraged the submission of annotated datasets which can be used for training task-specific supervised machine learning models.
(2020-07-23T08:48:06Z)
- Masakhane -- Machine Translation For Africa [16.66010516114378]
MASAKHANE is an open-source, continent-wide, distributed, online research effort for machine translation for African languages.
We discuss our methodology for building the community and spurring research from the African continent.
(2020-03-13T09:01:02Z)