Natural Language Processing in Ethiopian Languages: Current State,
Challenges, and Opportunities
- URL: http://arxiv.org/abs/2303.14406v1
- Date: Sat, 25 Mar 2023 09:04:29 GMT
- Title: Natural Language Processing in Ethiopian Languages: Current State,
Challenges, and Opportunities
- Authors: Atnafu Lambebo Tonja, Tadesse Destaw Belay, Israel Abebe Azime, Abinew
Ali Ayele, Moges Ahmed Mehamed, Olga Kolesnikova, Seid Muhie Yimam
- Abstract summary: This survey delves into the current state of natural language processing (NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta.
- Score: 3.6328558641172553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This survey delves into the current state of natural language processing
(NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and
Wolaytta. Through this paper, we identify key challenges and opportunities for
NLP research in Ethiopia. Furthermore, we provide a centralized repository on
GitHub that contains publicly available resources for various NLP tasks in
these languages. This repository can be updated periodically with contributions
from other researchers. Our objective is to identify research gaps and
disseminate the information to NLP researchers interested in Ethiopian
languages and encourage future research in this domain.
Related papers
- State of NLP in Kenya: A Survey [0.25454395163615406]
Kenya, known for its linguistic diversity, faces unique challenges and promising opportunities in advancing Natural Language Processing.
This survey provides a detailed assessment of the current state of NLP in Kenya.
The paper uncovers significant gaps by critically evaluating the available datasets and existing NLP models.
arXiv Detail & Related papers (2024-10-13T18:08:24Z) - The Nature of NLP: Analyzing Contributions in NLP Papers [77.31665252336157]
We quantitatively investigate what constitutes NLP research by examining research papers.
Our findings reveal a rising involvement of machine learning in NLP since the early nineties.
In post-2020, there has been a resurgence of focus on language and people.
arXiv Detail & Related papers (2024-09-29T01:29:28Z) - NLP for The Greek Language: A Longer Survey [1.6114012813668932]
We list and briefly discuss related works, resources and tools, categorized according to various processing layers and contexts.
This survey can be useful for researchers and students interested in NLP tasks, Information Retrieval and Knowledge Management for the Greek language.
arXiv Detail & Related papers (2024-08-20T15:57:18Z) - The Ghanaian NLP Landscape: A First Look [9.17372840572907]
Ghanaian languages, in particular, face an alarming decline, with documented extinction and several at risk.
This study pioneers a comprehensive survey of Natural Language Processing (NLP) research focused on Ghanaian languages.
arXiv Detail & Related papers (2024-05-10T21:39:09Z) - EthioMT: Parallel Corpus for Low-resource Ethiopian Languages [49.80726355048843]
We introduce EthioMT -- a new parallel corpus for 15 languages.
We also create a new benchmark by collecting a dataset for better-researched languages in Ethiopia.
We evaluate the newly collected corpus and the benchmark dataset for 23 Ethiopian languages using transformer and fine-tuning approaches.
arXiv Detail & Related papers (2024-03-28T12:26:45Z) - EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation [24.060772057458685]
This paper introduces EthioLLM -- multilingual large language models for five Ethiopian languages (Amharic, Ge'ez, Afan Oromo, Somali, and Tigrinya) and English.
We evaluate the performance of these models across five downstream Natural Language Processing (NLP) tasks.
arXiv Detail & Related papers (2024-03-20T16:43:42Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - NusaCrowd: Open Source Initiative for Indonesian NLP Resources [104.5381571820792]
NusaCrowd is a collaborative initiative to collect and unify existing resources for Indonesian languages.
Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.
arXiv Detail & Related papers (2022-12-19T17:28:22Z) - One Country, 700+ Languages: NLP Challenges for Underrepresented
Languages and Dialects in Indonesia [60.87739250251769]
We provide an overview of the current state of NLP research for Indonesia's 700+ languages.
We highlight challenges in Indonesian NLP and how these affect the performance of current NLP systems.
arXiv Detail & Related papers (2022-03-24T22:07:22Z) - MasakhaNER: Named Entity Recognition for African Languages [48.34339599387944]
We create the first large publicly available high-quality dataset for named entity recognition in ten African languages.
We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER.
arXiv Detail & Related papers (2021-03-22T13:12:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.