Natural Language Processing in Ethiopian Languages: Current State,
Challenges, and Opportunities
- URL: http://arxiv.org/abs/2303.14406v1
- Date: Sat, 25 Mar 2023 09:04:29 GMT
- Title: Natural Language Processing in Ethiopian Languages: Current State,
Challenges, and Opportunities
- Authors: Atnafu Lambebo Tonja, Tadesse Destaw Belay, Israel Abebe Azime, Abinew
Ali Ayele, Moges Ahmed Mehamed, Olga Kolesnikova, Seid Muhie Yimam
- Abstract summary: This survey delves into the current state of natural language processing (NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta.
- Score: 3.6328558641172553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This survey delves into the current state of natural language processing
(NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and
Wolaytta. Through this paper, we identify key challenges and opportunities for
NLP research in Ethiopia. Furthermore, we provide a centralized repository on
GitHub that contains publicly available resources for various NLP tasks in
these languages. This repository can be updated periodically with contributions
from other researchers. Our objective is to identify research gaps and
disseminate the information to NLP researchers interested in Ethiopian
languages and encourage future research in this domain.
Related papers
- State of NLP in Kenya: A Survey [0.25454395163615406]
Kenya, known for its linguistic diversity, faces unique challenges and promising opportunities in advancing Natural Language Processing.
This survey provides a detailed assessment of the current state of NLP in Kenya.
The paper uncovers significant gaps by critically evaluating the available datasets and existing NLP models.
arXiv Detail & Related papers (2024-10-13T18:08:24Z) - The Nature of NLP: Analyzing Contributions in NLP Papers [77.31665252336157]
We quantitatively investigate what constitutes NLP research by examining research papers.
Our findings reveal a rising involvement of machine learning in NLP since the early nineties.
In post-2020, there has been a resurgence of focus on language and people.
arXiv Detail & Related papers (2024-09-29T01:29:28Z) - NLP for The Greek Language: A Longer Survey [1.6114012813668932]
We list and briefly discuss related works, resources and tools, categorized according to various processing layers and contexts.
This survey can be useful for researchers and students interested in NLP tasks, Information Retrieval and Knowledge Management for the Greek language.
arXiv Detail & Related papers (2024-08-20T15:57:18Z) - Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP [2.3499129784547663]
This study fills the gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys.
Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks.
By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022.
arXiv Detail & Related papers (2024-07-13T12:01:52Z) - The Ghanaian NLP Landscape: A First Look [9.17372840572907]
Ghanaian languages, in particular, face an alarming decline, with documented extinction and several at risk.
This study pioneers a comprehensive survey of Natural Language Processing (NLP) research focused on Ghanaian languages.
arXiv Detail & Related papers (2024-05-10T21:39:09Z) - Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA)
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z) - EthioMT: Parallel Corpus for Low-resource Ethiopian Languages [49.80726355048843]
We introduce EthioMT -- a new parallel corpus for 15 languages.
We also create a new benchmark by collecting a dataset for better-researched languages in Ethiopia.
We evaluate the newly collected corpus and the benchmark dataset for 23 Ethiopian languages using transformer and fine-tuning approaches.
arXiv Detail & Related papers (2024-03-28T12:26:45Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - One Country, 700+ Languages: NLP Challenges for Underrepresented
Languages and Dialects in Indonesia [60.87739250251769]
We provide an overview of the current state of NLP research for Indonesia's 700+ languages.
We highlight challenges in Indonesian NLP and how these affect the performance of current NLP systems.
arXiv Detail & Related papers (2022-03-24T22:07:22Z) - MasakhaNER: Named Entity Recognition for African Languages [48.34339599387944]
We create the first large publicly available high-quality dataset for named entity recognition in ten African languages.
We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER.
arXiv Detail & Related papers (2021-03-22T13:12:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.