Related papers: The Ghanaian NLP Landscape: A First Look

The Ghanaian NLP Landscape: A First Look

URL: http://arxiv.org/abs/2405.06818v1
Date: Fri, 10 May 2024 21:39:09 GMT
Title: The Ghanaian NLP Landscape: A First Look
Authors: Sheriff Issaka, Zhaoyi Zhang, Mihir Heda, Keyi Wang, Yinka Ajibola, Ryan DeMar, Xuefeng Du,
Abstract summary: Ghanaian languages, in particular, face an alarming decline, with documented extinction and several at risk. This study pioneers a comprehensive survey of Natural Language Processing (NLP) research focused on Ghanaian languages.
Score: 9.17372840572907
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite comprising one-third of global languages, African languages are critically underrepresented in Artificial Intelligence (AI), threatening linguistic diversity and cultural heritage. Ghanaian languages, in particular, face an alarming decline, with documented extinction and several at risk. This study pioneers a comprehensive survey of Natural Language Processing (NLP) research focused on Ghanaian languages, identifying methodologies, datasets, and techniques employed. Additionally, we create a detailed roadmap outlining challenges, best practices, and future directions, aiming to improve accessibility for researchers. This work serves as a foundational resource for Ghanaian NLP research and underscores the critical need for integrating global linguistic diversity into AI development.

Related papers

NaijaNLP: A Survey of Nigerian Low-Resource Languages [0.0]
Three languages -- Hausa, Yorub'a and Igbo -- account for about 60% of the spoken languages in Nigeria. These languages are categorised as low-resource due to insufficient resources to support tasks in computational linguistics. This study presents the first comprehensive review of advancements in low-resource NLP (LR-NLP) research across the three major Nigerian languages.
arXiv Detail & Related papers (2025-02-27T05:48:51Z)
Bridging Gaps in Natural Language Processing for Yorùbá: A Systematic Review of a Decade of Progress and Prospects [0.6554326244334868]
This review highlights the scarcity of annotated corpora, limited availability of pre-trained language models, and linguistic challenges like tonal complexity and diacritic dependency as significant obstacles. The findings reveal a growing body of multilingual and monolingual resources, even though the field is constrained by socio-cultural factors such as code-switching and desertion of language for digital usage.
arXiv Detail & Related papers (2025-02-24T17:41:48Z)
LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models [62.47865866398233]
This white paper proposes a framework to generate linguistic tools for low-resource languages. By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity.
arXiv Detail & Related papers (2024-11-20T16:59:41Z)
State of NLP in Kenya: A Survey [0.25454395163615406]
Kenya, known for its linguistic diversity, faces unique challenges and promising opportunities in advancing Natural Language Processing. This survey provides a detailed assessment of the current state of NLP in Kenya. The paper uncovers significant gaps by critically evaluating the available datasets and existing NLP models.
arXiv Detail & Related papers (2024-10-13T18:08:24Z)
Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP [2.3499129784547663]
This study fills the gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys. Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks. By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022.
arXiv Detail & Related papers (2024-07-13T12:01:52Z)
A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [51.8203871494146]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing. Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient. This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z)
From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation [0.0]
generative large language models (LLMs) stand at the forefront of innovation, showcasing unparalleled abilities in text understanding and generation. However, the limited representation of low-resource languages like Ukrainian poses a notable challenge, restricting the reach and relevance of this technology. Our paper addresses this by fine-tuning the open-source Gemma and Mistral LLMs with Ukrainian datasets, aiming to improve their linguistic proficiency.
arXiv Detail & Related papers (2024-04-14T04:25:41Z)
Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition [55.95128479289923]
African languages are spoken by over a billion people, but are underrepresented in NLP research and development. We create the largest human-annotated NER dataset for 20 African languages. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points.
arXiv Detail & Related papers (2022-10-22T08:53:14Z)
Systematic Inequalities in Language Technology Performance across the World's Languages [94.65681336393425]
We introduce a framework for estimating the global utility of language technologies. Our analyses involve the field at large, but also more in-depth studies on both user-facing technologies and more linguistic NLP tasks.
arXiv Detail & Related papers (2021-10-13T14:03:07Z)
Ensuring the Inclusive Use of Natural Language Processing in the Global Response to COVID-19 [58.720142291102135]
We discuss ways in which current and future NLP approaches can be made more inclusive by covering low-resource languages. We suggest several future directions for researchers interested in maximizing the positive societal impacts of NLP.
arXiv Detail & Related papers (2021-08-11T12:54:26Z)
MasakhaNER: Named Entity Recognition for African Languages [48.34339599387944]
We create the first large publicly available high-quality dataset for named entity recognition in ten African languages. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER.
arXiv Detail & Related papers (2021-03-22T13:12:44Z)
OkwuGb\'e: End-to-End Speech Recognition for Fon and Igbo [0.015863809575305417]
We present a state-of-art ASR model for Fon, as well as benchmark ASR model results for Igbo. We conduct a comprehensive linguistic analysis of each language and describe the creation of end-to-end, deep neural network-based speech recognition models for both languages.
arXiv Detail & Related papers (2021-03-13T18:02:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.