Related papers: Voice of a Continent: Mapping Africa's Speech Technology Frontier

URL: http://arxiv.org/abs/2505.18436v3
Date: Fri, 04 Jul 2025 23:16:29 GMT
Title: Voice of a Continent: Mapping Africa's Speech Technology Frontier
Authors: AbdelRahim Elmadany, Sang Yun Kwon, Hawau Olamide Toyin, Alcides Alcoba Inciarte, Hanan Aldarmaki, Muhammad Abdul-Mageed
Abstract summary: Africa's rich linguistic diversity remains significantly underrepresented in speech technologies. We introduce the Simba family of models, achieving state-of-the-art performance across multiple African languages and speech tasks. Our work highlights the need for expanded speech technology resources that better reflect Africa's linguistic diversity.
Score: 14.063189144905074
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Africa's rich linguistic diversity remains significantly underrepresented in speech technologies, creating barriers to digital inclusion. To alleviate this challenge, we systematically map the continent's speech space of datasets and technologies, leading to a new comprehensive benchmark SimbaBench for downstream African speech tasks. Using SimbaBench, we introduce the Simba family of models, achieving state-of-the-art performance across multiple African languages and speech tasks. Our benchmark analysis reveals critical patterns in resource availability, while our model evaluation demonstrates how dataset quality, domain diversity, and language family relationships influence performance across languages. Our work highlights the need for expanded speech technology resources that better reflect Africa's linguistic diversity and provides a solid foundation for future research and development efforts toward more inclusive speech technologies.

Related papers

AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR [2.6822781046552824]
AfriSpeech-MultiBench is the first domain-specific evaluation suite for over 100 African English accents across 10+ countries. We benchmark a diverse range of open, closed, unimodal ASR and multimodal LLM-based speech recognition systems. Our empirical analysis reveals systematic variation: open-source ASR models excel in spontaneous speech contexts but degrade on noisy, non-native dialogue. Proprietary models deliver high accuracy on clean speech but vary significantly by country and domain.
arXiv Detail & Related papers (2025-11-18T08:44:17Z)
The NaijaVoices Dataset: Cultivating Large-Scale, High-Quality, Culturally-Rich Speech Data for African Languages [10.225163354933372]
We introduce the NaijaVoices dataset, a 1,800-hour speech-text dataset with 5,000+ speakers. We outline our unique data collection approach, analyze its acoustic diversity, and demonstrate its impact through finetuning experiments. These results highlight NaijaVoices' potential to advance multilingual speech processing for African languages.
arXiv Detail & Related papers (2025-05-26T22:53:48Z)
Automatic Speech Recognition for African Low-Resource Languages: Challenges and Future Directions [4.524096445909663]
Low-resource languages in Africa remain significantly underrepresented in both research and practical applications. This study investigates the major challenges hindering the development of ASR systems for these languages.
arXiv Detail & Related papers (2025-05-16T20:57:39Z)
Where Are We? Evaluating LLM Performance on African Languages [16.206469767073155]
Africa's rich linguistic heritage remains underrepresented in NLP. This paper integrates theoretical insights on Africa's language landscape with an empirical evaluation using Sahara.
arXiv Detail & Related papers (2025-02-26T21:49:54Z)
LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models [62.47865866398233]
This white paper proposes a framework to generate linguistic tools for low-resource languages. By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity.
arXiv Detail & Related papers (2024-11-20T16:59:41Z)
Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects [72.18753241750964]
Yorùbá is an African language with roughly 47 million speakers. Recent efforts to develop NLP technologies for African languages have focused on their standard dialects. We take steps towards bridging this gap by introducing a new high-quality parallel text and speech corpus.
arXiv Detail & Related papers (2024-06-27T22:38:04Z)
AfroBench: How Good are Large Language Models on African Languages? [55.35674466745322]
AfroBench is a benchmark for evaluating the performance of LLMs across 64 African languages. AfroBench consists of nine natural language understanding datasets, six text generation datasets, six knowledge and question answering tasks, and one mathematical reasoning task.
arXiv Detail & Related papers (2023-11-14T08:10:14Z)
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition [55.95128479289923]
African languages are spoken by over a billion people, but are underrepresented in NLP research and development. We create the largest human-annotated NER dataset for 20 African languages. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points.
arXiv Detail & Related papers (2022-10-22T08:53:14Z)
Building African Voices [125.92214914982753]
This paper focuses on speech synthesis for low-resourced African languages. We create a set of general-purpose instructions on building speech synthesis systems with minimum technological resources. We release the speech data, code, and trained voices for 12 African languages to support researchers and developers.
arXiv Detail & Related papers (2022-07-01T23:28:16Z)
MasakhaNER: Named Entity Recognition for African Languages [48.34339599387944]
We create the first large publicly available high-quality dataset for named entity recognition in ten African languages. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER.
arXiv Detail & Related papers (2021-03-22T13:12:44Z)
OkwuGbé: End-to-End Speech Recognition for Fon and Igbo [0.015863809575305417]
We present a state-of-the-art ASR model for Fon, as well as benchmark ASR model results for Igbo. We conduct a comprehensive linguistic analysis of each language and describe the creation of end-to-end, deep neural network-based speech recognition models for both languages.
arXiv Detail & Related papers (2021-03-13T18:02:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.