Where Are We At with Automatic Speech Recognition for the Bambara Language?
- URL: http://arxiv.org/abs/2602.09785v1
- Date: Tue, 10 Feb 2026 13:44:51 GMT
- Title: Where Are We At with Automatic Speech Recognition for the Bambara Language?
- Authors: Seydou Diallo, Yacouba Diarra, Mamadou K. Keita, Panga Azazia Kamaté, Adam Bouno Kampo, Aboubacar Ouattara,
- Abstract summary: This paper introduces the first standardized benchmark for evaluating Automatic Speech Recognition (ASR) in the Bambara language. The benchmark was used to evaluate 37 models, ranging from Bambara-trained systems to large-scale commercial models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces the first standardized benchmark for evaluating Automatic Speech Recognition (ASR) in the Bambara language, utilizing one hour of professionally recorded Malian constitutional text. Designed as a controlled reference set under near-optimal acoustic and linguistic conditions, the benchmark was used to evaluate 37 models, ranging from Bambara-trained systems to large-scale commercial models. Our findings reveal that current ASR performance remains significantly below deployment standards in a narrow formal domain; the top-performing system in terms of Word Error Rate (WER) achieved 46.76\% and the best Character Error Rate (CER) of 13.00\% was set by another model, while several prominent multilingual models exceeded 100\% WER. These results suggest that multilingual pre-training and model scaling alone are insufficient for underrepresented languages. Furthermore, because this dataset represents a best-case scenario of the most simplified and formal form of spoken Bambara, these figures are yet to be tested against practical, real-world settings. We provide the benchmark and an accompanying public leaderboard to facilitate transparent evaluation and future research in Bambara speech technology.
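The abstract's headline numbers are Word Error Rate (WER) and Character Error Rate (CER), both edit-distance-based metrics. As a point of reference only (this is not the benchmark's official scoring code), a minimal sketch in Python shows how they are typically computed, and why WER can exceed 100%: the edit distance counts insertions as well as substitutions and deletions, so a hypothesis much longer than the reference can accumulate more errors than the reference has words.

```python
def levenshtein(ref, hyp):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    n = len(hyp)
    dp = list(range(n + 1))  # distance from empty reference prefix
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                              # deletion
                        dp[j - 1] + 1,                          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))      # substitution
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    """Word Error Rate: edit distance over words, divided by reference length."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character Error Rate: edit distance over characters, divided by reference length."""
    return levenshtein(reference, hypothesis) / len(reference)

# A 2-word reference against a 4-word hypothesis with no overlap needs
# 2 substitutions + 2 insertions = 4 edits, giving a WER of 200%.
print(wer("a b", "x y z w"))  # → 2.0
```

In practice, evaluations like this one usually rely on established implementations (e.g. the `jiwer` Python package) rather than hand-rolled code, and apply text normalization (casing, punctuation, diacritics) before scoring, which can change the reported numbers substantially.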
Related papers
- PRiSM: Benchmarking Phone Realization in Speech Models [70.82595415252682]
Phone recognition (PR) serves as the atomic interface for language-agnostic modeling for cross-lingual speech processing and phonetic analysis. We introduce PRiSM, the first open-source benchmark designed to expose blind spots in phonetic perception.
arXiv Detail & Related papers (2026-01-20T15:00:36Z) - Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training [3.950299047992185]
Urdu, a language spoken by over 230 million people, remains critically underrepresented in modern NLP systems. We introduce Qalb, an Urdu language model developed through a two-stage approach: continued pre-training followed by supervised fine-tuning. Our results demonstrate that continued pre-training on diverse, high-quality language data, combined with targeted instruction fine-tuning, effectively adapts foundation models to low-resource languages.
arXiv Detail & Related papers (2026-01-13T02:05:05Z) - Automatic Speech Recognition for the Ika Language [0.0]
We fine-tune the pretrained wav2vec 2.0 Massively Multilingual Speech models on a high-quality speech dataset compiled from New Testament Bible translations in Ika.
Our results show that fine-tuning multilingual pretrained models achieves a Word Error Rate (WER) of 0.5377 and Character Error Rate (CER) of 0.2651 with just over 1 hour of training data.
arXiv Detail & Related papers (2024-10-01T11:56:42Z) - Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking [68.77659513993507]
We present a simple and effective N-best re-ranking approach to improve multilingual ASR accuracy.
Our results show spoken language identification accuracy improvements of 8.7% and 6.1% on two benchmarks, and word error rates that are 3.3% and 2.0% lower, respectively.
arXiv Detail & Related papers (2024-09-27T03:31:32Z) - CLAIR-A: Leveraging Large Language Models to Judge Audio Captions [73.51087998971418]
Evaluating machine-generated audio captions is a complex task that requires considering diverse factors. We propose CLAIR-A, a simple and flexible method that leverages the zero-shot capabilities of large language models. In our evaluations, CLAIR-A better predicts human judgements of quality compared to traditional metrics.
arXiv Detail & Related papers (2024-09-19T17:59:52Z) - A Novel Self-training Approach for Low-resource Speech Recognition [15.612232220719653]
We propose a self-training approach for automatic speech recognition (ASR) for low-resource settings.
Our approach significantly improves word error rate, achieving a relative improvement of 14.94%.
Our proposed approach reports the best results on the Common Voice Punjabi dataset.
arXiv Detail & Related papers (2023-08-10T01:02:45Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - Robust Speech Recognition via Large-Scale Weak Supervision [69.63329359286419]
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.
When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks.
We are releasing models and inference code to serve as a foundation for further work on robust speech processing.
arXiv Detail & Related papers (2022-12-06T18:46:04Z) - IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages [16.121708272597154]
We release the IndicSUPERB benchmark for speech recognition in 12 Indian languages.
We train and evaluate different self-supervised models alongside a commonly used baseline benchmark.
We show that language-specific fine-tuned models are more accurate than baseline on most of the tasks.
arXiv Detail & Related papers (2022-08-24T20:14:52Z) - Few-shot Learning with Multilingual Language Models [66.49496434282564]
We train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages.
Our largest model sets new state of the art in few-shot learning in more than 20 representative languages.
We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning.
arXiv Detail & Related papers (2021-12-20T16:52:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.