Linguistically Informed Evaluation of Multilingual ASR for African Languages
- URL: http://arxiv.org/abs/2602.04716v1
- Date: Wed, 04 Feb 2026 16:28:04 GMT
- Title: Linguistically Informed Evaluation of Multilingual ASR for African Languages
- Authors: Fei-Yueh Chen, Lateef Adeleke, C. M. Downey
- Abstract summary: We evaluate three speech encoders on two African languages by complementing WER with Character Error Rate (CER) and Feature Error Rate (FER), and add a tone-aware extension (TER). We show that FER and TER reveal linguistically salient error patterns even when word-level accuracy remains low.
- Score: 0.7155139483398897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Word Error Rate (WER) mischaracterizes ASR models' performance for African languages by combining phonological, tone, and other linguistic errors into a single lexical error. By contrast, Feature Error Rate (FER) has recently attracted attention as a viable metric that reveals linguistically meaningful errors in models' performance. In this paper, we evaluate three speech encoders on two African languages by complementing WER with CER and FER, and add a tone-aware extension (TER). We show that by computing errors on phonological features, FER and TER reveal linguistically salient error patterns even when word-level accuracy remains low. Our results reveal that models perform better on segmental features, while tones (especially mid tone and downstep) remain the most challenging features. Results on Yoruba show a striking differential in metrics, with WER=0.788, CER=0.305, and FER=0.151. Similarly, for Uneme (an endangered language absent from pretraining data), a model with near-total WER and 0.461 CER achieves the relatively low FER of 0.267. This indicates that model error is often attributable to individual phonetic feature errors, which all-or-nothing metrics like WER obscure.
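To make the WER/CER/FER contrast concrete, here is a minimal Python sketch under stated assumptions. It uses one plausible formulation of FER: a Levenshtein alignment in which insertions and deletions cost 1 and a substitution costs the fraction of phonological features on which two segments differ, normalized by reference length. The feature table (six binary features, with tone encoded as two features following Yoruba orthography, where mid tone is unmarked), the one-character-per-segment assumption, and the helper names `edit_distance`, `feature_cost`, `wer`, `cer`, and `fer` are hypothetical illustrations, not the authors' implementation; TER is not modeled separately.

```python
from typing import Callable, Dict, Sequence, Tuple, TypeVar

T = TypeVar("T")


def edit_distance(ref: Sequence[T], hyp: Sequence[T],
                  sub_cost: Callable[[T, T], float]) -> float:
    """Levenshtein distance: insertions/deletions cost 1, substitutions cost sub_cost."""
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        curr = [float(i)] + [0.0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            curr[j] = min(
                prev[j] + 1.0,                                   # delete ref[i-1]
                curr[j - 1] + 1.0,                               # insert hyp[j-1]
                prev[j - 1] + sub_cost(ref[i - 1], hyp[j - 1]),  # substitute or match
            )
        prev = curr
    return prev[-1]


def all_or_nothing(a: object, b: object) -> float:
    """WER/CER substitution cost: any mismatch counts as one full error."""
    return 0.0 if a == b else 1.0


def wer(ref: str, hyp: str) -> float:
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split(), all_or_nothing) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    # Computed over the raw string, so spaces count as characters here.
    return edit_distance(ref, hyp, all_or_nothing) / len(ref)


# HYPOTHETICAL toy feature table: (voiced, labial, coronal, syllabic, high, low).
# Tone uses two binary features: H = (1, 0), M = (0, 0), L = (0, 1).
FEATURES: Dict[str, Tuple[int, ...]] = {
    "b": (1, 1, 0, 0, 0, 0),
    "p": (0, 1, 0, 0, 0, 0),
    "d": (1, 0, 1, 0, 0, 0),
    "t": (0, 0, 1, 0, 0, 0),
    "a": (1, 0, 0, 1, 0, 0),       # a, mid tone (unmarked)
    "\u00e1": (1, 0, 0, 1, 1, 0),  # á, high tone
    "\u00e0": (1, 0, 0, 1, 0, 1),  # à, low tone
}


def feature_cost(a: str, b: str) -> float:
    """Substitution cost = fraction of features on which two segments differ."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def fer(ref: str, hyp: str) -> float:
    # Assumes one character per segment; whitespace is ignored.
    ref_segs = [c for c in ref if not c.isspace()]
    hyp_segs = [c for c in hyp if not c.isspace()]
    return edit_distance(ref_segs, hyp_segs, feature_cost) / len(ref_segs)


if __name__ == "__main__":
    ref, hyp = "b\u00e1 da", "p\u00e0 d\u00e0"  # "bá da" vs "pà dà"
    print(f"WER={wer(ref, hyp):.3f} CER={cer(ref, hyp):.3f} FER={fer(ref, hyp):.3f}")
```

On the toy pair "bá da" vs "pà dà", every word is wrong (WER = 1.000) and three of five characters differ (CER = 0.600, counting the space), yet the feature-level mismatches sum to only 4/6 of one segment across four segments (FER ≈ 0.167). This is the same WER > CER > FER ordering the abstract reports, at toy scale.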
Related papers
- A Taxonomy of Errors in English as she is spoke: Toward an AI-Based Method of Error Analysis for EFL Writing Instruction [0.0]
This study describes the development of an AI-assisted error analysis system designed to identify, categorize, and correct writing errors in English.
The system employs a detailed taxonomy grounded in linguistic theories from Corder (1967), Richards (1971), and James (1998).
The AI successfully identified diverse error types but showed limitations in contextual understanding and occasionally generated new error categories when encountering uncoded errors.
Date: 2025-11-29T08:45:00Z
- Automatic Correction of Writing Anomalies in Hausa Texts [0.0]
Hausa texts are often characterized by writing anomalies such as incorrect character substitutions and spacing errors.
This paper presents an approach to automatically correct the anomalies by finetuning transformer-based models.
Date: 2025-06-04T10:46:19Z
- Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework [79.40678802098026]
Math Word Problems serve as a crucial benchmark for evaluating Large Language Models' reasoning abilities.
Current error classification methods rely on static and predefined categories.
We propose Error-Aware Prompting (EAP), which incorporates common error patterns as explicit guidance.
Date: 2025-01-26T16:17:57Z
- Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking [68.77659513993507]
We present a simple and effective N-best re-ranking approach to improve multilingual ASR accuracy.
Our results show spoken language identification accuracy improvements of 8.7% and 6.1%, respectively, and word error rates which are 3.3% and 2.0% lower on these benchmarks.
Date: 2024-09-27T03:31:32Z
- Assessing the Efficacy of Grammar Error Correction: A Human Evaluation Approach in the Japanese Context [10.047123247001714]
We evaluate the performance of the state-of-the-art sequence tagging grammar error detection and correction model (SeqTagger).
With an automatic annotation toolkit, ERRANT, we first evaluated SeqTagger's performance on error correction with human expert correction as the benchmark.
Results indicated a precision of 63.66% and a recall of 20.19% for error correction in the full dataset.
Date: 2024-02-28T06:43:43Z
- Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets [92.38654521870444]
We introduce ACES, a contrastive challenge set spanning 146 language pairs.
This dataset aims to discover whether metrics can identify 68 translation accuracy errors.
We conduct a large-scale study by benchmarking ACES on 50 metrics submitted to the WMT 2022 and 2023 metrics shared tasks.
Date: 2024-01-29T17:17:42Z
- Understanding and Mitigating Classification Errors Through Interpretable Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice.
Date: 2023-11-18T00:24:26Z
- Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation [79.96416609433724]
Zero-shot translation (ZST) aims to translate between language pairs unseen in the training data.
The common practice for guiding the zero-shot language mapping during inference is to deliberately insert the source and target language IDs.
Recent studies have shown that language IDs sometimes fail to steer the model toward the target language, causing the off-target problem.
Date: 2023-09-28T17:02:36Z
- Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism [27.09416337926635]
Current mainstream models often struggle with effectively utilizing word-level features and phonetic information.
This paper introduces a novel approach that incorporates a dynamic error scaling mechanism to detect and correct phonetically erroneous text.
Date: 2023-08-07T09:19:59Z
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
Date: 2020-05-12T11:01:44Z
This list is automatically generated from the titles and abstracts of the papers on this site.