Semantically Corrected Amharic Automatic Speech Recognition
- URL: http://arxiv.org/abs/2404.13362v1
- Date: Sat, 20 Apr 2024 12:08:00 GMT
- Title: Semantically Corrected Amharic Automatic Speech Recognition
- Authors: Samuael Adnew, Paul Pu Liang
- Abstract summary: We build a set of ASR tools for Amharic, a language spoken by more than 50 million people in eastern Africa.
We release corrected transcriptions of existing Amharic ASR test datasets, enabling the community to accurately evaluate progress.
We introduce a post-processing approach using a transformer encoder-decoder architecture to organize raw ASR outputs into a grammatically complete and semantically meaningful Amharic sentence.
- Score: 27.569469583183423
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automatic Speech Recognition (ASR) can play a crucial role in enhancing the accessibility of spoken languages worldwide. In this paper, we build a set of ASR tools for Amharic, a language spoken by more than 50 million people primarily in eastern Africa. Amharic is written in the Ge'ez script, a sequence of graphemes with spacings denoting word boundaries. This makes computational processing of Amharic challenging since the location of spacings can significantly impact the meaning of formed sentences. We find that existing benchmarks for Amharic ASR do not account for these spacings and only measure individual grapheme error rates, leading to significantly inflated measurements of in-the-wild performance. In this paper, we first release corrected transcriptions of existing Amharic ASR test datasets, enabling the community to accurately evaluate progress. Furthermore, we introduce a post-processing approach using a transformer encoder-decoder architecture to organize raw ASR outputs into a grammatically complete and semantically meaningful Amharic sentence. Through experiments on the corrected test dataset, our model enhances the semantic correctness of Amharic speech recognition systems, achieving a Character Error Rate (CER) of 5.5% and a Word Error Rate (WER) of 23.3%.
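To make the benchmark issue concrete: a single misplaced or dropped space leaves the grapheme sequence almost intact but changes every affected word, so a grapheme-level score stays low while the word-level score collapses. The sketch below is illustrative only (not the authors' evaluation code) and uses a toy Amharic pair; a second hedged sketch of the encoder-decoder post-processing step appears after the related-papers list.

```python
# Illustrative sketch: grapheme-level CER vs word-level WER when a space is dropped.
# Not the authors' evaluation code; the Amharic pair is a toy example.

def levenshtein(a, b):
    """Edit distance between two sequences (characters or words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    return levenshtein(list(ref), list(hyp)) / max(len(ref), 1)

def wer(ref, hyp):
    return levenshtein(ref.split(), hyp.split()) / max(len(ref.split()), 1)

ref = "ሰላም ነው"   # two words
hyp = "ሰላምነው"    # same graphemes, but the word boundary (space) is missing
print(round(cer(ref, hyp), 3))  # ~0.167 -- one character (the space) out of six is wrong
print(round(wer(ref, hyp), 3))  # 1.0   -- neither reference word is matched
```

A benchmark that reports only the first number would overstate in-the-wild usability, which is what the corrected test transcriptions are meant to expose.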
Related papers
- Accented Speech Recognition With Accent-specific Codebooks [53.288874858671576]
Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems.
Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR.
We propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks.
arXiv Detail & Related papers (2023-10-24T16:10:58Z)
- Adversarial Training For Low-Resource Disfluency Correction [50.51901599433536]
We propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC).
We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages.
Our technique also performs well in removing stuttering disfluencies in ASR transcripts introduced by speech impairments.
arXiv Detail & Related papers (2023-06-10T08:58:53Z)
- Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes [3.198144010381572]
This work demonstrates a method of probing an ASR system to discover how it handles phonetic variation across a number of L2 Englishes.
It is demonstrated that the behaviour of the ASR is systematic and consistent across speakers with similar spoken varieties.
arXiv Detail & Related papers (2023-05-12T11:29:13Z)
- BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric [66.73705349465207]
End-to-end speech-to-speech translation (S2ST) is generally evaluated with text-based metrics.
We propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems.
arXiv Detail & Related papers (2022-12-16T14:00:26Z)
- Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition [56.048783994698425]
The Ethiopic/Amharic script is one of the oldest African writing systems and serves at least 23 languages in East Africa.
The Amharic writing system, an abugida, has 282 syllables, 15 punctuation marks, and 20 numerals.
We presented the first comprehensive public datasets named HUST-ART, HUST-AST, ABE, and Tana for Amharic script detection and recognition in the natural scene.
arXiv Detail & Related papers (2022-03-23T03:19:35Z)
- Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers [56.56220390953412]
We extend our prior work by introducing the Conformer architecture to further improve the accuracy.
We demonstrate that the extended Transformer provides state-of-the-art end-to-end ASR performance.
arXiv Detail & Related papers (2021-04-19T16:18:00Z)
- Hallucination of speech recognition errors with sequence to sequence learning [16.39332236910586]
When plain text data is to be used to train systems for spoken language understanding or ASR, a proven strategy is to hallucinate what the ASR outputs would be given a gold transcription.
We present novel end-to-end models to directly predict hallucinated ASR word sequence outputs, conditioning on an input word sequence as well as a corresponding phoneme sequence.
This improves prior published results for recall of errors from an in-domain ASR system's transcription of unseen data, as well as an out-of-domain ASR system's transcriptions of audio from an unrelated task.
arXiv Detail & Related papers (2021-03-23T02:09:39Z)
- Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model [18.26945997660616]
Many downstream tasks and human readers rely on the output of the ASR system.
We propose an ASR post-processing model that aims to transform the incorrect and noisy ASR output into a readable text.
arXiv Detail & Related papers (2021-02-22T15:45:50Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
- ASR Error Correction and Domain Adaptation Using Machine Translation [32.27379508770736]
We propose a technique to perform domain adaptation for ASR error correction via machine translation.
We observe an absolute improvement in word error rate and a 4-point absolute improvement in BLEU score on Google ASR output.
arXiv Detail & Related papers (2020-03-13T20:05:38Z)
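The post-processing thread running through this paper and several of the related works above ("Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model", "Improving Readability for Automatic Speech Recognition Transcription") treats correction as sequence-to-sequence generation with a pretrained encoder-decoder. The sketch referenced earlier follows; it assumes a generic Hugging Face checkpoint, and the model name is a placeholder rather than a model released by any of these papers.

```python
# Hedged sketch of an encoder-decoder ASR post-processing step (not released code
# from any of the papers listed here). The checkpoint name is a placeholder.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "example-org/amharic-asr-postprocessor"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

def post_process(raw_asr_text: str) -> str:
    """Map a raw, possibly unsegmented ASR hypothesis to a corrected sentence."""
    inputs = tokenizer(raw_asr_text, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=128, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example: feed an ASR hypothesis whose word boundaries may be missing or wrong.
print(post_process("ሰላምነው"))
```

Beam search and the 128-token generation cap are generic defaults, not settings taken from any of the papers; the works above vary mainly in which pretrained backbone is fine-tuned and in how paired noisy and corrected text is obtained.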