Diacritic Recognition Performance in Arabic ASR
- URL: http://arxiv.org/abs/2302.14022v1
- Date: Mon, 27 Feb 2023 18:27:42 GMT
- Title: Diacritic Recognition Performance in Arabic ASR
- Authors: Hanan Aldarmaki and Ahmad Ghannam
- Abstract summary: We present an analysis of diacritic recognition performance in Arabic Automatic Speech Recognition systems.
Current state-of-the-art ASR models do not produce full diacritization in their output.
- Score: 2.28438857884398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an analysis of diacritic recognition performance in Arabic
Automatic Speech Recognition (ASR) systems. As most existing Arabic speech
corpora do not contain all diacritical marks, which represent short vowels and
other phonetic information in Arabic script, current state-of-the-art ASR
models do not produce full diacritization in their output. Automatic text-based
diacritization has previously been employed either as a pre-processing step to
train diacritized ASR, or as a post-processing step to diacritize the resulting
ASR hypotheses. It is generally believed that input diacritization degrades ASR
performance, but no systematic evaluation of ASR diacritization performance,
independent of ASR performance, has been conducted to date. In this paper, we
attempt to experimentally clarify whether input diacritization indeed degrades
ASR quality, and to compare the diacritic recognition performance against
text-based diacritization as a post-processing step. We start with pre-trained
Arabic ASR models and fine-tune them on transcribed speech data with different
diacritization conditions: manual, automatic, and no diacritization. We isolate
diacritic recognition performance from the overall ASR performance using
coverage and precision metrics. We find that ASR diacritization significantly
outperforms text-based diacritization in post-processing, particularly when the
ASR model is fine-tuned with manually diacritized transcripts.
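The abstract names coverage and precision as the metrics used to isolate diacritic recognition from overall ASR performance, but it does not define them. The following is a minimal sketch under assumed definitions, not the authors' implementation: coverage is taken to be the share of reference-diacritized letters that also carry a diacritic in the hypothesis, and precision the share of those hypothesis diacritics that match the reference. The function names, the choice of the Unicode range U+064B to U+0652 as the diacritic set, and the requirement that both strings share the same base letters are all assumptions made for illustration.

```python
# Assumed diacritic set: Arabic tanween, short vowels, shadda, sukun (U+064B..U+0652).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_diacritics(text):
    """Return a list of (base_character, attached_diacritics) pairs."""
    pairs = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base character
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def coverage_and_precision(reference, hypothesis):
    """Hypothetical metric definitions (assumptions, not taken from the paper).

    coverage  = hypothesis-diacritized positions / reference-diacritized positions
    precision = matching diacritics / hypothesis-diacritized positions
    Only positions that are diacritized in the reference are scored.
    """
    ref = split_diacritics(reference)
    hyp = split_diacritics(hypothesis)
    if [b for b, _ in ref] != [b for b, _ in hyp]:
        raise ValueError("base letters differ; align the strings first")

    # Sort marks so that 'shadda + vowel' compares equal in either order.
    scored = [("".join(sorted(r)), "".join(sorted(h)))
              for (_, r), (_, h) in zip(ref, hyp) if r]
    produced = [(r, h) for r, h in scored if h]

    coverage = len(produced) / len(scored) if scored else 0.0
    precision = (sum(r == h for r, h in produced) / len(produced)
                 if produced else 0.0)
    return coverage, precision


# Reference: kaf+fatha, teh+fatha, beh+fatha.
reference = "\u0643\u064E\u062A\u064E\u0628\u064E"
# Hypothesis: kaf+fatha, teh+damma (wrong vowel), beh (no diacritic).
hypothesis = "\u0643\u064E\u062A\u064F\u0628"
print(coverage_and_precision(reference, hypothesis))
```

Separating the two numbers in this way distinguishes a system that omits diacritics (low coverage) from one that emits wrong diacritics (low precision). A real evaluation would first need to align hypothesis and reference words, since ASR output rarely matches the reference letter for letter.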
Related papers
- What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations [0.0]
We investigate the text normalization routine employed by leading ASR models, including OpenAI Whisper, Meta's MMS, Seamless, and Assembly AI's Conformer.
Our research reveals that current text normalization practices, while aiming to standardize ASR outputs for fair comparison, are fundamentally flawed when applied to Indic scripts.
We propose a shift towards developing text normalization routines that leverage native linguistic expertise.
arXiv Detail & Related papers (2024-09-04T05:08:23Z)
- Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs).
To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods.
Our results indicate that this can improve the performance of less competitive ASR systems.
arXiv Detail & Related papers (2024-07-31T08:00:41Z)
- Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance [7.882996636086014]
It is important that automatic speech recognition (ASR) models and their use are fair and equitable.
The current study seeks to understand the factors underlying this disparity by examining the performance of the current state-of-the-art neural network based ASR system.
arXiv Detail & Related papers (2024-07-19T02:14:17Z)
- Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques [17.166092544686553]
This study benchmarks Speech Emotion Recognition using ASR transcripts with varying Word Error Rates (WERs) from eleven models on three well-known corpora.
We propose a unified ASR error-robust framework integrating ASR error correction and modality-gated fusion, achieving lower WER and higher SER results compared to the best-performing ASR transcript.
arXiv Detail & Related papers (2024-06-12T15:59:25Z)
- LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models [58.790604613878216]
We introduce a LibriSpeech-PC benchmark designed to assess the punctuation and capitalization prediction capabilities of end-to-end ASR models.
The benchmark includes a LibriSpeech-PC dataset with restored punctuation and capitalization, a novel evaluation metric called Punctuation Error Rate (PER) that focuses on punctuation marks, and initial baseline models.
arXiv Detail & Related papers (2023-10-04T16:23:37Z)
- Boosting Punctuation Restoration with Data Generation and Reinforcement Learning [70.26450819702728]
Punctuation restoration is an important task in automatic speech recognition (ASR).
The discrepancy between written punctuated texts and ASR texts limits the usability of written texts in training punctuation restoration systems for ASR texts.
This paper proposes a reinforcement learning method to exploit in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap.
arXiv Detail & Related papers (2023-07-24T17:22:04Z)
- RED-ACE: Robust Error Detection for ASR using Confidence Embeddings [5.4693121539705984]
We propose to utilize the ASR system's word-level confidence scores for improving AED performance.
We add an ASR Confidence Embedding layer to the AED model's encoder, allowing us to jointly encode the confidence scores and the transcribed text into a contextualized representation.
arXiv Detail & Related papers (2022-03-14T15:13:52Z)
- Contextualized Attention-based Knowledge Transfer for Spoken Conversational Question Answering [63.72278693825945]
Spoken conversational question answering (SCQA) requires machines to model complex dialogue flow.
We propose CADNet, a novel contextualized attention-based distillation approach.
We conduct extensive experiments on the Spoken-CoQA dataset and demonstrate that our approach achieves remarkable performance.
arXiv Detail & Related papers (2020-10-21T15:17:18Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and the subsequent LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.