Is Word Error Rate a good evaluation metric for Speech Recognition in
Indic Languages?
- URL: http://arxiv.org/abs/2203.16601v1
- Date: Wed, 30 Mar 2022 18:32:08 GMT
- Authors: Priyanshi Shah, Harveen Singh Chadha, Anirudh Gupta, Ankur Dhuriya,
Neeraj Chhimwal, Rishabh Gaur, Vivek Raghavan
- Abstract summary: We propose a new method for the calculation of error rates in Automatic Speech Recognition (ASR).
This new metric is for languages that contain half characters and where the same character can be written in different forms.
We implement our methodology in Hindi, one of the main Indic languages.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new method for calculating error rates in Automatic
Speech Recognition (ASR). The metric targets languages that contain half
characters and in which the same character can be written in different forms. We
implement our methodology for Hindi, one of the main Indic languages, and we
believe the approach scales to other similar languages with large character
sets. We call our metrics Alternate Word Error Rate (AWER) and Alternate
Character Error Rate (ACER).
We train our ASR models for Indic languages using wav2vec 2.0 (Baevski et al.,
2020). Additionally, we use language models to improve model performance. Our
results show a significant improvement in analyzing error rates at the word and
character level: the interpretability of the ASR system improves by up to 3% in
AWER and 7% in ACER for Hindi. Our experiments suggest that in languages with
complex pronunciation there are multiple ways of writing a word without
changing its meaning. In such cases, AWER and ACER are more useful metrics than
WER and CER. Furthermore, we open-source a new 21-hour benchmarking dataset for
Hindi, together with the new metric scripts.
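To make the idea behind the abstract concrete, here is a minimal sketch of a word error rate that does not penalize an accepted alternate spelling. It is not the authors' released metric script: the function names (`edit_distance`, `wer`, `alternate_wer`) and the `ALTERNATES` table are hypothetical, and the matching rule (a hypothesis word counts as correct if it is listed as a variant of the reference word) is an assumption made for illustration.

```python
from typing import Callable, Dict, Sequence, Set


def edit_distance(ref: Sequence[str], hyp: Sequence[str],
                  same: Callable[[str, str], bool]) -> int:
    """Levenshtein distance over tokens; `same` decides whether two tokens match."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if same(r, h) else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp, lambda r, h: r == h) / len(ref)


def alternate_wer(reference: str, hypothesis: str,
                  alternates: Dict[str, Set[str]]) -> float:
    """WER variant (assumed rule): listed variants of a reference word are not errors."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    def same(r: str, h: str) -> bool:
        return r == h or h in alternates.get(r, set())

    return edit_distance(ref, hyp, same) / len(ref)


# Hypothetical variant table: two common spellings of the word "Hindi",
# one with an anusvara and one with a half "na".
ALTERNATES = {"हिंदी": {"हिन्दी"}}
REFERENCE = "मुझे हिंदी पसंद है"
HYPOTHESIS = "मुझे हिन्दी पसंद है"

print(wer(REFERENCE, HYPOTHESIS))                        # 0.25
print(alternate_wer(REFERENCE, HYPOTHESIS, ALTERNATES))  # 0.0
```

With plain WER the single spelling difference costs one substitution out of four words, while the alternate-aware variant treats it as a match. A character-level analogue could apply the same equivalence idea to characters instead of words.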
Related papers
- Semantically Corrected Amharic Automatic Speech Recognition [27.569469583183423]
We build a set of ASR tools for Amharic, a language spoken by more than 50 million people in eastern Africa.
We release corrected transcriptions of existing Amharic ASR test datasets, enabling the community to accurately evaluate progress.
We introduce a post-processing approach using a transformer encoder-decoder architecture to organize raw ASR outputs into a grammatically complete and semantically meaningful Amharic sentence.
arXiv Detail & Related papers (2024-04-20T12:08:00Z)
- Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper [96.43501666278316]
This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple languages.
We employ a Whisper model which can conduct both language identification and audio-based speech recognition.
By comparing the performances of VSR models trained on automatic labels and the human-annotated labels, we show that we can achieve similar VSR performance to that of human-annotated labels.
arXiv Detail & Related papers (2023-09-15T16:53:01Z)
- Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency [8.888638284299736]
We create a lattice of plausible respellings of the reference transcription using a combination of lexical resources, a Japanese text-processing system, and a neural machine translation model.
Our method, which does not penalize the system for choosing a valid alternate spelling of a word, affords a 2.4%-3.1% absolute reduction in CER depending on the task.
arXiv Detail & Related papers (2023-06-07T15:39:02Z)
- On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation [104.85258654917297]
We find that failing to encode a discriminative target-language signal leads to off-target translation and a closer lexical distance.
We propose Language Aware Vocabulary Sharing (LAVS) to construct the multilingual vocabulary.
We conduct experiments on a multilingual machine translation benchmark in 11 languages.
arXiv Detail & Related papers (2023-05-18T12:43:31Z)
- Improving Speech Recognition for Indic Languages using Language Model [0.0]
We study the effect of applying a language model (LM) on the output of Automatic Speech Recognition (ASR) systems for Indic languages.
We fine-tune wav2vec 2.0 models for 18 Indic languages and adjust the results with language models trained on text derived from a variety of sources.
arXiv Detail & Related papers (2022-03-30T18:22:12Z)
- Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages [50.82410844837726]
We demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian languages.
We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages.
We observe an improvement of up to 18 percentage points in F-score for cognate detection.
arXiv Detail & Related papers (2021-12-16T11:17:58Z)
- Multilingual and code-switching ASR challenges for low resource Indian languages [59.2906853285309]
We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both tasks, with WERs of 30.73% and 32.45% on the test sets of the multilingual and code-switching subtasks, respectively.
arXiv Detail & Related papers (2021-04-01T03:37:01Z)
- Language Detection Engine for Multilingual Texting on Mobile Devices [0.415623340386296]
More than 2 billion mobile users worldwide type in multiple languages on the soft keyboard.
On a monolingual keyboard, 38% of falsely auto-corrected words are valid in another language.
We present a fast, light-weight and accurate Language Detection Engine (LDE) for multilingual typing.
arXiv Detail & Related papers (2021-01-07T16:49:47Z)
- edATLAS: An Efficient Disambiguation Algorithm for Texting in Languages with Abugida Scripts [0.0]
Abugida refers to a phonogram writing system where each syllable is represented using a single consonant or typographic ligature.
We propose a disambiguation algorithm and showcase its usefulness in two novel input methods for languages using the abugida writing system.
We show an improvement in typing speed by 19.49%, 25.13%, and 14.89%, in Hindi, Bengali, and Thai, respectively, using Ambiguous Input.
arXiv Detail & Related papers (2021-01-05T03:16:34Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target language data.
Our usage scenario is interactive correction starting from nearly zero training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)