Automatic Speech Recognition in Sanskrit: A New Speech Corpus and
Modelling Insights
- URL: http://arxiv.org/abs/2106.05852v1
- Date: Wed, 2 Jun 2021 18:06:32 GMT
- Title: Automatic Speech Recognition in Sanskrit: A New Speech Corpus and
Modelling Insights
- Authors: Devaraja Adiga, Rishabh Kumar, Amrith Krishna, Preethi Jyothi, Ganesh
Ramakrishnan, Pawan Goyal
- Abstract summary: We release a 78 hour ASR dataset for Sanskrit, which faithfully captures several of the linguistic characteristics expressed by the language.
We propose a new modelling unit, inspired by the syllable level unit selection, that captures character sequences from one vowel in the word to the next vowel.
We extend these insights from Sanskrit ASR for building ASR systems in two other Indic languages, Gujarati and Telugu.
- Score: 25.666767669695044
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automatic speech recognition (ASR) in Sanskrit is interesting, owing to the
various linguistic peculiarities present in the language. The Sanskrit language
is lexically productive, undergoes euphonic assimilation of phones at the word
boundaries and exhibits variations in spelling conventions and in
pronunciations. In this work, we propose the first large scale study of
automatic speech recognition (ASR) in Sanskrit, with an emphasis on the impact
of unit selection in Sanskrit ASR. In this work, we release a 78 hour ASR
dataset for Sanskrit, which faithfully captures several of the linguistic
characteristics expressed by the language. We investigate the role of different
acoustic model and language model units in ASR systems for Sanskrit. We also
propose a new modelling unit, inspired by the syllable level unit selection,
that captures character sequences from one vowel in the word to the next vowel.
We also highlight the importance of choosing graphemic representations for
Sanskrit and show the impact of this choice on word error rates (WER). Finally,
we extend these insights from Sanskrit ASR for building ASR systems in two
other Indic languages, Gujarati and Telugu. For both these languages, our
experimental results show that the use of phonetic based graphemic
representations in ASR results in performance improvements as compared to ASR
systems that use native scripts.
Related papers
- Literary and Colloquial Dialect Identification for Tamil using Acoustic Features [0.0]
Speech technology plays a role in preserving various dialects of a language from going extinct.
The current work proposes a way to identify two popular and broadly classified Tamil dialects.
arXiv Detail & Related papers (2024-08-27T09:00:27Z) - LAHAJA: A Robust Multi-accent Benchmark for Evaluating Hindi ASR Systems [16.143694951047024]
We create a benchmark, LAHAJA, which contains read and extempore speech on a diverse set of topics and use cases.
We evaluate existing open-source and commercial models on LAHAJA and find their performance to be poor.
We train models using different datasets and find that our model trained on multilingual data with good speaker diversity outperforms existing models by a significant margin.
arXiv Detail & Related papers (2024-08-21T08:51:00Z) - Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition [26.693942793501204]
We propose a novel language-universal approach to end-to-end automatic spoken keyword recognition (SKR)
Wav2Vec2.0 is used to generate robust speech representations, followed by a linear output layer to produce attribute sequences.
A non-trainable pronunciation model then maps sequences of attributes into spoken keywords in a multilingual setting.
arXiv Detail & Related papers (2024-06-04T16:59:11Z) - Phonemic Representation and Transcription for Speech to Text
Applications for Under-resourced Indigenous African Languages: The Case of
Kiswahili [0.0]
It has emerged that several African indigenous languages, including Kiswahili, are technologically under-resourced.
This paper explores the transcription process and the development of a Kiswahili speech corpus.
It provides an updated Kiswahili phoneme dictionary for the ASR model that was created using the CMU Sphinx speech recognition toolbox.
arXiv Detail & Related papers (2022-10-29T09:04:09Z) - ASR data augmentation in low-resource settings using cross-lingual
multi-speaker TTS and cross-lingual voice conversion [49.617722668505834]
We show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model training.
It is possible to obtain promising ASR training results with our data augmentation method using only a single real speaker in a target language.
arXiv Detail & Related papers (2022-03-29T11:55:30Z) - Discovering Phonetic Inventories with Crosslingual Automatic Speech
Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z) - Towards One Model to Rule All: Multilingual Strategy for Dialectal
Code-Switching Arabic ASR [11.363966269198064]
We design a large multilingual end-to-end ASR using self-attention based conformer architecture.
We trained the system using Arabic (Ar), English (En) and French (Fr) languages.
Our findings demonstrate the strength of such a model by outperforming state-of-the-art monolingual dialectal Arabic and code-switching Arabic ASR.
arXiv Detail & Related papers (2021-05-31T08:20:38Z) - Multilingual and code-switching ASR challenges for low resource Indian
languages [59.2906853285309]
We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both the tasks with a WER of 30.73% and 32.45% on the test sets of multilingual and code-switching subtasks, respectively.
arXiv Detail & Related papers (2021-04-01T03:37:01Z) - Acoustics Based Intent Recognition Using Discovered Phonetic Units for
Low Resource Languages [51.0542215642794]
We propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification.
We present results for two languages families - Indic languages and Romance languages, for two different intent recognition tasks.
arXiv Detail & Related papers (2020-11-07T00:35:31Z) - How Phonotactics Affect Multilingual and Zero-shot ASR Performance [74.70048598292583]
A Transformer encoder-decoder model has been shown to leverage multilingual data well in IPA transcriptions of languages presented during training.
We replace the encoder-decoder with a hybrid ASR system consisting of a separate AM and LM.
We show that the gain from modeling crosslingual phonotactics is limited, and imposing a too strong model can hurt the zero-shot transfer.
arXiv Detail & Related papers (2020-10-22T23:07:24Z) - That Sounds Familiar: an Analysis of Phonetic Representations Transfer
Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.