Automatic Speech Recognition in Sanskrit: A New Speech Corpus and Modelling Insights
- URL: http://arxiv.org/abs/2106.05852v1
- Date: Wed, 2 Jun 2021 18:06:32 GMT
- Title: Automatic Speech Recognition in Sanskrit: A New Speech Corpus and Modelling Insights
- Authors: Devaraja Adiga, Rishabh Kumar, Amrith Krishna, Preethi Jyothi, Ganesh Ramakrishnan, Pawan Goyal
- Abstract summary: We release a 78-hour ASR dataset for Sanskrit, which faithfully captures several of the linguistic characteristics expressed by the language.
We propose a new modelling unit, inspired by syllable-level unit selection, that captures character sequences from one vowel in the word to the next vowel.
We extend these insights from Sanskrit ASR to building ASR systems in two other Indic languages, Gujarati and Telugu.
- Score: 25.666767669695044
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automatic speech recognition (ASR) in Sanskrit is interesting owing to the
various linguistic peculiarities present in the language. Sanskrit is
lexically productive, undergoes euphonic assimilation of phones at word
boundaries, and exhibits variation in spelling conventions and in
pronunciation. In this work, we present the first large-scale study of
automatic speech recognition in Sanskrit, with an emphasis on the impact
of unit selection in Sanskrit ASR. We release a 78-hour ASR
dataset for Sanskrit, which faithfully captures several of the linguistic
characteristics of the language. We investigate the role of different
acoustic model and language model units in ASR systems for Sanskrit. We also
propose a new modelling unit, inspired by syllable-level unit selection,
that captures character sequences from one vowel in the word to the next vowel.
We also highlight the importance of choosing graphemic representations for
Sanskrit and show the impact of this choice on word error rates (WER). Finally,
we extend these insights from Sanskrit ASR to building ASR systems in two
other Indic languages, Gujarati and Telugu. For both languages, our
experimental results show that using phonetic-based graphemic
representations in ASR improves performance compared to ASR
systems that use native scripts.
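The abstract describes the proposed modelling unit only as capturing "character sequences from one vowel in the word to the next vowel". As a rough illustration, the Python sketch below segments a romanized word along those lines. It assumes SLP1 transliteration (one ASCII character per phone), treats anusvara `M` and visarga `H` as non-vowels, and attaches any word-initial consonants to the first unit. These boundary rules are assumptions made for illustration, not the paper's published definition, and `vowel_split` is a hypothetical helper name.

```python
# Minimal sketch of a vowel-to-vowel segmentation of a word written in SLP1.
# ASSUMPTIONS (not taken from the paper): input is SLP1, where every phone is a
# single ASCII character; a unit starts at a vowel and runs up to (but not
# including) the next vowel; any word-initial consonants join the first unit.

SLP1_VOWELS = frozenset("aAiIuUfFxXeEoO")


def vowel_split(word: str) -> list[str]:
    """Split an SLP1-encoded word into vowel-to-vowel units (hypothetical rule)."""
    units: list[str] = []
    current = ""
    seen_vowel = False
    for ch in word:
        if ch in SLP1_VOWELS and seen_vowel:
            # A later vowel closes the running unit and opens a new one.
            units.append(current)
            current = ch
        else:
            # Consonants (and the first vowel) extend the running unit.
            current += ch
            if ch in SLP1_VOWELS:
                seen_vowel = True
    if current:
        units.append(current)
    return units


if __name__ == "__main__":
    for w in ["rAmaH", "saMskftam", "agniH"]:
        print(w, "->", vowel_split(w))
```

Because SLP1 writes each vowel as a single character, a set-membership test is enough to find unit boundaries; Devanagari input would first need transliteration, which this sketch leaves out.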
Related papers
- Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition [26.693942793501204]
We propose a novel language-universal approach to end-to-end automatic spoken keyword recognition (SKR).
Wav2Vec2.0 is used to generate robust speech representations, followed by a linear output layer to produce attribute sequences.
A non-trainable pronunciation model then maps sequences of attributes into spoken keywords in a multilingual setting.
arXiv (2024-06-04T16:59:11Z)
- MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and Phonetic Domains for Speech Representation Learning [0.76146285961466]
We present a methodology for linguistic feature extraction, focusing on automatically syllabifying words in multiple languages.
In both the textual and phonetic domains, our method focuses on the extraction of phonetic transcriptions from text, stress marks, and a unified automatic syllabification.
The system was built with open-source components and resources.
arXiv (2023-10-17T19:27:23Z)
- A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos [4.809236881780707]
We describe the curation of a massive speech dataset of 8740 hours consisting of about 9.8K technical lectures in English, delivered by instructors representing various parts of India's demography, along with their transcripts.
We use the curated dataset to measure the existing disparity in YouTube Automatic Captions and OpenAI Whisper model performance across the diverse demographic traits of speakers in India.
arXiv (2023-07-20T05:03:00Z)
- Phonemic Representation and Transcription for Speech to Text Applications for Under-resourced Indigenous African Languages: The Case of Kiswahili [0.0]
It has emerged that several African indigenous languages, including Kiswahili, are technologically under-resourced.
This paper explores the transcription process and the development of a Kiswahili speech corpus.
It provides an updated Kiswahili phoneme dictionary for the ASR model that was created using the CMU Sphinx speech recognition toolbox.
arXiv (2022-10-29T09:04:09Z)
- ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion [49.617722668505834]
We show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model training.
It is possible to obtain promising ASR training results with our data augmentation method using only a single real speaker in a target language.
arXiv (2022-03-29T11:55:30Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv (2022-01-26T22:12:55Z)
- Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR [11.363966269198064]
We design a large multilingual end-to-end ASR using self-attention based conformer architecture.
We trained the system using Arabic (Ar), English (En) and French (Fr) languages.
Our findings demonstrate the strength of such a model by outperforming state-of-the-art monolingual dialectal Arabic and code-switching Arabic ASR.
arXiv (2021-05-31T08:20:38Z)
- Multilingual and code-switching ASR challenges for low resource Indian languages [59.2906853285309]
We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both the tasks with a WER of 30.73% and 32.45% on the test sets of multilingual and code-switching subtasks, respectively.
arXiv (2021-04-01T03:37:01Z)
- Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages [51.0542215642794]
We propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification.
We present results for two language families, Indic and Romance languages, on two different intent recognition tasks.
arXiv (2020-11-07T00:35:31Z)
- How Phonotactics Affect Multilingual and Zero-shot ASR Performance [74.70048598292583]
A Transformer encoder-decoder model has been shown to leverage multilingual data well in IPA transcriptions of languages presented during training.
We replace the encoder-decoder with a hybrid ASR system consisting of a separate AM and LM.
We show that the gain from modeling crosslingual phonotactics is limited, and that imposing too strong a model can hurt zero-shot transfer.
arXiv (2020-10-22T23:07:24Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv (2020-05-16T22:28:09Z)
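The abstract reports its findings as word error rates (WER), and two of the related summaries (the NPTEL study and the multilingual Indian-language challenge baseline) also quote WER. Conventionally, WER is the word-level edit distance (substitutions + deletions + insertions) between a reference and a hypothesis, divided by the number of reference words. The Python sketch below computes it with a standard dynamic-programming alignment. It is not the scoring script used by any of these papers, `word_error_rate` is a hypothetical helper name, and the SLP1 example strings are illustrative rather than drawn from the released corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (the previous row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative SLP1 strings (not corpus data).
    print(word_error_rate("rAmaH gacCati", "rAmo gacCati"))  # 0.5
    print(word_error_rate("tat eva", "tadeva"))  # 1.0
```

The second example hints at why word boundaries matter for Sanskrit: when sandhi fuses *tat eva* into *tadeva*, a hypothesis written in the fused form scores a WER of 1.0 against a reference that keeps the two words apart.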