Character-Level Bangla Text-to-IPA Transcription Using Transformer
Architecture with Sequence Alignment
- URL: http://arxiv.org/abs/2311.03792v1
- Date: Tue, 7 Nov 2023 08:20:06 GMT
- Title: Character-Level Bangla Text-to-IPA Transcription Using Transformer
Architecture with Sequence Alignment
- Authors: Jakir Hasan, Shrestha Datta, Ameya Debnath
- Abstract summary: International Phonetic Alphabet (IPA) is indispensable in language learning and understanding.
Bangla, being the 7th most widely used language, gives rise to the need for IPA in its domain.
In this study, we have utilized a transformer-based sequence-to-sequence model at the letter and symbol level to get the IPA of each Bangla word.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The International Phonetic Alphabet (IPA) is indispensable in language
learning and understanding, aiding users in accurate pronunciation and
comprehension. Additionally, it plays a pivotal role in speech therapy,
linguistic research, accurate transliteration, and the development of
text-to-speech systems, making it an essential tool across diverse fields.
Bangla, being the 7th most widely spoken language, gives rise to the need for
IPA in its domain. Its IPA mapping is too diverse to be captured manually,
which calls for Artificial Intelligence and Machine Learning in this field. In
this study, we have utilized a transformer-based sequence-to-sequence model at
the letter and symbol level to obtain the IPA of each Bangla word, since a
word's IPA varies only negligibly across the different words it co-occurs
with. Our transformer model consists of only 8.5 million parameters, with a
single encoder layer and a single decoder layer. Additionally, to handle
punctuation marks and occurrences of foreign languages in the text, we have
used manual mapping, since the model cannot be expected to learn to separate
them from Bangla words; this also reduces the required computational
resources. Finally, by maintaining the relative positions of the sentence
components' IPAs and generating the combined IPA from them, we achieved the
top position, with a word error rate of 0.10582, in the public ranking of the
DataVerse Challenge - ITVerse 2023
(https://www.kaggle.com/competitions/dataverse_2023/).
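The abstract gives enough architectural detail to sketch the model. Below is a minimal, hypothetical PyTorch illustration of a character-level sequence-to-sequence Transformer with a single encoder layer and a single decoder layer, as described above; the vocabulary sizes, model width, and other hyperparameters are assumptions chosen only to land near the reported 8.5-million-parameter budget, not the authors' actual settings.

```python
# Minimal sketch (not the authors' code) of a character-level seq2seq
# Transformer for Bangla grapheme -> IPA transcription. A single encoder
# layer and a single decoder layer follow the abstract; every hyperparameter
# value here is an illustrative assumption.
import torch
import torch.nn as nn


class CharSeq2SeqTransformer(nn.Module):
    def __init__(self, src_vocab=80, tgt_vocab=90, d_model=512, nhead=8,
                 dim_feedforward=2048, max_len=64):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=1, num_decoder_layers=1,   # single layer each
            dim_feedforward=dim_feedforward, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def _embed(self, tokens, table):
        positions = torch.arange(tokens.size(1), device=tokens.device)
        return table(tokens) + self.pos_embed(positions)

    def forward(self, src, tgt):
        # Causal mask so each target character only attends to its prefix.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(src.device)
        hidden = self.transformer(self._embed(src, self.src_embed),
                                  self._embed(tgt, self.tgt_embed),
                                  tgt_mask=tgt_mask)
        return self.out(hidden)      # per-position logits over IPA symbols


model = CharSeq2SeqTransformer()
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```

With these assumed settings the model comes out at roughly 7-8 million parameters, in the same range as the 8.5 million reported; the exact figure depends on the true vocabulary sizes and layer widths, which the abstract does not specify.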
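The handling of punctuation and non-Bangla tokens, and the recombination of per-word IPA while preserving each component's position, can likewise be sketched at the pipeline level. The lookup table, the Unicode-range test, and the helper names below are hypothetical placeholders: the abstract only states that a manual mapping is used, not what it contains.

```python
# Hedged sketch of the sentence-level pipeline: punctuation and foreign
# tokens bypass the model via a hand-written mapping, Bangla words go through
# the seq2seq model, and outputs are re-joined in their original order.
# PUNCT_MAP and is_bangla() are illustrative assumptions, not the authors' rules.
import re

PUNCT_MAP = {"।": ".", ",": ",", "?": "?", "!": "!"}    # assumed examples
BANGLA_TOKEN = re.compile(r"^[\u0980-\u09FF]+$")         # Bengali Unicode block


def is_bangla(token: str) -> bool:
    return bool(BANGLA_TOKEN.match(token))


def transcribe_sentence(tokens, transcribe_word):
    """Combine per-token IPA while keeping each component's relative position."""
    ipa_parts = []
    for tok in tokens:
        if tok in PUNCT_MAP:              # manual mapping for punctuation
            ipa_parts.append(PUNCT_MAP[tok])
        elif not is_bangla(tok):          # foreign text: pass through unchanged
            ipa_parts.append(tok)
        else:                             # Bangla word: model prediction
            ipa_parts.append(transcribe_word(tok))
    return " ".join(ipa_parts)
```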
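The competition metric is word error rate. For reference, a generic implementation (word-level Levenshtein distance divided by reference length, not necessarily the challenge's exact scorer) looks like this:

```python
# Word error rate as word-level edit distance over the number of reference
# words; a generic reference implementation, not the official challenge scorer.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```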
Related papers
- Bukva: Russian Sign Language Alphabet [75.42794328290088]
This paper investigates the recognition of the Russian fingerspelling alphabet, also known as the Russian Sign Language (RSL) dactyl.
Dactyl is a component of sign languages where distinct hand movements represent individual letters of a written language.
We provide Bukva, the first full-fledged open-source video dataset for RSL dactyl recognition.
arXiv Detail & Related papers (2024-10-11T09:59:48Z)
- IPA Transcription of Bengali Texts [0.2113150621171959]
The International Phonetic Alphabet (IPA) serves to systematize phonemes in language.
In Bengali phonology and phonetics, ongoing scholarly deliberations persist concerning the IPA standard and core Bengali phonemes.
This work examines prior research, identifies current and potential issues, and suggests a framework for a Bengali IPA standard.
arXiv Detail & Related papers (2024-03-29T09:33:34Z)
- The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language [7.0944623704102625]
We show that phoneme-based models for speech processing can achieve strong crosslinguistic generalizability to unseen languages.
We propose CLAP-IPA, a multi-lingual phoneme-speech contrastive embedding model capable of open-vocabulary matching between arbitrary speech signals and phonemic sequences.
arXiv Detail & Related papers (2023-11-14T17:09:07Z)
- Universal Automatic Phonetic Transcription into the International Phonetic Alphabet [21.000425416084706]
We present a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA).
Our model is based on wav2vec 2.0 and is fine-tuned to predict IPA from audio input.
We show that the quality of our universal speech-to-IPA models is close to that of human annotators.
arXiv Detail & Related papers (2023-08-07T21:29:51Z)
- Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech [88.22544315633687]
Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable Text-to-speech systems.
We propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary.
Experimental results in three languages show that our model outperforms several strong baseline models in terms of pronunciation accuracy.
arXiv Detail & Related papers (2022-06-05T10:50:34Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Revisiting IPA-based Cross-lingual Text-to-speech [11.010299086810994]
The International Phonetic Alphabet (IPA) has been widely used in cross-lingual text-to-speech (TTS) to achieve cross-lingual voice cloning (CL VC).
In this paper, we report some empirical findings of building a cross-lingual TTS model using IPA as inputs.
Experiments show that the way to process the IPA and suprasegmental sequence has a negligible impact on the CL VC performance.
arXiv Detail & Related papers (2021-10-14T07:22:23Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
- Phonological Features for 0-shot Multilingual Speech Synthesis [50.591267188664666]
We show that code-switching is possible for languages unseen during training, even within monolingual models.
We generate intelligible, code-switched speech in a new language at test time, including the approximation of sounds never seen in training.
arXiv Detail & Related papers (2020-08-06T18:25:18Z)
- GIPFA: Generating IPA Pronunciation from Audio [0.0]
In this study, we examine the use of an Artificial Neural Network (ANN) model to automatically extract the IPA phonemic pronunciation of a word.
Based on the French Wikimedia dictionary, we trained our model which then correctly predicted 75% of the IPA pronunciations tested.
arXiv Detail & Related papers (2020-06-13T06:14:11Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)