Character-Level Bangla Text-to-IPA Transcription Using Transformer
Architecture with Sequence Alignment
- URL: http://arxiv.org/abs/2311.03792v1
- Date: Tue, 7 Nov 2023 08:20:06 GMT
- Title: Character-Level Bangla Text-to-IPA Transcription Using Transformer
Architecture with Sequence Alignment
- Authors: Jakir Hasan, Shrestha Datta, Ameya Debnath
- Abstract summary: International Phonetic Alphabet (IPA) is indispensable in language learning and understanding.
Bangla, being the 7th most widely used language, gives rise to the need for IPA in its domain.
In this study, we have utilized a transformer-based sequence-to-sequence model at the letter and symbol level to get the IPA of each Bangla word.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The International Phonetic Alphabet (IPA) is indispensable in language
learning and understanding, aiding users in accurate pronunciation and
comprehension. Additionally, it plays a pivotal role in speech therapy,
linguistic research, accurate transliteration, and the development of
text-to-speech systems, making it an essential tool across diverse fields.
Bangla, being the 7th most widely spoken language, gives rise to the need for
IPA in its domain. Its IPA mapping is too diverse to be captured manually,
which calls for Artificial Intelligence and Machine Learning in this field. In
this study, we have utilized a transformer-based sequence-to-sequence model at
the letter and symbol level to obtain the IPA of each Bangla word, since a
word's IPA varies only negligibly across the different words it co-occurs
with. Our transformer model consists of only 8.5 million parameters, with a
single encoder layer and a single decoder layer. Additionally, to handle
punctuation marks and occurrences of foreign languages in the text, we have
used manual mapping, since the model cannot be expected to learn to separate
them from Bangla words; this also reduces the required computational
resources. Finally, by maintaining the relative positions of the sentence
components' IPAs and generating the combined IPA from them, we achieved the
top position, with a word error rate of 0.10582, in the public ranking of the
DataVerse Challenge - ITVerse 2023
(https://www.kaggle.com/competitions/dataverse_2023/).
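The abstract gives enough architectural detail to sketch the model. Below is a minimal, hypothetical PyTorch illustration of a character-level sequence-to-sequence Transformer with a single encoder layer and a single decoder layer, as described above; the vocabulary sizes, model width, and other hyperparameters are assumptions chosen only to land near the reported 8.5-million-parameter budget, not the authors' actual settings.

```python
# Minimal sketch (not the authors' code) of a character-level seq2seq
# Transformer for Bangla grapheme -> IPA transcription. A single encoder
# layer and a single decoder layer follow the abstract; every hyperparameter
# value here is an illustrative assumption.
import torch
import torch.nn as nn


class CharSeq2SeqTransformer(nn.Module):
    def __init__(self, src_vocab=80, tgt_vocab=90, d_model=512, nhead=8,
                 dim_feedforward=2048, max_len=64):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=1, num_decoder_layers=1,   # single layer each
            dim_feedforward=dim_feedforward, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def _embed(self, tokens, table):
        positions = torch.arange(tokens.size(1), device=tokens.device)
        return table(tokens) + self.pos_embed(positions)

    def forward(self, src, tgt):
        # Causal mask so each target character only attends to its prefix.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(src.device)
        hidden = self.transformer(self._embed(src, self.src_embed),
                                  self._embed(tgt, self.tgt_embed),
                                  tgt_mask=tgt_mask)
        return self.out(hidden)      # per-position logits over IPA symbols


model = CharSeq2SeqTransformer()
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```

With these assumed settings the model comes out at roughly 7-8 million parameters, in the same range as the 8.5 million reported; the exact figure depends on the true vocabulary sizes and layer widths, which the abstract does not specify.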
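The handling of punctuation and non-Bangla tokens, and the recombination of per-word IPA while preserving each component's position, can likewise be sketched at the pipeline level. The lookup table, the Unicode-range test, and the helper names below are hypothetical placeholders: the abstract only states that a manual mapping is used, not what it contains.

```python
# Hedged sketch of the sentence-level pipeline: punctuation and foreign
# tokens bypass the model via a hand-written mapping, Bangla words go through
# the seq2seq model, and outputs are re-joined in their original order.
# PUNCT_MAP and is_bangla() are illustrative assumptions, not the authors' rules.
import re

PUNCT_MAP = {"।": ".", ",": ",", "?": "?", "!": "!"}    # assumed examples
BANGLA_TOKEN = re.compile(r"^[\u0980-\u09FF]+$")         # Bengali Unicode block


def is_bangla(token: str) -> bool:
    return bool(BANGLA_TOKEN.match(token))


def transcribe_sentence(tokens, transcribe_word):
    """Combine per-token IPA while keeping each component's relative position."""
    ipa_parts = []
    for tok in tokens:
        if tok in PUNCT_MAP:              # manual mapping for punctuation
            ipa_parts.append(PUNCT_MAP[tok])
        elif not is_bangla(tok):          # foreign text: pass through unchanged
            ipa_parts.append(tok)
        else:                             # Bangla word: model prediction
            ipa_parts.append(transcribe_word(tok))
    return " ".join(ipa_parts)
```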
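The competition metric is word error rate. For reference, a generic implementation (word-level Levenshtein distance divided by reference length, not necessarily the challenge's exact scorer) looks like this:

```python
# Word error rate as word-level edit distance over the number of reference
# words; a generic reference implementation, not the official challenge scorer.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```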
Related papers
- Bukva: Russian Sign Language Alphabet [75.42794328290088]
This paper investigates the recognition of the Russian fingerspelling alphabet, also known as the Russian Sign Language (RSL) dactyl.
Dactyl is a component of sign languages where distinct hand movements represent individual letters of a written language.
We provide Bukva, the first full-fledged open-source video dataset for RSL dactyl recognition.
arXiv Detail & Related papers (2024-10-11T09:59:48Z)
- IPA Transcription of Bengali Texts [0.2113150621171959]
The International Phonetic Alphabet (IPA) serves to systematize phonemes in language.
In Bengali phonology and phonetics, ongoing scholarly deliberations persist concerning the IPA standard and core Bengali phonemes.
This work examines prior research, identifies current and potential issues, and suggests a framework for a Bengali IPA standard.
arXiv Detail & Related papers (2024-03-29T09:33:34Z)
- The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language [7.0944623704102625]
We show that phoneme-based models for speech processing can achieve strong crosslinguistic generalizability to unseen languages.
We propose CLAP-IPA, a multi-lingual phoneme-speech contrastive embedding model capable of open-vocabulary matching between arbitrary speech signals and phonemic sequences.
arXiv Detail & Related papers (2023-11-14T17:09:07Z)
- Universal Automatic Phonetic Transcription into the International Phonetic Alphabet [21.000425416084706]
We present a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA).
Our model is based on wav2vec 2.0 and is fine-tuned to predict IPA from audio input.
We show that the quality of our universal speech-to-IPA models is close to that of human annotators.
arXiv Detail & Related papers (2023-08-07T21:29:51Z)
- Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech [88.22544315633687]
Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable Text-to-speech systems.
We propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary.
Experimental results in three languages show that our model outperforms several strong baseline models in terms of pronunciation accuracy.
arXiv Detail & Related papers (2022-06-05T10:50:34Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Revisiting IPA-based Cross-lingual Text-to-speech [11.010299086810994]
The International Phonetic Alphabet (IPA) has been widely used in cross-lingual text-to-speech (TTS) to achieve cross-lingual voice cloning (CL VC).
In this paper, we report some empirical findings of building a cross-lingual TTS model using IPA as inputs.
Experiments show that the way to process the IPA and suprasegmental sequence has a negligible impact on the CL VC performance.
arXiv Detail & Related papers (2021-10-14T07:22:23Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
- Phonological Features for 0-shot Multilingual Speech Synthesis [50.591267188664666]
We show that code-switching is possible for languages unseen during training, even within monolingual models.
We generate intelligible, code-switched speech in a new language at test time, including the approximation of sounds never seen in training.
arXiv Detail & Related papers (2020-08-06T18:25:18Z)
- GIPFA: Generating IPA Pronunciation from Audio [0.0]
In this study, we examine the use of an Artificial Neural Network (ANN) model to automatically extract the IPA phonemic pronunciation of a word.
Based on the French Wikimedia dictionary, we trained our model which then correctly predicted 75% of the IPA pronunciations tested.
arXiv Detail & Related papers (2020-06-13T06:14:11Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)