Accent Estimation of Japanese Words from Their Surfaces and
Romanizations for Building Large Vocabulary Accent Dictionaries
- URL: http://arxiv.org/abs/2009.09679v1
- Date: Mon, 21 Sep 2020 08:38:21 GMT
- Title: Accent Estimation of Japanese Words from Their Surfaces and
Romanizations for Building Large Vocabulary Accent Dictionaries
- Authors: Hideyuki Tachibana, Yotaro Katayama
- Abstract summary: The authors developed an accent estimation technique that predicts the accent of a word from its limited information.
It is experimentally shown that the technique can estimate accents with high accuracies, especially for some categories of words.
The authors applied this technique to an existing large vocabulary Japanese dictionary NEologd, and obtained a large vocabulary Japanese accent dictionary.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Japanese text-to-speech (TTS), it is necessary to add accent information
to the input sentence. However, there are a limited number of publicly
available accent dictionaries, and those dictionaries, e.g. UniDic, do not
contain many compound words, proper nouns, etc., which are required in a
practical TTS system. In order to build a large scale accent dictionary that
contains those words, the authors developed an accent estimation technique that
predicts the accent of a word from its limited information, namely the surface
(e.g. kanji) and the yomi (simplified phonetic information). It is
experimentally shown that the technique can estimate accents with high
accuracies, especially for some categories of words. The authors applied this
technique to an existing large vocabulary Japanese dictionary NEologd, and
obtained a large vocabulary Japanese accent dictionary. Many cases have been
observed in which the use of this dictionary yields more appropriate phonetic
information than UniDic.
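As a rough illustration of what "accent information" for a dictionary entry can look like, the sketch below expands an accent value into a per-mora high/low pattern. It is not the authors' estimation technique. It assumes the accent is stored as an accent-nucleus position (0 meaning no pitch drop), that the yomi is written in katakana, and that the common textbook description of Tokyo-style pitch accent applies. The names `split_morae` and `pitch_pattern` are hypothetical helpers introduced here.

```python
# Illustrative sketch only: names and conventions are assumptions,
# not taken from the paper or from any dictionary's API.

# Small kana that attach to the preceding kana to form a single mora.
SMALL_KANA = set("ャュョァィゥェォヮ")


def split_morae(yomi: str) -> list[str]:
    """Split a katakana reading into morae (ッ, ン and ー count as one mora each)."""
    morae: list[str] = []
    for ch in yomi:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch  # e.g. キ + ョ -> キョ
        else:
            morae.append(ch)
    return morae


def pitch_pattern(yomi: str, accent: int) -> str:
    """Return an H/L string with one symbol per mora.

    accent == 0: low first mora, then high with no drop.
    accent == 1: high first mora, low afterwards.
    accent >= 2: low first mora, high up to the accent mora, low afterwards.
    """
    n = len(split_morae(yomi))
    if not 0 <= accent <= n:
        raise ValueError("accent position must be between 0 and the mora count")
    pattern = []
    for i in range(1, n + 1):
        if accent == 0:
            high = i > 1
        elif accent == 1:
            high = i == 1
        else:
            high = 1 < i <= accent
        pattern.append("H" if high else "L")
    return "".join(pattern)


if __name__ == "__main__":
    # The accent values below are example inputs supplied by hand.
    for yomi, accent in [("トウキョウ", 0), ("キョウト", 1), ("ココロ", 2)]:
        print(yomi, accent, pitch_pattern(yomi, accent))
```

Under these assumptions, an estimator like the one described in the abstract only has to predict a single integer per word, and a TTS front end can derive the mora-level pattern from it.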
Related papers
- The Development of a Comprehensive Spanish Dictionary for Phonetic and Lexical Tagging in Socio-phonetic Research (ESPADA) [0.0]
I present the creation of a comprehensive pronunciation dictionary in Spanish (ESPADA) that can be used in most of the dialect variants of Spanish data.
ESPADA is the most complete dictionary with more than 628,000 entries, representing words from 16 countries.
This aims to equip socio-phonetic researchers with a complete open-source tool that enhances dialectal research within socio-phonetic frameworks in the Spanish language.
arXiv Detail & Related papers (2024-07-22T04:51:33Z)
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features can not be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
- Improving Large-scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer [23.70253642540094]
Deep biasing for the Transducer can improve the recognition performance of rare words or contextual entities.
In this paper, we combine the phoneme and textual information of rare words in Transducers to distinguish words with similar pronunciation or spelling.
Experiments on the LibriSpeech corpus demonstrate that the proposed method achieves state-of-the-art performance on rare word error rate for different scales and levels of bias lists.
arXiv Detail & Related papers (2023-11-15T13:53:28Z)
- Controllable Emphasis with zero data for text-to-speech [57.12383531339368]
A simple but effective method to achieve emphasized speech consists in increasing the predicted duration of the emphasised word.
We show that this is significantly better than spectrogram modification techniques, improving naturalness by 7.3% and correct testers' identification of the emphasised word in a sentence by 40% on a reference female en-US voice.
arXiv Detail & Related papers (2023-07-13T21:06:23Z)
- DICTDIS: Dictionary Constrained Disambiguation for Improved NMT [50.888881348723295]
We present DictDis, a lexically constrained NMT system that disambiguates between multiple candidate translations derived from dictionaries.
We demonstrate the utility of DictDis via extensive experiments on English-Hindi and English-German sentences in a variety of domains, including regulatory, finance, and engineering.
arXiv Detail & Related papers (2022-10-13T13:04:16Z)
- Revisiting Syllables in Language Modelling and their Application on Low-Resource Machine Translation [1.2617078020344619]
Syllables provide shorter sequences than characters, require less-specialised extracting rules than morphemes, and their segmentation is not impacted by the corpus size.
We first explore the potential of syllables for open-vocabulary language modelling in 21 languages.
We use rule-based syllabification methods for six languages and address the rest with hyphenation, which works as a syllabification proxy.
arXiv Detail & Related papers (2022-10-05T18:55:52Z)
- Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech [88.22544315633687]
Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable Text-to-speech systems.
We propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary.
Experimental results in three languages show that our model outperforms several strong baseline models in terms of pronunciation accuracy.
arXiv Detail & Related papers (2022-06-05T10:50:34Z)
- Automatic Dialect Density Estimation for African American English [74.44807604000967]
We explore automatic prediction of dialect density of the African American English (AAE) dialect.
Dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect.
We show a significant correlation between our predicted and ground truth dialect density measures for AAE speech in this database.
arXiv Detail & Related papers (2022-04-03T01:34:48Z)
- English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System [3.4888132404740797]
We evaluate a state-of-the-art automatic speech recognition model, using unseen data from a corpus with a wide variety of labeled English accents.
We show that there is indeed an accuracy bias in terms of accentual variety, favoring the accents most prevalent in the training corpus.
arXiv Detail & Related papers (2021-05-09T08:24:33Z)
- A Corpus for Large-Scale Phonetic Typology [112.19288631037055]
We present VoxClamantis v1.0, the first large-scale corpus for phonetic typology.
The corpus consists of aligned segments and estimated phoneme-level labels in 690 readings spanning 635 languages, along with acoustic-phonetic measures of vowels and sibilants.
arXiv Detail & Related papers (2020-05-28T13:03:51Z)