Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised
Learning
- URL: http://arxiv.org/abs/2102.00621v1
- Date: Mon, 1 Feb 2021 03:47:59 GMT
- Title: Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised
Learning
- Authors: Yi Shi and Congyi Wang and Yu Chen and Bin Wang
- Abstract summary: We propose a novel semi-supervised learning framework for Mandarin Chinese polyphone disambiguation.
We explore the effect of various proxy labeling strategies including entropy-thresholding and lexicon-based labeling.
- Score: 9.595035978417322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The majority of Chinese characters are monophonic, i.e., their
pronunciations are unique and can thus be looked up easily in a table. Their
counterparts, polyphonic characters have more than one pronunciation. To
perform linguistic computation tasks related to spoken Mandarin Chinese, the
correct pronunciation for each polyphone must be identified among several
candidates according to its context. This process is called Polyphone
Disambiguation, a key procedure in the Grapheme-to-phoneme (G2P) conversion
step of a Chinese text-to-speech (TTS) system. The problem has been well
explored with both knowledge-based and learning-based approaches, yet it
remains challenging due to the lack of publicly available datasets and the
complex linguistic phenomena surrounding polyphones. In this paper, we propose a novel
semi-supervised learning (SSL) framework for Mandarin Chinese polyphone
disambiguation that can potentially leverage unlimited unlabeled text data. We
explore the effect of various proxy labeling strategies including
entropy-thresholding and lexicon-based labeling. As for the architecture, a
pre-trained model of Electra is combined with Convolution BLSTM layers to
fine-tune on our task. Qualitative and quantitative experiments demonstrate
that our method achieves state-of-the-art performance in Mandarin Chinese
polyphone disambiguation. In addition, we publish a novel dataset specifically
for the polyphone disambiguation task to promote further research.
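The entropy-thresholding proxy-labeling strategy mentioned in the abstract can be sketched as follows. The threshold value, the candidate list, and the `proxy_label` helper are illustrative assumptions, not the paper's implementation: the idea is simply to keep a model prediction as a pseudo-label only when the predictive distribution is confident (low entropy).

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def proxy_label(probs, candidates, threshold=0.5):
    """Return the argmax pronunciation as a pseudo-label, or None
    when the distribution is too uncertain (entropy >= threshold)."""
    if entropy(probs) >= threshold:
        return None  # discard this unlabeled example for now
    best = max(range(len(probs)), key=lambda i: probs[i])
    return candidates[best]

# Example: the polyphone 行 can be read "xing2" or "hang2".
candidates = ["xing2", "hang2"]
print(proxy_label([0.95, 0.05], candidates))  # confident -> "xing2"
print(proxy_label([0.55, 0.45], candidates))  # uncertain -> None
```

Accepted pseudo-labels can then be mixed with the gold-labeled data when fine-tuning the classifier, which is the usual self-training loop in semi-supervised setups.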
Related papers
- External Knowledge Augmented Polyphone Disambiguation Using Large
Language Model [3.372242769313867]
We introduce a novel method to solve the problem as a generation task.
Retrieval module incorporates external knowledge which is a multi-level semantic dictionary of Chinese polyphonic characters.
Generation module adopts the decoder-only Transformer architecture to induce the target text.
Postprocess module corrects the generated text into a valid result if needed.
arXiv Detail & Related papers (2023-12-19T08:00:10Z)
- Multilingual context-based pronunciation learning for Text-to-Speech [13.941800219395757]
Phonetic information and linguistic knowledge are an essential component of a Text-to-speech (TTS) front-end.
We showcase a multilingual unified front-end system that addresses any pronunciation related task, typically handled by separate modules.
We find that the multilingual model is competitive across languages and tasks; however, some trade-offs exist when compared to equivalent monolingual solutions.
arXiv Detail & Related papers (2023-07-31T14:29:06Z)
- Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training [50.100992353488174]
We introduce CDBERT, a new learning paradigm that enhances the semantics understanding ability of the Chinese PLMs with dictionary knowledge and structure of Chinese characters.
We name the two core modules of CDBERT as Shuowen and Jiezi, where Shuowen refers to the process of retrieving the most appropriate meaning from Chinese dictionaries.
Our paradigm demonstrates consistent improvements on previous Chinese PLMs across all tasks.
arXiv Detail & Related papers (2023-05-30T05:48:36Z)
- Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation [35.35236347070773]
We build a Grapheme-to-Phoneme (G2P) model to predict the pronunciations of polyphonic characters, and a Phoneme-to-Grapheme (P2G) model to convert pronunciations back into text.
We design a data balance strategy to improve the accuracy of some typical polyphonic characters in the training set with imbalanced distribution or data scarcity.
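The round-trip idea behind this back-translation-style augmentation can be sketched as follows. The dictionary-based `toy_p2g`/`toy_g2p` functions are stand-in assumptions, not the paper's neural models; the point is the consistency filter, which keeps a synthetic (text, phonemes) pair only when G2P maps the generated text back to the source phonemes.

```python
P2G = {"zhong1": "中", "guo2": "国", "hang2": "行"}  # phoneme -> character
G2P = {"中": "zhong1", "国": "guo2", "行": "xing2"}  # character -> default reading

def toy_p2g(phones):
    """Toy Phoneme-to-Grapheme model: map each phoneme to a character."""
    return "".join(P2G[p] for p in phones)

def toy_g2p(text):
    """Toy Grapheme-to-Phoneme model: map each character to a reading."""
    return [G2P[c] for c in text]

def round_trip_filter(phoneme_seqs, p2g, g2p):
    """Keep synthetic (text, phonemes) pairs only when the G2P round
    trip reproduces the source phonemes (a consistency check)."""
    kept = []
    for phones in phoneme_seqs:
        text = p2g(phones)
        if g2p(text) == phones:
            kept.append((text, phones))
    return kept

pairs = round_trip_filter([["zhong1", "guo2"], ["hang2"]], toy_p2g, toy_g2p)
print(pairs)  # "中国" survives; "行" read back as "xing2" fails the round trip
```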
arXiv Detail & Related papers (2022-11-17T12:37:41Z)
- A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese [2.380039717474099]
Grapheme-to-phoneme (G2P) conversion is an indispensable part of a Mandarin Chinese text-to-speech (TTS) system.
In this paper, we propose a Chinese polyphone BERT model to predict the pronunciations of Chinese polyphonic characters.
arXiv Detail & Related papers (2022-07-01T09:16:29Z)
- Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech [88.22544315633687]
Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable Text-to-speech systems.
We propose Dict-TTS, a semantic-aware generative text-to-speech model with an online web dictionary.
Experimental results in three languages show that our model outperforms several strong baseline models in terms of pronunciation accuracy.
arXiv Detail & Related papers (2022-06-05T10:50:34Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties [77.2347265289855]
We focus on phoneme recognition using Allosaurus, a method for multilingual recognition based on phonetic annotation.
To evaluate in a challenging real-world scenario, we curate phone recognition datasets for Bukusu and Saamia, two varieties of the Luhya language cluster of western Kenya and eastern Uganda.
We find that fine-tuning of Allosaurus, even with just 100 utterances, leads to significant improvements in phone error rates.
arXiv Detail & Related papers (2021-04-04T15:07:55Z)
- Phonological Features for 0-shot Multilingual Speech Synthesis [50.591267188664666]
We show that code-switching is possible for languages unseen during training, even within monolingual models.
We generate intelligible, code-switched speech in a new language at test time, including the approximation of sounds never seen in training.
arXiv Detail & Related papers (2020-08-06T18:25:18Z)
- AlloVera: A Multilingual Allophone Database [137.3686036294502]
AlloVera provides mappings from 218 allophones to phonemes for 14 languages.
We show that a "universal" allophone model, Allosaurus, built with AlloVera, outperforms "universal" phonemic models and language-specific models on a speech-transcription task.
arXiv Detail & Related papers (2020-04-17T02:02:18Z)
- g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset [14.323478990713477]
We introduce a new benchmark dataset that consists of 99,000+ sentences for Chinese polyphone disambiguation.
We train a simple neural network model on it, and find that it outperforms other preexisting G2P systems.
arXiv Detail & Related papers (2020-04-07T05:44:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.