A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese
- URL: http://arxiv.org/abs/2207.12089v1
- Date: Fri, 1 Jul 2022 09:16:29 GMT
- Title: A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese
- Authors: Song Zhang, Ken Zheng, Xiaoxu Zhu, Baoxiang Li
- Abstract summary: Grapheme-to-phoneme (G2P) conversion is an indispensable part of the Chinese Mandarin text-to-speech (TTS) system.
In this paper, we propose a Chinese polyphone BERT model to predict the pronunciations of Chinese polyphonic characters.
- Score: 2.380039717474099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Grapheme-to-phoneme (G2P) conversion is an indispensable part of the Chinese Mandarin text-to-speech (TTS) system, and the core of G2P conversion is polyphone disambiguation, i.e., picking the correct pronunciation from several candidates for a Chinese polyphonic character. In this paper, we propose a Chinese polyphone BERT model to predict the pronunciations of Chinese polyphonic characters. Firstly, we create 741 new Chinese monophonic characters from 354 source Chinese polyphonic characters by pronunciation. Then we obtain a Chinese polyphone BERT by extending a pre-trained Chinese BERT with the 741 new monophonic characters and adding a corresponding embedding layer for the new tokens, initialized with the embeddings of the source polyphonic characters. In this way, we turn the polyphone disambiguation task into a pre-training task of the Chinese polyphone BERT. Experimental results demonstrate the effectiveness of the proposed model: the polyphone BERT obtains a 2% improvement in average accuracy (from 92.1% to 94.1%) over the BERT-based classifier model, the prior state of the art in polyphone disambiguation.
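Below is a minimal sketch of the vocabulary-extension and embedding-initialization steps described in the abstract, assuming a HuggingFace Transformers checkpoint such as bert-base-chinese; the pronunciation-specific token names (e.g. 乐_le4) and the tiny mapping are illustrative assumptions, not the authors' actual vocabulary entries or code.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Any pre-trained Chinese BERT can stand in for the base model here.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")

# Each polyphonic character becomes one new "monophonic" token per pronunciation
# (741 new tokens from 354 source characters in the paper); the mapping below is
# a tiny illustrative subset with made-up token names.
new_tokens = {"乐_le4": "乐", "乐_yue4": "乐", "行_xing2": "行", "行_hang2": "行"}

tokenizer.add_tokens(list(new_tokens.keys()))
model.resize_token_embeddings(len(tokenizer))

# Initialize each new token's embedding from its source polyphonic character,
# so the extended model starts close to the original BERT.
embeddings = model.get_input_embeddings()
with torch.no_grad():
    for new_tok, src_char in new_tokens.items():
        new_id = tokenizer.convert_tokens_to_ids(new_tok)
        src_id = tokenizer.convert_tokens_to_ids(src_char)
        embeddings.weight[new_id] = embeddings.weight[src_id].clone()

# Disambiguation then reduces to masked-token prediction: mask the polyphonic
# character in a sentence and score only its pronunciation-specific tokens.
```

Continued masked-language-model training over text in which polyphonic characters are replaced by these pronunciation-specific tokens is, roughly, how the disambiguation task becomes a pre-training task as the abstract describes.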
Related papers
- Large Language Model Should Understand Pinyin for Chinese ASR Error Correction [31.13523648668466]
We propose Pinyin-enhanced GEC to improve Chinese ASR error correction.
Our approach only utilizes synthetic errors for training and employs the one-best hypothesis during inference.
Experiments on the Aishell-1 and the Common Voice datasets demonstrate that our approach consistently outperforms GEC with text-only input.
arXiv Detail & Related papers (2024-09-20T06:50:56Z) - READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input
Noises [87.70001456418504]
We construct READIN: a Chinese multi-task benchmark with REalistic And Diverse Input Noises.
READIN contains four diverse tasks and requests annotators to re-enter the original test data with two commonly used Chinese input methods: Pinyin input and speech input.
We experiment with a series of strong pretrained language models as well as robust training methods, and we find that these models often suffer significant performance drops on READIN.
arXiv Detail & Related papers (2023-02-14T20:14:39Z) - Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone
Disambiguation [35.35236347070773]
We build a Grapheme-to-Phoneme (G2P) model to predict the pronunciation of polyphonic characters, and a Phoneme-to-Grapheme (P2G) model to convert pronunciations back into text.
We design a data-balance strategy to improve accuracy on typical polyphonic characters that have imbalanced distributions or scarce data in the training set.
arXiv Detail & Related papers (2022-11-17T12:37:41Z) - Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme
Representations for Text to Speech [104.65639892109381]
We propose Mixed-Phoneme BERT, a novel variant of the BERT model that uses mixed phoneme and sup-phoneme representations to enhance its learning capability.
Experiment results demonstrate that our proposed Mixed-Phoneme BERT significantly improves the TTS performance with 0.30 CMOS gain compared with the FastSpeech 2 baseline.
arXiv Detail & Related papers (2022-03-31T17:12:26Z) - ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin
Information [32.70080326854314]
We propose ChineseBERT, which incorporates the glyph and pinyin information of Chinese characters into language model pretraining.
The proposed ChineseBERT model yields significant performance boosts over baseline models with fewer training steps.
arXiv Detail & Related papers (2021-06-30T13:06:00Z) - SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language
Model Pretraining [48.880840711568425]
We study the influences of three main factors on the Chinese tokenization for pretrained language models.
We propose linguistically informed tokenizers: 1) SHUOWEN (meaning Talk Word), pronunciation-based tokenizers; and 2) JIEZI (meaning Solve Character), glyph-based tokenizers.
We find that SHUOWEN and JIEZI tokenizers can generally outperform conventional single-character tokenizers.
arXiv Detail & Related papers (2021-06-01T11:20:02Z) - Phoneme Recognition through Fine Tuning of Phonetic Representations: a
Case Study on Luhya Language Varieties [77.2347265289855]
We focus on phoneme recognition using Allosaurus, a method for multilingual recognition based on phonetic annotation.
To evaluate in a challenging real-world scenario, we curate phone recognition datasets for Bukusu and Saamia, two varieties of the Luhya language cluster of western Kenya and eastern Uganda.
We find that fine-tuning of Allosaurus, even with just 100 utterances, leads to significant improvements in phone error rates.
arXiv Detail & Related papers (2021-04-04T15:07:55Z) - Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning [9.13211149475579]
The majority of Chinese characters are monophonic, while a special group of characters, called polyphonic characters, have multiple pronunciations.
As a prerequisite for performing speech-related generative tasks, the correct pronunciation must be identified among several candidates.
We propose a novel semi-supervised learning framework for Mandarin Chinese polyphone disambiguation.
arXiv Detail & Related papers (2021-02-01T03:47:59Z) - AlloVera: A Multilingual Allophone Database [137.3686036294502]
AlloVera provides mappings from 218 allophones to phonemes for 14 languages.
We show that a "universal" allophone model, Allosaurus, built with AlloVera, outperforms "universal" phonemic models and language-specific models on a speech-transcription task.
arXiv Detail & Related papers (2020-04-17T02:02:18Z) - g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin
Chinese Based on a New Open Benchmark Dataset [14.323478990713477]
We introduce a new benchmark dataset that consists of 99,000+ sentences for Chinese polyphone disambiguation.
We train a simple neural network model on it, and find that it outperforms other pre-existing G2P systems; a brief usage sketch of the released g2pM package follows this list.
arXiv Detail & Related papers (2020-04-07T05:44:58Z) - Towards Zero-shot Learning for Automatic Phonemic Transcription [82.9910512414173]
A more challenging problem is to build phonemic transcribers for languages with zero training data.
Our model is able to recognize unseen phonemes in the target language without any training data.
It achieves 7.7% better phoneme error rate on average over a standard multilingual model.
arXiv Detail & Related papers (2020-02-26T20:38:42Z)
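As noted in the g2pM entry above, the following is a hedged usage sketch of the open-source g2pM package; the call signature, default output format, and example sentence are assumptions about the project's public interface and may differ from the installed version.

```python
# pip install g2pM   (package released alongside the g2pM paper listed above)
from g2pM import G2pM

model = G2pM()  # loads the bundled neural G2P model (assumed default constructor)

# 长 is polyphonic (chang2 "long" vs. zhang3 "to grow"); a G2P system has to
# pick the correct reading from context.
sentence = "这条路很长"
pinyin = model(sentence)  # assumed to return a list of pinyin syllables, e.g. ending in "chang2"
print(pinyin)
```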