Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone
Disambiguation
- URL: http://arxiv.org/abs/2211.09495v1
- Date: Thu, 17 Nov 2022 12:37:41 GMT
- Title: Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone
Disambiguation
- Authors: Chunyu Qiang, Peng Yang, Hao Che, Jinba Xiao, Xiaorui Wang, Zhongyuan
Wang
- Abstract summary: We build a Grapheme-to-Phoneme (G2P) model to predict the pronunciations of polyphonic characters, and a Phoneme-to-Grapheme (P2G) model to convert pronunciations back into text.
We design a data balance strategy to improve the accuracy of typical polyphonic characters whose distribution in the training set is imbalanced or scarce.
- Score: 35.35236347070773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conversion of Chinese Grapheme-to-Phoneme (G2P) plays an important role in
Mandarin Chinese Text-To-Speech (TTS) systems, where one of the biggest
challenges is the task of polyphone disambiguation. Most of the previous
polyphone disambiguation models are trained on manually annotated datasets, and
publicly available datasets for polyphone disambiguation are scarce. In this
paper, we propose a simple back-translation-style data augmentation method for
Mandarin Chinese polyphone disambiguation that utilizes a large amount of
unlabeled text data. Inspired by the back-translation technique from the field
of machine translation, we build a Grapheme-to-Phoneme (G2P) model to predict
the pronunciations of polyphonic characters, and a Phoneme-to-Grapheme (P2G)
model to convert pronunciations back into text. Meanwhile, a window-based
matching strategy and a multi-model scoring strategy are proposed to judge the
correctness of the pseudo-labels. We also design a data balance strategy to
improve the accuracy of typical polyphonic characters whose distribution in the
training set is imbalanced or scarce. Experimental results show the
effectiveness of the proposed back-translation-style data augmentation method.
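The round-trip pseudo-labeling loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the model interfaces, the window size, and the score threshold are all assumptions, and the toy stand-in models exist only to make the example executable.

```python
# A minimal, self-contained sketch of back-translation-style pseudo-labeling.
# The model calls, window size, and score threshold are illustrative
# assumptions, not the paper's actual implementation.

def pseudo_label(text, char_idx, g2p_models, p2g_model,
                 window=2, score_threshold=0.9):
    """Return a pseudo pronunciation label for the polyphonic character at
    text[char_idx], or None if the round-trip check rejects it."""
    # 1) Multi-model scoring: each G2P model predicts (pronunciation,
    #    confidence) for the polyphonic character; keep the label only if
    #    all models agree and their average confidence is high enough.
    preds = [m(text, char_idx) for m in g2p_models]
    labels = {label for label, _ in preds}
    avg_conf = sum(conf for _, conf in preds) / len(preds)
    if len(labels) != 1 or avg_conf < score_threshold:
        return None
    label = labels.pop()

    # 2) Back-translation: the P2G model converts the pronunciation
    #    sequence back into text.
    reconstructed = p2g_model(text, char_idx, label)

    # 3) Window-based matching: compare the original and reconstructed text
    #    inside a small window around the polyphonic character.
    lo, hi = max(0, char_idx - window), char_idx + window + 1
    if reconstructed[lo:hi] != text[lo:hi]:
        return None
    return label


# Toy stand-ins for trained models, just to make the sketch executable.
def g2p_a(text, i):  # pretends 行 in "银行" is read "hang2"
    return ("hang2", 0.95)

def g2p_b(text, i):
    return ("hang2", 0.92)

def p2g(text, i, label):  # a perfect round trip for this toy example
    return text

print(pseudo_label("我去银行取钱", 3, [g2p_a, g2p_b], p2g))  # hang2
```

If the G2P models disagree on the label, or the back-translated window differs from the original text, the example is discarded rather than added as a pseudo-labeled training pair.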
Related papers
- External Knowledge Augmented Polyphone Disambiguation Using Large Language Model [3.372242769313867]
We introduce a novel method that frames polyphone disambiguation as a generation task.
A retrieval module incorporates external knowledge from a multi-level semantic dictionary of Chinese polyphonic characters.
A generation module adopts a decoder-only Transformer architecture to produce the target text.
A postprocess module corrects the generated text into a valid result if needed.
arXiv Detail & Related papers (2023-12-19T08:00:10Z)
- Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z)
- A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese [2.380039717474099]
Grapheme-to-phoneme (G2P) conversion is an indispensable part of the Chinese Mandarin text-to-speech (TTS) system.
In this paper, we propose a Chinese polyphone BERT model to predict the pronunciations of Chinese polyphonic characters.
arXiv Detail & Related papers (2022-07-01T09:16:29Z)
- Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech [88.22544315633687]
Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable Text-to-speech systems.
We propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary.
Experimental results in three languages show that our model outperforms several strong baseline models in terms of pronunciation accuracy.
arXiv Detail & Related papers (2022-06-05T10:50:34Z)
- Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo-parallel data with a translated source, but is fed natural source sentences at inference.
This source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which uses the pseudo-parallel pair {natural source, translated target} during training to mimic the inference scenario.
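The online self-training step described above can be sketched with stand-in functions. This is a toy illustration under stated assumptions (the function names are hypothetical; the real method trains a neural UNMT model, not a string transform):

```python
# Toy sketch of one online self-training step: at each step the current model
# translates natural source sentences, and the pair {natural source,
# translated target} is used for training, so training matches inference.
# model_translate and train_on_pair are hypothetical stand-ins.

def online_self_training_step(model_translate, train_on_pair, natural_sources):
    for src in natural_sources:
        tgt = model_translate(src)   # model output, not a gold reference
        train_on_pair(src, tgt)      # train on {natural source, translated target}

# Usage with stand-in functions:
collected = []
online_self_training_step(lambda s: s.upper(),
                          lambda s, t: collected.append((s, t)),
                          ["hello", "world"])
print(collected)  # [('hello', 'HELLO'), ('world', 'WORLD')]
```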
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
- Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
- Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning [9.13211149475579]
The majority of Chinese characters are monophonic, while a special group of characters, called polyphonic characters, have multiple pronunciations.
As a prerequisite of performing speech-related generative tasks, the correct pronunciation must be identified among several candidates.
We propose a novel semi-supervised learning framework for Mandarin Chinese polyphone disambiguation.
arXiv Detail & Related papers (2021-02-01T03:47:59Z)
- Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition [66.47000813920617]
We propose a decoupled transformer model to use monolingual paired data and unpaired text data.
The model is decoupled into two parts: audio-to-phoneme (A2P) network and phoneme-to-text (P2T) network.
By using monolingual data and unpaired text data, the decoupled transformer model reduces the E2E model's heavy dependence on code-switching paired training data.
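The two-stage decomposition can be illustrated with toy lookup tables standing in for the trained networks. All names and mappings here are hypothetical; the point is only where each kind of training data enters the pipeline:

```python
# Illustrative sketch (not the paper's code) of the decoupled idea: an
# audio-to-phoneme (A2P) network and a phoneme-to-text (P2T) network
# composed at inference. Toy tables stand in for trained networks.

A2P = {"audio_clip_1": ("n", "i", "h", "ao")}  # trainable on monolingual paired data
P2T = {("n", "i", "h", "ao"): "你好"}           # trainable on unpaired text data

def recognize(audio):
    phonemes = A2P[audio]   # stage 1: audio -> phonemes
    return P2T[phonemes]    # stage 2: phonemes -> text

print(recognize("audio_clip_1"))  # 你好
```

Because the phoneme interface separates acoustics from language, each stage can be trained on data that is far easier to collect than code-switching paired audio.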
arXiv Detail & Related papers (2020-10-28T07:46:15Z)
- RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications [4.619541348328938]
RECOApy streamlines the steps of data recording and pre-processing required in end-to-end speech-based applications.
The tool implements an easy-to-use interface for prompted speech recording, spectrogram and waveform analysis, utterance-level normalisation and silence trimming.
The grapheme-to-phoneme (G2P) converters are deep neural network (DNN) based architectures trained on lexicons extracted from the Wiktionary online collaborative resource.
arXiv Detail & Related papers (2020-04-07T05:44:58Z)
- g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset [14.323478990713477]
We introduce a new benchmark dataset that consists of 99,000+ sentences for Chinese polyphone disambiguation.
We train a simple neural network model on it, and find that it outperforms other preexisting G2P systems.
arXiv Detail & Related papers (2020-04-07T05:44:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.