Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation
- URL: http://arxiv.org/abs/2307.16199v1
- Date: Sun, 30 Jul 2023 10:50:18 GMT
- Title: Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation
- Authors: Yuanhao Chen
- Abstract summary: Tone sandhi, which applies to all multi-syllabic words in Shanghainese, is key to natural-sounding speech.
Recent work on Shanghainese TTS (text-to-speech) such as Apple's VoiceOver has shown poor performance with tone sandhi.
I show that word segmentation during text preprocessing can improve the quality of tone sandhi production in TTS models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tone is a crucial component of the prosody of Shanghainese, a Wu Chinese
variety spoken primarily in urban Shanghai. Tone sandhi, which applies to all
multi-syllabic words in Shanghainese, then, is key to natural-sounding speech.
Unfortunately, recent work on Shanghainese TTS (text-to-speech) such as Apple's
VoiceOver has shown poor performance with tone sandhi, especially LD
(left-dominant sandhi). Here I show that word segmentation during text
preprocessing can improve the quality of tone sandhi production in TTS models.
Syllables within the same word are annotated with a special symbol, which
serves as a proxy for prosodic information of the domain of LD. Contrary to the
common practice of using prosodic annotation mainly for static pauses, this
paper demonstrates that prosodic annotation can also be applied to dynamic
tonal phenomena. I anticipate this project to be a starting point for bringing
formal linguistic accounts of Shanghainese into computational projects. Too
long have we been using the Mandarin models to approximate Shanghainese, but it
is a different language with its own linguistic features, and its digitisation
and revitalisation should be treated as such.
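To make the preprocessing idea concrete, here is a minimal sketch: a toy longest-match segmenter over romanised syllables that joins syllables of the same word with a special symbol. The lexicon, the `#` marker, and the `segment_and_annotate` helper are illustrative assumptions, not the paper's actual implementation.
```python
# Minimal sketch of annotating word-internal syllables as a proxy for the
# domain of left-dominant (LD) tone sandhi. The toy lexicon, the '#' marker,
# and all names here are hypothetical, not the paper's implementation.

# Hypothetical lexicon of multi-syllabic words, keyed by syllable tuples.
LEXICON = {
    ("zaon", "he"): "Shanghai",
    ("ning", "min"): "people",
}
MARKER = "#"  # special symbol standing in for the LD sandhi domain

def segment_and_annotate(syllables):
    """Greedy longest-match segmentation; syllables inside the same word
    are joined with MARKER so the TTS frontend can see the word boundary
    (and hence the sandhi domain)."""
    out, i = [], 0
    max_len = max(len(w) for w in LEXICON)
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 1, -1):
            if tuple(syllables[i:i + n]) in LEXICON:
                out.append(MARKER.join(syllables[i:i + n]))
                i += n
                break
        else:
            out.append(syllables[i])  # monosyllabic word
            i += 1
    return " ".join(out)

print(segment_and_annotate(["zaon", "he", "ning", "min"]))
# -> "zaon#he ning#min"
```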
Related papers
- READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises [87.70001456418504]
We construct READIN: a Chinese multi-task benchmark with REalistic And Diverse Input Noises.
READIN contains four diverse tasks and requests annotators to re-enter the original test data with two commonly used Chinese input methods: Pinyin input and speech input.
Experimenting with a series of strong pretrained language models as well as robust training methods, we find that these models often suffer significant performance drops on READIN.
arXiv Detail & Related papers (2023-02-14T20:14:39Z)
- Revisiting Syllables in Language Modelling and their Application on Low-Resource Machine Translation [1.2617078020344619]
Syllables provide shorter sequences than characters, require less-specialised extracting rules than morphemes, and their segmentation is not impacted by the corpus size.
We first explore the potential of syllables for open-vocabulary language modelling in 21 languages.
We use rule-based syllabification methods for six languages and address the rest with hyphenation, which works as a syllabification proxy.
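As a quick illustration of hyphenation serving as a syllabification proxy, the sketch below uses the `pyphen` hyphenation library; the choice of library is an assumption, not necessarily the paper's tooling.
```python
# Sketch: hyphenation as a proxy for syllabification, as described above.
# Uses the pyphen library; the paper's own tooling may differ.
import pyphen

dic = pyphen.Pyphen(lang="en_US")

def pseudo_syllables(word):
    """Split a word at hyphenation points as an approximation of syllables."""
    return dic.inserted(word).split("-")

print(pseudo_syllables("segmentation"))  # e.g. ['seg', 'men', 'ta', 'tion']
```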
arXiv Detail & Related papers (2022-10-05T18:55:52Z)
- A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis [10.747119651974947]
Declarative questions are commonly used in daily Cantonese conversations.
Vanilla neural text-to-speech (TTS) systems are not capable of synthesizing rising intonation for these sentences.
We propose to complement the Cantonese TTS model with a BERT-based statement/question classifier.
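A hedged sketch of what such a classifier could look like, using the real `bert-base-chinese` checkpoint with an untrained two-label head; the paper's actual model, labels, and fine-tuning data are not reproduced here.
```python
# Sketch of a BERT-based statement/question classifier for a TTS frontend.
# bert-base-chinese is a real checkpoint, but the two-label head below is
# untrained; fine-tuning on labelled statements/questions is assumed.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2  # 0 = statement, 1 = question
)

def is_question(sentence):
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(-1).item() == 1

# The TTS frontend could then tag declarative questions for rising intonation.
```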
arXiv Detail & Related papers (2022-08-03T16:21:08Z)
- A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation [6.090922774386845]
We propose a novel Chinese dialect TTS frontend with a translation module.
It helps to convert Mandarin text into idiomatic expressions with correct orthography and grammar.
It is the first known work to incorporate translation with TTS.
arXiv Detail & Related papers (2022-06-10T07:46:34Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
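For intuition, a minimal sketch of a generative LSTM language model over sub-word linguistic units (phoneme or syllable IDs); the vocabulary size and dimensions are placeholders, not the paper's configuration.
```python
# Minimal LSTM language model over linguistic units (phoneme/syllable IDs).
# Dimensions and vocabulary are placeholders, not the paper's configuration.
import torch
import torch.nn as nn

class UnitLM(nn.Module):
    def __init__(self, n_units=64, emb=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(n_units, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_units)

    def forward(self, ids):            # ids: (batch, time)
        h, _ = self.lstm(self.embed(ids))
        return self.head(h)            # next-unit logits

model = UnitLM()
ids = torch.randint(0, 64, (2, 10))    # toy batch of unit sequences
loss = nn.functional.cross_entropy(    # standard next-unit objective
    model(ids[:, :-1]).reshape(-1, 64), ids[:, 1:].reshape(-1)
)
```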
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information [32.70080326854314]
We propose ChineseBERT, which incorporates the glyph and pinyin information of Chinese characters into language model pretraining.
The proposed ChineseBERT model yields significant performance boost over baseline models with fewer training steps.
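A rough sketch of the fusion idea: per-character glyph and pinyin features are combined with the ordinary character embedding before entering the transformer. The dimensions and the concatenate-then-project scheme below are assumptions based on the summary, not ChineseBERT's exact design.
```python
# Sketch of fusing character, glyph, and pinyin embeddings, as described
# above. Dimensions and the concat+project scheme are assumptions.
import torch
import torch.nn as nn

class FusionEmbedding(nn.Module):
    def __init__(self, vocab=21128, pinyin_vocab=1500, dim=768):
        super().__init__()
        self.char = nn.Embedding(vocab, dim)
        self.pinyin = nn.Embedding(pinyin_vocab, dim)
        self.glyph = nn.Linear(24 * 24, dim)    # flattened glyph bitmap
        self.project = nn.Linear(3 * dim, dim)  # fuse the three views

    def forward(self, char_ids, pinyin_ids, glyph_imgs):
        parts = [self.char(char_ids),
                 self.pinyin(pinyin_ids),
                 self.glyph(glyph_imgs.flatten(-2))]
        return self.project(torch.cat(parts, dim=-1))

emb = FusionEmbedding()
out = emb(torch.zeros(1, 4, dtype=torch.long),
          torch.zeros(1, 4, dtype=torch.long),
          torch.rand(1, 4, 24, 24))
print(out.shape)  # torch.Size([1, 4, 768])
```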
arXiv Detail & Related papers (2021-06-30T13:06:00Z)
- SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining [48.880840711568425]
We study the influences of three main factors on the Chinese tokenization for pretrained language models.
We propose two kinds of tokenizers: 1) SHUOWEN (meaning Talk Word), the pronunciation-based tokenizers; 2) JIEZI (meaning Solve Character), the glyph-based tokenizers.
We find that SHUOWEN and JIEZI tokenizers can generally outperform conventional single-character tokenizers.
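To illustrate what a pronunciation-based tokenizer in the spirit of SHUOWEN might start from, the sketch below maps characters to toneless pinyin using the `pypinyin` library; this is a stand-in, not the paper's tokenizer.
```python
# Illustration of pronunciation-based tokenization in the spirit of SHUOWEN,
# using pypinyin as a stand-in; the paper's actual tokenizers differ.
from pypinyin import lazy_pinyin

def pronunciation_tokens(text):
    """Map each Chinese character to its toneless pinyin, so a downstream
    subword model operates on pronunciations rather than glyphs."""
    return lazy_pinyin(text)

print(pronunciation_tokens("说文解字"))  # ['shuo', 'wen', 'jie', 'zi']
```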
arXiv Detail & Related papers (2021-06-01T11:20:02Z)
- Augmenting Part-of-speech Tagging with Syntactic Information for Vietnamese and Chinese [0.32228025627337864]
We implement the idea of improving word segmentation and part-of-speech tagging for Vietnamese by employing a simplified constituency parser.
Our neural model for joint word segmentation and part-of-speech tagging has the architecture of a syllable-based constituency parser.
This model can be augmented with predicted word boundary and part-of-speech tags by other tools.
arXiv Detail & Related papers (2021-02-24T08:57:02Z)
- Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces [60.58900627906269]
We propose a pre-trained language model as the substitute generator, using sentence-pieces to craft adversarial examples in Chinese.
The substitutions in the generated adversarial examples are not characters or words but 'pieces', which are more natural to Chinese readers.
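A hedged sketch of the core substitute-generation step, using a masked LM to propose replacement pieces; the `bert-base-chinese` model choice and the single-mask setup are assumptions, not the paper's exact method.
```python
# Sketch: use a masked LM to propose substitute pieces, per the summary above.
# The model choice and single-mask setup are assumptions, not the paper's method.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")

text = "今天天气很[MASK]。"  # mask the position to be substituted
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    logits = mlm(**inputs).logits

# Locate the mask position and take the top-5 candidate substitutes.
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
top = logits[0, mask_pos].topk(5).indices.tolist()
print(tok.convert_ids_to_tokens(top))  # candidate substitute pieces
```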
arXiv Detail & Related papers (2020-12-29T14:28:07Z)
- Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS [74.11899135025503]
We extend the Tacotron-based speech synthesis framework to explicitly model the prosodic phrase breaks.
We show that our proposed training scheme consistently improves the voice quality for both Chinese and Mongolian systems.
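A minimal sketch of the multi-task objective: the synthesis loss is combined with an auxiliary phrase-break prediction loss computed from shared encoder states. The loss weighting and head shapes are placeholders, not the paper's setup.
```python
# Sketch of a multi-task objective: TTS reconstruction loss plus an auxiliary
# phrase-break prediction loss on shared encoder states. The weighting and
# head shapes are placeholders, not the paper's actual setup.
import torch
import torch.nn as nn

encoder_states = torch.rand(2, 20, 256)      # (batch, text_len, dim)
break_head = nn.Linear(256, 2)               # break / no-break per token
break_labels = torch.randint(0, 2, (2, 20))  # gold phrase-break tags

mel_loss = torch.tensor(1.0)                 # stand-in for Tacotron's loss
break_loss = nn.functional.cross_entropy(
    break_head(encoder_states).reshape(-1, 2), break_labels.reshape(-1)
)
total_loss = mel_loss + 0.1 * break_loss     # joint training objective
```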
arXiv Detail & Related papers (2020-08-11T07:57:29Z)
- 2kenize: Tying Subword Sequences for Chinese Script Conversion [54.33749520569979]
We propose a model that can disambiguate between mappings and convert between the two scripts (Simplified and Traditional Chinese).
Our proposed method outperforms previous Chinese Character conversion approaches by 6 points in accuracy.
arXiv Detail & Related papers (2020-05-07T10:53:05Z)