Syntactic representation learning for neural network based TTS with
syntactic parse tree traversal
- URL: http://arxiv.org/abs/2012.06971v1
- Date: Sun, 13 Dec 2020 05:52:07 GMT
- Title: Syntactic representation learning for neural network based TTS with
syntactic parse tree traversal
- Authors: Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen Meng
- Abstract summary: We propose a syntactic representation learning method based on syntactic parse tree traversal to automatically utilize syntactic structure information.
Experimental results demonstrate the effectiveness of our proposed approach.
For sentences with multiple syntactic parse trees, prosodic differences can be clearly perceived in the synthesized speech.
- Score: 49.05471750563229
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The syntactic structure of a sentence is correlated with the prosodic
structure of speech, which is crucial for improving the prosody and naturalness
of a text-to-speech (TTS) system. Nowadays, TTS systems usually incorporate
syntactic structure information through manually designed features based on
expert knowledge. In this paper, we propose a syntactic representation learning
method based on syntactic parse tree traversal to automatically utilize the
syntactic structure information. Two constituent label sequences are linearized
through left-first and right-first traversals of the constituent parse tree.
Syntactic representations are then extracted at the word level from each
constituent label sequence by a corresponding uni-directional gated recurrent
unit (GRU) network. Meanwhile, a nuclear-norm maximization loss is introduced
to enhance the discriminability and diversity of the constituent label
embeddings. Upsampled syntactic representations and phoneme embeddings are
concatenated to serve as the encoder input of Tacotron2. Experimental results
demonstrate the effectiveness of the proposed approach, with the mean opinion
score (MOS) increasing from 3.70 to 3.82 and ABX preference exceeding the
baseline by 17%. In addition, for sentences with multiple syntactic parse
trees, prosodic differences can be clearly perceived in the synthesized speech.
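As a rough illustration of the pipeline described in the abstract, the following minimal PyTorch/NLTK sketch shows one plausible reading of the two traversal-based constituent label sequences, the per-sequence uni-directional GRUs, and the nuclear-norm maximization term. It is an assumption-laden sketch, not the authors' implementation: the names (traverse, SyntacticEncoder, nuclear_norm_loss, the `<w>` word placeholder) are hypothetical, and the exact linearization, embedding sharing, and loss weighting in the paper may differ.

```python
# Minimal sketch of syntactic representation learning via parse tree traversal.
# Illustrative only; assumptions are noted inline, this is not the authors' code.
import torch
import torch.nn as nn
from nltk import Tree

WORD = "<w>"  # hypothetical placeholder marking a word (leaf) position

def traverse(tree, right_first=False):
    """Pre-order traversal emitting constituent labels; children are visited
    left-to-right (left-first) or right-to-left (right-first)."""
    if isinstance(tree, str):                    # leaf: an actual word
        return [WORD]
    labels = [tree.label()]
    children = reversed(tree) if right_first else tree
    for child in children:
        labels += traverse(child, right_first)
    return labels

class SyntacticEncoder(nn.Module):
    """Embeds a constituent-label sequence and runs a uni-directional GRU;
    hidden states at WORD positions act as word-level syntactic vectors."""
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.vocab = vocab                       # label -> index
        self.embed = nn.Embedding(len(vocab), dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, labels):
        ids = torch.tensor([[self.vocab[l] for l in labels]])
        out, _ = self.gru(self.embed(ids))       # (1, T, dim)
        word_pos = [i for i, l in enumerate(labels) if l == WORD]
        return out[0, word_pos]                  # (num_words, dim)

def nuclear_norm_loss(embeddings):
    """Nuclear-norm maximization: maximize the sum of singular values of the
    label embedding matrix (returned negated, as a term to minimize)."""
    return -torch.linalg.matrix_norm(embeddings, ord="nuc")

# Example with a syntactically ambiguous sentence (one of its parses).
tree = Tree.fromstring(
    "(S (NP (NN police)) (VP (VBD shot) (NP (DT the) (NN man)) "
    "(PP (IN with) (NP (DT a) (NN gun)))))")
left_seq = traverse(tree)                        # left-first label sequence
right_seq = traverse(tree, right_first=True)     # right-first label sequence
vocab = {l: i for i, l in enumerate(sorted(set(left_seq + right_seq)))}
enc_l, enc_r = SyntacticEncoder(vocab), SyntacticEncoder(vocab)
word_vecs = torch.cat([enc_l(left_seq), enc_r(right_seq)], dim=-1)
aux_loss = nuclear_norm_loss(enc_l.embed.weight) + nuclear_norm_loss(enc_r.embed.weight)
```

In the paper, the resulting word-level syntactic representations are further upsampled to phoneme resolution and concatenated with phoneme embeddings before being fed to the Tacotron2 encoder; that step is omitted in this sketch.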
Related papers
- Syntactic Complexity Identification, Measurement, and Reduction Through
Controlled Syntactic Simplification [0.0]
We present a classical syntactic dependency-based approach to split and rephrase compound and complex sentences into a set of simplified sentences.
The paper also introduces an algorithm to identify and measure a sentence's syntactic complexity.
This work was accepted and presented at the International Workshop on Learning with Knowledge Graphs (IWLKG) at the WSDM-2023 conference.
arXiv Detail & Related papers (2023-04-16T13:13:58Z) - Syntactic Structure Processing in the Brain while Listening [3.735055636181383]
There are two popular syntactic parsing methods: constituency and dependency parsing.
Recent works have used syntactic embeddings based on constituency trees, incremental top-down parsing, and other word syntactic features for brain activity prediction given the text stimuli to study how the syntax structure is represented in the brain's language network.
We investigate the predictive power of the brain encoding models in three settings: (i) individual performance of the constituency and dependency syntactic parsing based embedding methods, (ii) efficacy of these syntactic parsing based embedding methods when controlling for basic syntactic signals, and (iii) relative effectiveness of each of the syntactic embedding methods when controlling for
arXiv Detail & Related papers (2023-02-16T21:28:11Z) - Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z) - Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures.
We also explore utilising higher-order neighbourhood information to encode rich structures in constituent trees.
Experiments on the English and Chinese portions of OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z) - Syntactic Perturbations Reveal Representational Correlates of
Hierarchical Phrase Structure in Pretrained Language Models [22.43510769150502]
It is not entirely clear what aspects of sentence-level syntax are captured by vector-based language representations.
We show that Transformers build sensitivity to larger parts of the sentence along their layers, and that hierarchical phrase structure plays a role in this process.
arXiv Detail & Related papers (2021-04-15T16:30:31Z) - Dependency Parsing based Semantic Representation Learning with Graph
Neural Network for Enhancing Expressiveness of Text-to-Speech [49.05471750563229]
We propose a semantic representation learning method based on a graph neural network, considering the dependency relations of a sentence.
We show that our proposed method outperforms the baseline using vanilla BERT features on both the LJSpeech and Blizzard Challenge 2013 datasets.
arXiv Detail & Related papers (2021-04-14T13:09:51Z) - GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech
Synthesis [79.1885389845874]
Transformer-based end-to-end text-to-speech synthesis (TTS) is one such successful implementation.
We propose a novel neural TTS model, denoted as GraphSpeech, which is formulated under the graph neural network framework.
Experiments show that GraphSpeech consistently outperforms the Transformer TTS baseline in terms of spectrum and prosody rendering of utterances.
arXiv Detail & Related papers (2020-10-23T14:14:06Z) - Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
arXiv Detail & Related papers (2020-05-27T16:44:01Z) - Representations of Syntax [MASK] Useful: Effects of Constituency and
Dependency Structure in Recursive LSTMs [26.983602540576275]
Sequence-based neural networks show significant sensitivity to syntactic structure, but they still perform less well on syntactic tasks than tree-based networks.
We evaluate which of these two representational schemes more effectively introduces biases for syntactic structure.
We show that a constituency-based network generalizes more robustly than a dependency-based one, and that combining the two types of structure does not yield further improvement.
arXiv Detail & Related papers (2020-04-30T18:00:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.