GraphPB: Graphical Representations of Prosody Boundary in Speech
Synthesis
- URL: http://arxiv.org/abs/2012.02626v1
- Date: Thu, 3 Dec 2020 03:34:05 GMT
- Title: GraphPB: Graphical Representations of Prosody Boundary in Speech
Synthesis
- Authors: Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Lingwei
Kong, Jing Xiao
- Abstract summary: This paper introduces a graphical representation approach of prosody boundary (GraphPB) in the task of Chinese speech synthesis.
The nodes of the graph embedding are formed by prosodic words, and the edges are formed by the other prosodic boundaries.
Two techniques are proposed to embed sequential information into the graph-to-sequence text-to-speech model.
- Score: 23.836992815219904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a graphical representation approach of prosody boundary
(GraphPB) in the task of Chinese speech synthesis, intending to parse the
semantic and syntactic relationship of input sequences in a graphical domain
for improving the prosody performance. The nodes of the graph embedding are
formed by prosodic words, and the edges are formed by the other prosodic
boundaries, namely prosodic phrase boundary (PPH) and intonation phrase
boundary (IPH). Different Graph Neural Networks (GNN) like Gated Graph Neural
Network (GGNN) and Graph Long Short-term Memory (G-LSTM) are utilised as graph
encoders to exploit the graphical prosody boundary information.
A graph-to-sequence model is proposed and formed by a graph encoder and an
attentional decoder. Two techniques are proposed to embed sequential
information into the graph-to-sequence text-to-speech model. The experimental
results show that this proposed approach can encode the phonetic and prosody
rhythm of an utterance. The mean opinion score (MOS) of these GNN models is
comparable to that of state-of-the-art sequence-to-sequence models, with
better performance in the aspect of prosody. This provides an alternative
approach for prosody modelling in end-to-end speech synthesis.
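The graph construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract states only that nodes are prosodic words and that edges come from PPH and IPH boundaries. The boundary labels ("PW", "PPH", "IPH"), the function name build_prosody_graph, the edge-typing rule, and the pinyin example are all assumptions made here for illustration. Under this reading, adjacent prosodic words inside the same prosodic phrase are joined by a "PPH" edge, adjacent words in different prosodic phrases but the same intonation phrase are joined by an "IPH" edge, and no edge crosses an intonation phrase boundary.

```python
from typing import List, Tuple

# Each item is (prosodic_word, boundary_after_word).
# Hypothetical boundary encoding, not taken from the paper:
#   "PW"  = plain prosodic-word boundary
#   "PPH" = prosodic phrase boundary
#   "IPH" = intonation phrase boundary
Annotated = List[Tuple[str, str]]
Edge = Tuple[int, int, str]  # (source node index, target node index, edge type)


def build_prosody_graph(words: Annotated) -> Tuple[List[str], List[Edge]]:
    """Turn a boundary-annotated utterance into (nodes, typed edges)."""
    nodes = [word for word, _ in words]
    edges: List[Edge] = []
    for i in range(len(words) - 1):
        boundary = words[i][1]
        if boundary == "PW":
            # Both words sit inside the same prosodic phrase.
            edges.append((i, i + 1, "PPH"))
        elif boundary == "PPH":
            # Different prosodic phrases, same intonation phrase.
            edges.append((i, i + 1, "IPH"))
        elif boundary != "IPH":
            raise ValueError(f"unknown boundary label: {boundary!r}")
        # An "IPH" boundary separates intonation phrases: no edge (assumption).
    return nodes, edges


# Made-up example utterance in pinyin with hand-assigned boundaries.
example = [
    ("jin1tian1", "PW"),
    ("tian1qi4", "PPH"),
    ("hen3hao3", "IPH"),
    ("wo3men5", "PW"),
    ("chu1qu4", "IPH"),
]
nodes, edges = build_prosody_graph(example)
```

A graph encoder such as a GGNN would then consume the node list and the typed edge list; how the paper injects sequential order into that encoder is not specified in the abstract and is left out of this sketch.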
Related papers
- From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models [81.92098140232638]
Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks.
Existing methods struggle to generate scene graphs with novel visual relation concepts.
We introduce a new open-vocabulary SGG framework based on sequence generation.
arXiv Detail & Related papers (2024-04-01T04:21:01Z)
- Message Detouring: A Simple Yet Effective Cycle Representation for Expressive Graph Learning [4.085624738017079]
We introduce the concept of message detouring to hierarchically characterize cycle representation throughout the entire graph.
Message detouring can significantly outperform current counterpart approaches on various benchmark datasets.
arXiv Detail & Related papers (2024-02-12T22:06:37Z)
- Text Enriched Sparse Hyperbolic Graph Convolutional Networks [21.83127488157701]
Graph Neural Networks (GNNs) and their hyperbolic variants provide a promising approach to encode such networks in a low-dimensional latent space.
We propose Text Enriched Sparse Hyperbolic Graph Convolution Network (TESH-GCN) to capture the graph's metapath structures using semantic signals.
Our model outperforms the current state-of-the-art approaches by a large margin on the task of link prediction.
arXiv Detail & Related papers (2022-07-06T00:23:35Z)
- Graph Condensation via Receptive Field Distribution Matching [61.71711656856704]
This paper focuses on creating a small graph to represent the original graph, so that GNNs trained on the size-reduced graph can make accurate predictions.
We view the original graph as a distribution of receptive fields and aim to synthesize a small graph whose receptive fields share a similar distribution.
arXiv Detail & Related papers (2022-06-28T02:10:05Z)
- Towards Graph Self-Supervised Learning with Contrastive Adjusted Zooming [48.99614465020678]
We introduce a novel self-supervised graph representation learning algorithm via Graph Contrastive Adjusted Zooming.
This mechanism enables G-Zoom to explore and extract self-supervision signals from a graph from multiple scales.
We have conducted extensive experiments on real-world datasets, and the results demonstrate that our proposed model outperforms state-of-the-art methods consistently.
arXiv Detail & Related papers (2021-11-20T22:45:53Z)
- A Robust and Generalized Framework for Adversarial Graph Embedding [73.37228022428663]
We propose a robust framework for adversarial graph embedding, named AGE.
AGE generates the fake neighbor nodes as the enhanced negative samples from the implicit distribution.
Based on this framework, we propose three models to handle three types of graph data.
arXiv Detail & Related papers (2021-05-22T07:05:48Z)
- GraphSVX: Shapley Value Explanations for Graph Neural Networks [81.83769974301995]
Graph Neural Networks (GNNs) achieve significant performance for various learning tasks on geometric data.
In this paper, we propose a unified framework satisfied by most existing GNN explainers.
We introduce GraphSVX, a post hoc local model-agnostic explanation method specifically designed for GNNs.
arXiv Detail & Related papers (2021-04-18T10:40:37Z)
- Neural Topic Modeling by Incorporating Document Relationship Graph [18.692100955163713]
Graph Topic Model (GTM) is a GNN based neural topic model that represents a corpus as a document relationship graph.
Documents and words in the corpus become nodes in the graph and are connected based on document-word co-occurrences.
arXiv Detail & Related papers (2020-09-29T12:45:55Z)
- Compact Graph Architecture for Speech Emotion Recognition [0.0]
A compact, efficient and scalable way to represent data is in the form of graphs.
We construct a Graph Convolution Network (GCN)-based architecture that can perform an accurate graph convolution.
Our model achieves comparable performance to the state-of-the-art with significantly fewer learnable parameters.
arXiv Detail & Related papers (2020-08-05T12:09:09Z)
- Graph Pooling with Node Proximity for Hierarchical Representation Learning [80.62181998314547]
We propose a novel graph pooling strategy that leverages node proximity to improve the hierarchical representation learning of graph data with their multi-hop topology.
Results show that the proposed graph pooling strategy is able to achieve state-of-the-art performance on a collection of public graph classification benchmark datasets.
arXiv Detail & Related papers (2020-06-19T13:09:44Z)
- GraphTTS: graph-to-sequence modelling in neural text-to-speech [34.54061333255853]
This paper leverages the graph-to-sequence method in neural text-to-speech (GraphTTS).
It maps the graph embedding of the input sequence to spectrograms.
Applying the encoder of GraphTTS as a graph auxiliary encoder (GAE) can analyse prosody information from the semantic structure of texts.
arXiv Detail & Related papers (2020-03-04T07:44:55Z)