Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
- URL: http://arxiv.org/abs/2009.08474v2
- Date: Sun, 26 Dec 2021 08:42:16 GMT
- Title: Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
- Authors: Yukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura,
Yoshihiko Nankaku, Keiichi Tokuda
- Abstract summary: This paper proposes a hierarchical generative model with a multi-grained latent variable to synthesize expressive speech.
The proposed framework also enables control of the speaking style over an entire utterance.
- Score: 19.386519810463003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a hierarchical generative model with a multi-grained
latent variable to synthesize expressive speech. In recent years, fine-grained
latent variables have been introduced into text-to-speech synthesis, enabling
fine control over the prosody and speaking style of synthesized speech.
However, the naturalness of speech degrades when these latent variables are
obtained by sampling from the standard Gaussian prior. To solve this problem,
we propose a novel framework for modeling the fine-grained latent variables
that accounts for their dependence on the input text, the hierarchical
linguistic structure, and the temporal structure of the latent variables. This
framework consists of a multi-grained variational autoencoder, a conditional
prior, and a multi-level auto-regressive latent converter, which together
extract latent variables at different time resolutions and sample the
finer-level latent variables from the coarser-level ones while taking the
input text into account. Experimental results indicate that the framework
provides an appropriate way to sample fine-grained latent variables at the
synthesis stage without a reference signal. The proposed framework also
enables control of the speaking style over an entire utterance.
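To make the sampling path concrete, below is a minimal PyTorch sketch of the two components the abstract names: a conditional prior that predicts an utterance-level latent from text features, and an auto-regressive latent converter that samples finer-level (e.g. phoneme-level) latents from it. All class names, shapes, and layer choices here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ConditionalPrior(nn.Module):
    """Predicts a Gaussian prior over the coarse (utterance-level) latent
    from text features, replacing the standard Gaussian prior.
    (Hypothetical layer sizes, for illustration only.)"""
    def __init__(self, text_dim=256, latent_dim=64):
        super().__init__()
        self.net = nn.Linear(text_dim, 2 * latent_dim)

    def forward(self, text_feat):                    # (B, text_dim)
        mean, log_var = self.net(text_feat).chunk(2, dim=-1)
        return mean + torch.randn_like(mean) * (0.5 * log_var).exp()

class AutoRegressiveLatentConverter(nn.Module):
    """Samples finer-level latents step by step, conditioned on the
    coarser-level latent, per-step text features, and the previously
    sampled fine latent (one plausible realization of the idea)."""
    def __init__(self, coarse_dim=64, text_dim=256, fine_dim=16, hidden=256):
        super().__init__()
        self.cell = nn.GRUCell(coarse_dim + text_dim + fine_dim, hidden)
        self.out = nn.Linear(hidden, 2 * fine_dim)
        self.fine_dim, self.hidden = fine_dim, hidden

    def forward(self, z_coarse, text_feats):         # (B, C), (B, T, D)
        B, T, _ = text_feats.shape
        h = text_feats.new_zeros(B, self.hidden)
        z_prev = text_feats.new_zeros(B, self.fine_dim)
        fine = []
        for t in range(T):                           # one step per phoneme
            inp = torch.cat([z_coarse, text_feats[:, t], z_prev], dim=-1)
            h = self.cell(inp, h)
            mean, log_var = self.out(h).chunk(2, dim=-1)
            z_prev = mean + torch.randn_like(mean) * (0.5 * log_var).exp()
            fine.append(z_prev)
        return torch.stack(fine, dim=1)              # (B, T, fine_dim)
```

At synthesis time these modules would stand in for a reference-signal encoder: sample the utterance-level latent from the conditional prior, convert it into phoneme-level latents, and feed those to the acoustic decoder.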
Related papers
- Hierarchical Sketch Induction for Paraphrase Generation [79.87892048285819]
We introduce Hierarchical Refinement Quantized Variational Autoencoders (HRQ-VAE), a method for learning decompositions of dense encodings.
We use HRQ-VAE to encode the syntactic form of an input sentence as a path through the hierarchy, allowing us to more easily predict syntactic sketches at test time.
arXiv Detail & Related papers (2022-03-07T15:28:36Z)
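The "path through the hierarchy" encoding described above can be pictured as residual quantization with one codebook per level. The sketch below is a generic residual-VQ routine under that assumption; HRQ-VAE's actual training objective and codebook construction are not reproduced here.

```python
import torch
import torch.nn as nn

class HierarchicalResidualQuantizer(nn.Module):
    """Encodes a dense vector as a path through a hierarchy: one codebook
    per level, each level quantizing the residual left by the level above.
    (A generic residual-VQ sketch, not HRQ-VAE's exact formulation.)"""
    def __init__(self, dim=256, levels=3, codes_per_level=16):
        super().__init__()
        self.codebooks = nn.ModuleList(
            nn.Embedding(codes_per_level, dim) for _ in range(levels)
        )

    def forward(self, x):                        # x: (B, dim)
        residual, path, recon = x, [], torch.zeros_like(x)
        for cb in self.codebooks:
            idx = torch.cdist(residual, cb.weight).argmin(dim=-1)  # nearest code
            q = cb(idx)                          # (B, dim) selected code vectors
            path.append(idx)
            recon = recon + q                    # coarse-to-fine reconstruction
            residual = residual - q
        return recon, torch.stack(path, dim=-1)  # path: (B, levels) indices
```

Decoding reverses the path: summing the selected code vectors across levels reconstructs the dense encoding, so coarse levels capture high-level syntactic choices and deeper levels refine them.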
- DiscoDVT: Generating Long Text with Discourse-Aware Discrete Variational Transformer [40.10695204278747]
We propose DiscoDVT, a discourse-aware discrete variational Transformer to tackle the incoherence issue.
We conduct extensive experiments on two open story generation datasets and demonstrate that the latent codes learn meaningful correspondence to the discourse structures that guide the model to generate long texts with better long-range coherence.
arXiv Detail & Related papers (2021-10-12T13:41:06Z)
- Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders [0.0]
We argue that continuous variables may not be ideal for modeling features of textual data, because most generative factors in text are discrete.
We propose a Variational Autoencoder-based method that models language features as discrete variables and encourages independence between variables to learn disentangled representations.
arXiv Detail & Related papers (2021-09-15T09:10:05Z)
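For discrete-latent modeling of the kind described above, the Gumbel-softmax relaxation is one standard way to sample categorical latent variables while keeping the model differentiable; whether this paper uses it or another estimator is an assumption here.

```python
import torch
import torch.nn.functional as F

def sample_discrete_latents(logits, tau=1.0, hard=True):
    """Draw K independent categorical latent variables from per-variable
    logits of shape (B, K, C) using the Gumbel-softmax relaxation, so
    gradients flow through the discrete choice. With hard=True the forward
    pass is one-hot while the backward pass stays relaxed."""
    return F.gumbel_softmax(logits, tau=tau, hard=hard, dim=-1)

# usage sketch: 8 latent variables, each over 10 classes
logits = torch.randn(4, 8, 10)
z = sample_discrete_latents(logits)   # (4, 8, 10), one-hot per variable
```

Independence between the K variables can then be encouraged with a factorized prior or a total-correlation-style penalty, though the paper's exact regularizer is not reproduced here.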
- Towards Multi-Scale Style Control for Expressive Speech Synthesis [60.08928435252417]
The proposed method employs a multi-scale reference encoder to extract both the global-scale utterance-level and the local-scale quasi-phoneme-level style features of the target speech.
During training, the multi-scale style model can be jointly trained with the speech synthesis model in an end-to-end fashion.
arXiv Detail & Related papers (2021-04-08T05:50:09Z)
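A minimal sketch of the multi-scale idea above: one branch summarizes a reference mel-spectrogram into a single utterance-level style vector, while another keeps a downsampled quasi-phoneme-level sequence. Layer choices and dimensions below are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiScaleReferenceEncoder(nn.Module):
    """Extracts one global utterance-level style vector plus a sequence of
    local quasi-phoneme-level style vectors from a reference mel-spectrogram
    (hypothetical layer sizes, illustrative of the multi-scale idea only)."""
    def __init__(self, mel_dim=80, style_dim=128):
        super().__init__()
        self.local = nn.Conv1d(mel_dim, style_dim, kernel_size=9,
                               stride=4, padding=4)   # coarse time downsampling
        self.global_rnn = nn.GRU(style_dim, style_dim, batch_first=True)

    def forward(self, mel):                  # mel: (B, T, mel_dim)
        local = self.local(mel.transpose(1, 2)).transpose(1, 2)  # (B, T', D)
        _, h = self.global_rnn(local)        # summarize the whole utterance
        return h[-1], local                  # global (B, D), local (B, T', D)
```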
- Diverse Semantic Image Synthesis via Probability Distribution Modeling [103.88931623488088]
We propose a novel diverse semantic image synthesis framework.
Our method achieves superior diversity and comparable quality relative to state-of-the-art methods.
arXiv Detail & Related papers (2021-03-11T18:59:25Z)
- Neural Syntactic Preordering for Controlled Paraphrase Generation [57.5316011554622]
Our work uses syntactic transformations to softly "reorder" the source sentence and guide our neural paraphrasing model.
First, given an input sentence, we derive a set of feasible syntactic rearrangements using an encoder-decoder model.
Next, we use each proposed rearrangement to produce a sequence of position embeddings, which encourages our final encoder-decoder paraphrase model to attend to the source words in a particular order.
arXiv Detail & Related papers (2020-05-05T09:02:25Z)
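The position-embedding trick in the summary above can be sketched as follows: each source token is tagged with the position embedding of its rank in the proposed rearrangement, nudging the paraphrase model's attention to follow that order. Names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

def reordered_position_embeddings(order, pos_table):
    """Given a proposed rearrangement of the source words ('order' maps each
    source position to its target-side rank), look up position embeddings in
    that order, so the paraphrase encoder sees source words tagged with their
    desired output positions. Illustrative only.
    order: (B, T) long tensor; pos_table: nn.Embedding(max_len, d)."""
    return pos_table(order)   # (B, T, d): source word i gets rank order[i]

# usage sketch (hypothetical sizes): "reorder" four source words
pos_table = nn.Embedding(512, 256)
order = torch.tensor([[2, 0, 1, 3]])
pe = reordered_position_embeddings(order, pos_table)  # added to token embeddings
```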
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
However, VAEs with a strong auto-regressive decoder tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
- Syntax-driven Iterative Expansion Language Models for Controllable Text Generation [2.578242050187029]
We propose a new paradigm for introducing a syntactic inductive bias into neural text generation.
Our experiments show that this paradigm is effective at text generation, with quality between that of LSTMs and Transformers, and comparable diversity.
arXiv Detail & Related papers (2020-04-05T14:29:40Z)
- Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior [53.69310441063162]
This paper proposes a sequential prior in a discrete latent space which can generate more natural-sounding samples.
We evaluate the approach using listening tests, objective metrics of automatic speech recognition (ASR) performance, and measurements of prosody attributes.
arXiv Detail & Related papers (2020-02-06T12:35:50Z)
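Since this last entry is closest in spirit to the main paper, a sketch may help: a quantized fine-grained VAE leaves a sequence of discrete prosody codes per utterance, and a sequential prior over those codes lets synthesis sample them coherently rather than independently. The module below is an illustrative assumption, not the paper's model.

```python
import torch
import torch.nn as nn

class AutoRegressivePriorOverCodes(nn.Module):
    """Sequential prior over the discrete prosody codes of a quantized
    fine-grained VAE, so that at synthesis time codes are sampled in order
    instead of independently (hypothetical shapes, illustration only)."""
    def __init__(self, num_codes=256, dim=128):
        super().__init__()
        self.embed = nn.Embedding(num_codes + 1, dim)   # +1 for a BOS token
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, num_codes)

    @torch.no_grad()
    def sample(self, batch, length, bos=256):
        codes = torch.full((batch, 1), bos, dtype=torch.long)
        h = None
        for _ in range(length):                 # one code per fine-grained step
            x = self.embed(codes[:, -1:])       # embed most recent code
            y, h = self.rnn(x, h)
            probs = self.out(y[:, -1]).softmax(dim=-1)
            nxt = torch.multinomial(probs, 1)   # ancestral sampling
            codes = torch.cat([codes, nxt], dim=1)
        return codes[:, 1:]                     # sampled prosody code sequence
```

The sampled code sequence would then be mapped back through the VAE codebook to per-step prosody latents for the synthesizer.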
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.