A Deterministic plus Stochastic Model of the Residual Signal for
Improved Parametric Speech Synthesis
- URL: http://arxiv.org/abs/2001.00842v1
- Date: Sun, 29 Dec 2019 07:26:47 GMT
- Title: A Deterministic plus Stochastic Model of the Residual Signal for
Improved Parametric Speech Synthesis
- Authors: Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit
- Abstract summary: We propose an adaptation of the Deterministic plus Stochastic Model (DSM) for the residual.
The proposed residual model is integrated within an HMM-based speech synthesizer.
Results show a significant improvement for both male and female voices.
- Score: 11.481208551940998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech generated by parametric synthesizers generally suffers from a typical
buzziness, similar to what was encountered in old LPC-like vocoders. In order
to alleviate this problem, a more suitable model of the excitation should be
adopted. For this, we hereby propose an adaptation of the Deterministic plus
Stochastic Model (DSM) for the residual. In this model, the excitation is
divided into two distinct spectral bands delimited by the maximum voiced
frequency. The deterministic part concerns the low-frequency contents and
consists of a decomposition of pitch-synchronous residual frames on an
orthonormal basis obtained by Principal Component Analysis. The stochastic
component is a high-pass filtered noise whose time structure is modulated by an
energy-envelope, similarly to what is done in the Harmonic plus Noise Model
(HNM). The proposed residual model is integrated within an HMM-based speech
synthesizer and is compared to the traditional excitation through a subjective
test. Results show a significant improvement for both male and female voices.
In addition, the proposed model requires little computational load and memory,
which is essential for its integration in commercial applications.
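The two-band excitation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`pca_basis`, `band_mask`, `dsm_frame`), the brick-wall FFT filtering, and the triangular energy envelope are all assumptions chosen for brevity, and the frames are assumed to be already pitch-synchronous and resampled to a common length.

```python
import numpy as np

def pca_basis(frames, n_components):
    """Orthonormal PCA basis of residual frames (one frame per row)."""
    mean = frames.mean(axis=0)
    # SVD of the centred data: the rows of vt are orthonormal principal directions.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:n_components]

def band_mask(n, fs_hz, fm_hz, keep_low):
    """Boolean mask over rfft bins, split at the maximum voiced frequency fm_hz."""
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_hz)
    return freqs <= fm_hz if keep_low else freqs > fm_hz

def dsm_frame(coeffs, mean, basis, fs_hz, fm_hz, rng):
    """One excitation frame: low-band deterministic part plus high-band noise."""
    n = basis.shape[1]
    # Deterministic part: PCA reconstruction, kept only below fm_hz.
    det = mean + coeffs @ basis
    det = np.fft.irfft(np.fft.rfft(det) * band_mask(n, fs_hz, fm_hz, True), n)
    # Stochastic part: white noise kept only above fm_hz (brick-wall high-pass,
    # an assumption), then shaped in time by an energy envelope.
    noise = rng.standard_normal(n)
    noise = np.fft.irfft(np.fft.rfft(noise) * band_mask(n, fs_hz, fm_hz, False), n)
    envelope = np.bartlett(n)  # assumed triangular envelope
    sto = noise * envelope
    return det + sto, det, sto

# Hypothetical usage with random data standing in for real residual frames.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 64))
mean, basis = pca_basis(frames, n_components=4)
excitation, det, sto = dsm_frame(np.array([1.0, 0.5, 0.0, 0.0]),
                                 mean, basis, fs_hz=16000.0, fm_hz=4000.0, rng=rng)
```

In a real system the PCA coefficients would presumably be generated by the synthesizer and the frames overlap-added at the target pitch; those steps are left out here.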
Related papers
- SMRD: SURE-based Robust MRI Reconstruction with Diffusion Models [76.43625653814911]
Diffusion models have gained popularity for accelerated MRI reconstruction due to their high sample quality.
They can effectively serve as rich data priors while incorporating the forward model flexibly at inference time.
We introduce SURE-based MRI Reconstruction with Diffusion models (SMRD) to enhance robustness during testing.
arXiv Detail & Related papers (2023-10-03T05:05:35Z)
- High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models [56.00939852727501]
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
Non-autoregressive framework enhances controllability, and duration diffusion model enables diversified prosodic expression.
arXiv Detail & Related papers (2023-09-27T09:27:03Z)
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
- Self-Adapting Noise-Contrastive Estimation for Energy-Based Models [0.0]
Training energy-based models with noise-contrastive estimation (NCE) is theoretically feasible but practically challenging.
Previous works have explored modelling the noise distribution as a separate generative model, and then concurrently training this noise model with the EBM.
This thesis proposes a self-adapting NCE algorithm which uses static instances of the EBM along its training trajectory as the noise distribution.
arXiv Detail & Related papers (2022-11-03T15:17:43Z)
- Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis [19.422230767803246]
We propose Period VITS, a novel end-to-end text-to-speech model that incorporates an explicit periodicity generator.
In the proposed method, we introduce a frame pitch predictor that predicts prosodic features, such as pitch and voicing flags, from the input text.
From these features, the proposed periodicity generator produces a sample-level sinusoidal source that enables the waveform decoder to accurately reproduce the pitch.
arXiv Detail & Related papers (2022-10-28T07:52:30Z)
- SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders.
arXiv Detail & Related papers (2022-03-31T02:08:27Z)
- Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
By utilizing the synthesis model with the input of discrete symbols, after the prediction of discrete symbol sequence, each target speech could be re-synthesized.
arXiv Detail & Related papers (2021-12-17T08:35:40Z)
- FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis [6.509758931804479]
We propose a feed-forward Transformer based TTS model that is designed based on the source-filter theory.
FastPitchFormant has a unique structure that handles text and acoustic features in parallel.
arXiv Detail & Related papers (2021-06-29T07:06:42Z)
- VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)
- Eigenresiduals for improved Parametric Speech Synthesis [11.481208551940998]
A new excitation model is proposed to produce natural-sounding voices in a speech synthesizer.
The model is based on the decomposition of pitch-synchronous residual frames on an orthonormal basis.
A stream of PCA-based coefficients is added to our HMM-based synthesizer and allows the voiced excitation to be generated during synthesis.
arXiv Detail & Related papers (2020-01-02T09:39:07Z)
- The Deterministic plus Stochastic Model of the Residual Signal and its Applications [13.563526970105988]
This manuscript presents a Deterministic plus Stochastic Model (DSM) of the residual signal.
The applicability of the DSM in two fields of speech processing is then studied.
arXiv Detail & Related papers (2019-12-29T07:52:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.