Utterance-level Sequential Modeling For Deep Gaussian Process Based
Speech Synthesis Using Simple Recurrent Unit
- URL: http://arxiv.org/abs/2004.10823v1
- Date: Wed, 22 Apr 2020 19:51:36 GMT
- Title: Utterance-level Sequential Modeling For Deep Gaussian Process Based
Speech Synthesis Using Simple Recurrent Unit
- Authors: Tomoki Koriyama, Hiroshi Saruwatari
- Abstract summary: We show that DGP can be applied to utterance-level modeling using recurrent architecture models.
We adopt a simple recurrent unit (SRU) for the proposed model to achieve a recurrent architecture.
The proposed SRU-DGP-based speech synthesis outperforms not only feed-forward DGP but also automatically tuned SRU- and long short-term memory (LSTM)-based neural networks.
- Score: 41.85906379846473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a deep Gaussian process (DGP) model with a recurrent
architecture for speech sequence modeling. DGP is a Bayesian deep model that
can be trained effectively while accounting for model complexity, and it is a
kernel regression model with high expressive power. Previous studies showed
that DGP-based speech synthesis outperformed its neural-network-based
counterpart when both models used a feed-forward architecture. To
improve the naturalness of synthetic speech, in this paper, we show that DGP
can be applied to utterance-level modeling using recurrent architecture models.
We adopt a simple recurrent unit (SRU) to realize the recurrent architecture,
which enables fast speech parameter generation by exploiting the high
parallelizability of SRU. The objective and
subjective evaluation results show that the proposed SRU-DGP-based speech
synthesis outperforms not only feed-forward DGP but also automatically tuned
SRU- and long short-term memory (LSTM)-based neural networks.
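The parallelization property mentioned in the abstract can be made concrete. Below is a minimal numpy sketch of a single SRU layer following the published SRU formulation (Lei et al.), not this paper's implementation; all names and shapes are illustrative. The point is that the three matrix multiplications, which dominate the cost, are computed for all time steps at once, and only a cheap element-wise update remains sequential:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_layer(X, W, Wf, Wr, vf, vr, bf, br):
    """One SRU layer over an input sequence X of shape (T, d).
    The highway connection assumes input and hidden size are equal."""
    # The expensive matrix products are batched over ALL time steps:
    U  = X @ W    # candidate values, shape (T, d)
    Uf = X @ Wf   # forget-gate pre-activations
    Ur = X @ Wr   # reset-gate pre-activations
    T, d = U.shape
    c = np.zeros(d)               # cell state
    H = np.zeros((T, d))
    for t in range(T):            # only element-wise ops are sequential
        f = sigmoid(Uf[t] + vf * c + bf)          # forget gate
        r = sigmoid(Ur[t] + vr * c + br)          # reset gate
        c = f * c + (1.0 - f) * U[t]              # light recurrence
        H[t] = r * np.tanh(c) + (1.0 - r) * X[t]  # highway output
    return H
```

Because the recurrence involves only element-wise products, it vectorizes well on GPUs, which is why SRU-based generation can be much faster than an LSTM of the same size.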
Related papers
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient
Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experimental results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z)
- Toward an Over-parameterized Direct-Fit Model of Visual Perception [5.4823225815317125]
In this paper, we highlight the difference in parallel and sequential binding mechanisms between simple and complex cells.
A new proposal for abstracting them into space partitioning and composition is developed.
We show how it leads to a dynamic programming (DP)-like approximate nearest-neighbor search based on $\ell_\infty$-optimization.
arXiv Detail & Related papers (2022-10-07T23:54:30Z)
- Adversarial Audio Synthesis with Complex-valued Polynomial Networks [60.231877895663956]
Time-frequency (TF) representations in audio have increasingly been modeled with real-valued networks.
We introduce complex-valued networks, called APOLLO, that integrate such complex-valued representations in a natural way.
APOLLO yields a 17.5% improvement over adversarial methods and an 8.2% improvement over state-of-the-art diffusion models on SC09 audio generation.
arXiv Detail & Related papers (2022-06-14T12:58:59Z)
- Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters.
We focus on a VGG-style plain model and show that such a simple model, trained with the proposed re-parameterized optimizer and referred to as Rep-VGG, performs on par with recent well-designed models.
arXiv Detail & Related papers (2022-05-30T16:55:59Z)
- Guided Sampling-based Evolutionary Deep Neural Network for Intelligent Fault Diagnosis [8.92307560991779]
We propose a novel evolutionary deep neural network framework that uses policy gradients to guide the evolution of the model architecture.
The effectiveness of the proposed framework has been validated on three datasets.
arXiv Detail & Related papers (2021-11-12T18:59:45Z)
- Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% fewer parameters than the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z)
- Self-Learning for Received Signal Strength Map Reconstruction with Neural Architecture Search [63.39818029362661]
We present a model based on Neural Architecture Search (NAS) and self-learning for received signal strength (RSS) map reconstruction.
The approach first finds an optimal NN architecture and simultaneously trains the deduced model on some ground-truth measurements of a given RSS map.
Experimental results show that the signal predictions of this model outperform non-learning-based state-of-the-art techniques and NN models without architecture search.
arXiv Detail & Related papers (2021-05-17T12:19:22Z)
- Compressing LSTM Networks by Matrix Product Operators [7.395226141345625]
Long Short-Term Memory (LSTM) models are the building blocks of many state-of-the-art natural language processing (NLP) and speech enhancement (SE) algorithms.
Here we introduce the MPO decomposition, which describes the local correlation of quantum states in quantum many-body physics.
We propose a matrix product operator (MPO)-based neural network architecture to replace the LSTM model.
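The MPO idea behind this and the parameter-sharing paper above can be sketched concretely. The numpy snippet below is an illustrative decomposition under assumed index splits and bond dimension, not either paper's code: a weight matrix is factorized into a chain of small local tensors via truncated SVDs, and truncating the bond dimension is what yields the compression.

```python
import numpy as np

def mpo_decompose(W, in_dims, out_dims, max_bond):
    """Factor W of shape (prod(in_dims), prod(out_dims)) into a chain of
    local 4-index tensors (an MPO) via repeated truncated SVDs."""
    n = len(in_dims)
    # View W as a 2n-index tensor and interleave input/output indices:
    # (i1, ..., in, j1, ..., jn) -> (i1, j1, i2, j2, ...)
    T = W.reshape(*in_dims, *out_dims)
    T = T.transpose([ax for k in range(n) for ax in (k, n + k)])
    cores, left = [], 1
    rest = T.reshape(left * in_dims[0] * out_dims[0], -1)
    for k in range(n - 1):
        U, S, Vt = np.linalg.svd(rest, full_matrices=False)
        r = min(max_bond, len(S))            # truncate the bond dimension
        cores.append(U[:, :r].reshape(left, in_dims[k], out_dims[k], r))
        left = r
        rest = (S[:r, None] * Vt[:r]).reshape(
            left * in_dims[k + 1] * out_dims[k + 1], -1)
    cores.append(rest.reshape(left, in_dims[-1], out_dims[-1], 1))
    return cores

def mpo_contract(cores, in_dims, out_dims):
    """Rebuild the full matrix from the MPO cores (for verification)."""
    n = len(cores)
    T = cores[0]
    for c in cores[1:]:
        T = np.tensordot(T, c, axes=([-1], [0]))
    T = T.reshape(*[d for k in range(n) for d in (in_dims[k], out_dims[k])])
    # Undo the interleaving: (i1, j1, ..., in, jn) -> (i1, ..., in, j1, ..., jn)
    T = T.transpose([2 * k for k in range(n)] + [2 * k + 1 for k in range(n)])
    return T.reshape(int(np.prod(in_dims)), int(np.prod(out_dims)))
```

For example, with in_dims = out_dims = (4, 4) and a bond dimension of 4, a 16x16 matrix (256 parameters) is replaced by two cores totalling 128 parameters; when the bond dimension is not truncated, the reconstruction is exact up to floating-point error.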
arXiv Detail & Related papers (2020-12-22T11:50:06Z)
- Learning of Discrete Graphical Models with Neural Networks [15.171938155576566]
We introduce NeurISE, a neural-network-based algorithm for graphical model learning.
NeurISE is seen to be a better alternative to GRISE when the energy function of the true model has a high order.
We also show a variant of NeurISE that can be used to learn a neural net representation for the full energy function of the true model.
arXiv Detail & Related papers (2020-06-21T23:34:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.