EPIC TTS Models: Empirical Pruning Investigations Characterizing
Text-To-Speech Models
- URL: http://arxiv.org/abs/2209.10890v1
- Date: Thu, 22 Sep 2022 09:47:25 GMT
- Title: EPIC TTS Models: Empirical Pruning Investigations Characterizing
Text-To-Speech Models
- Authors: Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman
- Abstract summary: This work compares sparsity paradigms in text-to-speech synthesis; to the authors' knowledge, it is the first work to do so.
- Score: 26.462819114575172
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Neural models are known to be over-parameterized, and recent work has shown
that sparse text-to-speech (TTS) models can outperform dense models. Although a
plethora of sparse methods has been proposed for other domains, such methods
have rarely been applied in TTS. In this work, we seek to answer the question:
how do selected sparse techniques affect performance and model complexity? We
compare a Tacotron2 baseline against the results of applying five such
techniques. We then evaluate performance in terms of
naturalness, intelligibility and prosody, while reporting model size and
training time. Complementary to prior research, we find that pruning before or
during training can achieve similar performance to pruning after training and
can be trained much faster, while removing entire neurons degrades performance
much more than removing parameters. To our best knowledge, this is the first
work that compares sparsity paradigms in text-to-speech synthesis.
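To make the contrast between removing individual parameters and removing entire neurons concrete, here is a minimal sketch using PyTorch's torch.nn.utils.prune utilities; the layer sizes and the 50% sparsity level are illustrative choices, not the paper's actual Tacotron2 configuration.
```python
# Sketch only: unstructured pruning zeroes individual weights, while structured
# pruning zeroes whole neurons (rows of the weight matrix), which the abstract
# reports degrades quality much more.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer_a = nn.Linear(256, 256)   # stand-in for a TTS decoder layer
layer_b = nn.Linear(256, 256)

# Unstructured: zero out the 50% of weights with the smallest |w|.
prune.l1_unstructured(layer_a, name="weight", amount=0.5)

# Structured: zero out 50% of output neurons (rows) ranked by L2 norm.
prune.ln_structured(layer_b, name="weight", amount=0.5, n=2, dim=0)

# Make the masks permanent; the resulting weights stay sparse.
prune.remove(layer_a, "weight")
prune.remove(layer_b, "weight")

for name, layer in [("unstructured", layer_a), ("structured", layer_b)]:
    sparsity = (layer.weight == 0).float().mean().item()
    print(f"{name}: {sparsity:.0%} of weights are zero")
```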
Related papers
- SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models [64.40250409933752]
We build upon our previous publication by implementing a simple and efficient non-autoregressive (NAR) TTS framework, termed SimpleSpeech 2.
SimpleSpeech 2 effectively combines the strengths of both autoregressive (AR) and non-autoregressive (NAR) methods.
We show a significant improvement in generation performance and generation speed compared to our previous work and other state-of-the-art (SOTA) large-scale TTS models.
arXiv Detail & Related papers (2024-08-25T17:07:39Z)
- Pruning Large Language Models with Semi-Structural Adaptive Sparse Training [17.381160429641316]
We propose a pruning pipeline for semi-structured sparse models via retraining, termed Adaptive Sparse Trainer (AST)
AST transforms dense models into sparse ones by applying decay to masked weights while allowing the model to adaptively select masks throughout the training process.
Our work demonstrates the feasibility of deploying semi-structured sparse large language models and introduces a novel method for achieving highly compressed models.
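As a rough illustration of the masking-with-decay idea summarized above (the details below are assumptions, not the AST authors' implementation), one can re-select a 2:4 mask at every step and apply extra decay only to the masked weights:
```python
# Illustrative sketch: semi-structured 2:4 sparsity with decay on masked weights.
import torch

def two_four_mask(weight: torch.Tensor) -> torch.Tensor:
    """Keep the two largest-magnitude weights in every group of four."""
    groups = weight.abs().reshape(-1, 4)                  # assumes numel divisible by 4
    top2 = groups.topk(2, dim=1).indices
    mask = torch.zeros_like(groups)
    mask.scatter_(1, top2, 1.0)
    return mask.reshape_as(weight)

def sparse_training_step(weight, grad, lr=1e-3, masked_decay=1e-4):
    mask = two_four_mask(weight)                          # mask is re-chosen every step
    weight = weight - lr * grad                           # ordinary dense gradient update
    weight = weight - masked_decay * weight * (1 - mask)  # extra decay on masked weights
    return weight, mask                                   # forward passes would use weight * mask

w, g = torch.randn(8, 16), torch.randn(8, 16)
w, m = sparse_training_step(w, g)
print(m.reshape(-1, 4).sum(dim=1).unique())               # tensor([2.]) -> 2:4 pattern holds
```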
arXiv Detail & Related papers (2024-07-30T06:33:44Z)
- Reducing Computational Costs in Sentiment Analysis: Tensorized Recurrent Networks vs. Recurrent Networks [0.12891210250935145]
Anticipating audience reaction to a given text is integral to many facets of society, from politics and research to commercial industry.
Sentiment analysis (SA) is a useful natural language processing (NLP) technique that utilizes lexical/statistical and deep learning methods to determine whether different-sized texts exhibit positive, negative, or neutral emotions.
arXiv Detail & Related papers (2023-06-16T09:18:08Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
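A minimal sketch of the noise-stability idea, assuming a simple feed-forward stack; the layer choice, noise scale, and squared-error penalty are illustrative stand-ins rather than the LNSR paper's exact recipe:
```python
import torch
import torch.nn as nn

def noise_stability_loss(model: nn.Sequential, x: torch.Tensor,
                         noisy_layer: int = 1, sigma: float = 0.1) -> torch.Tensor:
    """Penalize how much the output drifts when one hidden representation is perturbed."""
    clean_out = model(x)
    h = x
    for i, layer in enumerate(model):
        if i == noisy_layer:
            h = h + sigma * torch.randn_like(h)   # inject standard Gaussian noise
        h = layer(h)
    return (clean_out - h).pow(2).mean()          # regularize representation stability

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
x = torch.randn(4, 16)
aux = noise_stability_loss(model, x)              # added to the task loss with some weight
```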
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
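For context, a generic top-1-routed mixture-of-experts feed-forward layer looks roughly like the sketch below; it omits MoEBERT's importance-guided expert initialization and distillation, which are that paper's actual contributions:
```python
# Generic MoE feed-forward sketch: each token is routed to one expert.
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=64, d_hidden=128, num_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)
        choice = scores.argmax(dim=-1)             # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            sel = choice == i
            if sel.any():                          # only the chosen expert runs per token
                out[sel] = expert(x[sel]) * scores[sel, i:i + 1]
        return out

tokens = torch.randn(10, 64)
print(MoEFeedForward()(tokens).shape)              # torch.Size([10, 64])
```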
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
- An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks [112.1942546460814]
We report the first exploration of the prompt tuning paradigm for speech processing tasks based on Generative Spoken Language Model (GSLM)
Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
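The prompt-tuning recipe itself is simple to sketch: freeze the backbone and train only a short sequence of prompt embeddings plus a small task head. The example below is a generic, hypothetical setup with a toy Transformer encoder, not GSLM:
```python
import torch
import torch.nn as nn

class PromptTunedClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, d_model: int,
                 prompt_len: int = 10, num_classes: int = 5):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                      # backbone stays frozen
        self.prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
        self.head = nn.Linear(d_model, num_classes)      # small trainable task head

    def forward(self, embeddings):                       # embeddings: (batch, seq, d_model)
        batch = embeddings.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        h = self.backbone(torch.cat([prompt, embeddings], dim=1))
        return self.head(h.mean(dim=1))                  # pool and classify

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True), num_layers=2)
model = PromptTunedClassifier(backbone, d_model=32)
logits = model(torch.randn(2, 7, 32))
print(sum(p.numel() for p in model.parameters() if p.requires_grad))  # only prompt + head train
```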
arXiv Detail & Related papers (2022-03-31T03:26:55Z)
- Differentiable Duration Modeling for End-to-End Text-to-Speech [6.571447892202893]
Parallel text-to-speech (TTS) models have recently enabled fast and highly natural speech synthesis.
We propose a differentiable duration method for learning monotonic alignments between input and output sequences.
Our model learns to perform high-fidelity synthesis through a combination of adversarial training and matching the total ground-truth duration.
arXiv Detail & Related papers (2022-03-21T15:14:44Z)
- Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation [28.432799973328127]
We propose Homomorphic Projective Distillation (HPD) to learn compressed sentence embeddings.
Our method augments a small Transformer encoder model with learnable projection layers to produce compact representations.
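A hedged sketch of that recipe, with dimensions, pooling, and loss chosen for illustration rather than taken from the HPD paper: a small student encoder plus a learnable projection is trained to match projected teacher sentence embeddings in a compact space:
```python
import torch
import torch.nn as nn

d_teacher, d_compact = 768, 128

# Small student encoder (stand-in for a compact Transformer) and learnable projections.
student = nn.Sequential(nn.Linear(300, 256), nn.GELU(), nn.Linear(256, 256))
student_proj = nn.Linear(256, d_compact)
teacher_proj = nn.Linear(d_teacher, d_compact)   # brings teacher targets to the compact size

def distillation_loss(token_features, teacher_embedding):
    pooled = student(token_features).mean(dim=1)           # pooled student representation
    return nn.functional.mse_loss(student_proj(pooled),    # compact student embedding
                                  teacher_proj(teacher_embedding))

loss = distillation_loss(torch.randn(8, 20, 300), torch.randn(8, d_teacher))
```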
arXiv Detail & Related papers (2022-03-15T07:05:43Z)
- On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis [102.80458458550999]
We investigate the tradeoffs between sparsity and its subsequent effects on synthetic speech.
Our findings suggest that not only are end-to-end TTS models highly prunable, but also, perhaps surprisingly, pruned TTS models can produce synthetic speech with equal or higher naturalness and intelligibility.
arXiv Detail & Related papers (2021-10-04T02:03:28Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)