Bunched LPCNet: Vocoder for Low-cost Neural Text-To-Speech Systems
- URL: http://arxiv.org/abs/2008.04574v1
- Date: Tue, 11 Aug 2020 08:15:45 GMT
- Title: Bunched LPCNet: Vocoder for Low-cost Neural Text-To-Speech Systems
- Authors: Ravichander Vipperla, Sangjun Park, Kihyun Choo, Samin Ishtiaq,
Kyoungbo Min, Sourav Bhattacharya, Abhinav Mehrotra, Alberto Gil C. P. Ramos
and Nicholas D. Lane
- Abstract summary: LPCNet is an efficient vocoder that combines linear prediction and deep neural network modules to keep the computational complexity low.
We present two techniques to further reduce its complexity, aiming for a low-cost LPCNet vocoder-based neural Text-to-Speech (TTS) system.
- Score: 18.480490920718367
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: LPCNet is an efficient vocoder that combines linear prediction and deep
neural network modules to keep the computational complexity low. In this work,
we present two techniques to further reduce its complexity, aiming for a
low-cost LPCNet vocoder-based neural Text-to-Speech (TTS) system. These
techniques are: 1) Sample-bunching, which allows LPCNet to generate more than
one audio sample per inference; and 2) Bit-bunching, which reduces the
computations in the final layer of LPCNet. With the proposed bunching
techniques, LPCNet, in conjunction with a Deep Convolutional TTS (DCTTS)
acoustic model, shows a 2.19x improvement over the baseline run-time when
running on a mobile device, with a less than 0.1 decrease in TTS mean opinion
score (MOS).
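The two bunching techniques can be made concrete with a small sketch. The following toy is an illustrative reading of the abstract, not the paper's implementation: the bunch size S=2, the tiny state size, the 4+4 coarse/fine bit split, and all names are assumptions, and the real bit-bunching conditions the fine bits on the coarse bits rather than sampling them independently.

```python
import numpy as np

rng = np.random.default_rng(0)

S = 2                  # bunch size: samples emitted per inference (sample-bunching)
HIDDEN = 16            # toy recurrent state size (real LPCNet uses hundreds of units)
COARSE, FINE = 16, 16  # bit-bunching: 8-bit mu-law sample = 4 coarse + 4 fine bits

# Toy "network": one recurrent matrix plus two small output heads.
W_h = rng.standard_normal((HIDDEN, HIDDEN)) * 0.1
W_coarse = rng.standard_normal((HIDDEN, S * COARSE)) * 0.1
W_fine = rng.standard_normal((HIDDEN, S * FINE)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(h):
    """One inference of the sample-rate network, producing S samples.

    Bit-bunching: instead of one 256-way softmax per sample, draw
    4 coarse and 4 fine bits from two 16-way softmaxes and recombine,
    so the output layer is 2*16 wide per sample instead of 256.
    """
    h = np.tanh(W_h @ h)
    samples = []
    for s in range(S):  # sample-bunching: S outputs per forward pass
        coarse_logits = (h @ W_coarse)[s * COARSE:(s + 1) * COARSE]
        fine_logits = (h @ W_fine)[s * FINE:(s + 1) * FINE]
        c = rng.choice(COARSE, p=softmax(coarse_logits))
        f = rng.choice(FINE, p=softmax(fine_logits))
        samples.append(c * FINE + f)  # recombine into an 8-bit mu-law index
    return h, samples

h = np.zeros(HIDDEN)
audio = []
for _ in range(8):  # 8 network inferences -> 16 samples rather than 8
    h, out = step(h)
    audio.extend(out)
print(audio)
```

With S=2 the recurrent computation is amortized over two samples, and two 16-way heads replace one 256-way softmax per sample, which is the intuition behind the reported run-time gain.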
Related papers
- EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech [4.91849983180793]
We propose a lightweight Text-to-Speech (TTS) system based on deep convolutional neural networks.
Our model consists of two stages: Text2Spectrum and SSRN.
Experiments show that our model can reduce the training time and parameters while ensuring the quality and naturalness of the synthesized speech.
arXiv Detail & Related papers (2024-03-13T01:27:57Z)
- Unsupervised Deep Unfolded PGD for Transmit Power Allocation in Wireless Systems [0.6091702876917281]
We propose a simple low-complexity TPC algorithm based on deep unfolding of the iterative projected gradient descent (PGD) algorithm into the layers of a deep neural network, with the step-size parameters learned.
Performance evaluation in dense device-to-device (D2D) communication scenarios showed that the proposed method outperforms the iterative algorithm while using fewer than half as many iterations (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-06-20T19:51:21Z)
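As an illustration of the deep-unfolding idea above, here is a minimal sketch: a fixed number of projected-gradient layers, each with its own learnable step size. The quadratic objective, the box constraint P_MAX, and the layer count are placeholder assumptions; the paper unfolds PGD on the transmit power control problem and trains the step sizes without labels.

```python
import numpy as np

P_MAX = 1.0   # per-transmitter power budget (assumed box constraint)
LAYERS = 5    # network depth = number of unfolded PGD iterations

def project(p):
    """Projection onto the feasible box 0 <= p <= P_MAX."""
    return np.clip(p, 0.0, P_MAX)

def grad(p, A, b):
    """Gradient of a toy quadratic surrogate 0.5 p^T A p - b^T p,
    standing in for the paper's power-control objective."""
    return A @ p - b

def unfolded_pgd(p0, A, b, step_sizes):
    """Each 'layer' is one projected-gradient step; the only learned
    parameters are the per-layer step sizes."""
    p = p0
    for t in range(LAYERS):
        p = project(p - step_sizes[t] * grad(p, A, b))
    return p

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)       # positive definite -> convex toy objective
b = rng.standard_normal(n)

steps = np.full(LAYERS, 0.1)  # these would be trained in the unfolded network
print(unfolded_pgd(np.zeros(n), A, b, steps))
```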
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets (a toy random-feature sketch follows this entry).
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
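A toy view of the random-feature mechanism above, assuming the simplest one-hidden-layer NNGP kernel; RFAD itself targets deeper NNGP kernels and plugs the approximation into the distillation objective. All sizes here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_relu_features(X, D):
    """Monte-Carlo features for a one-hidden-layer NNGP kernel,
    K(x, x') = E_w[relu(w.x) relu(w.x')] with w ~ N(0, I/d):
    averaging D random units gives a rank-D kernel approximation."""
    d = X.shape[1]
    W = rng.standard_normal((d, D)) / np.sqrt(d)
    return np.maximum(X @ W, 0.0) / np.sqrt(D)

X = rng.standard_normal((6, 3))
Phi = random_relu_features(X, D=4096)
K_approx = Phi @ Phi.T                     # approximates the NNGP Gram matrix

Phi_ref = random_relu_features(X, D=200_000)
K_ref = Phi_ref @ Phi_ref.T                # higher-fidelity Monte-Carlo reference
print(np.abs(K_approx - K_ref).max())      # small gap: features track the kernel
```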
- Simple Pooling Front-ends For Efficient Audio Classification [56.59107110017436]
We show that eliminating the temporal redundancy in the input audio features could be an effective approach for efficient audio classification.
We propose a family of simple pooling front-ends (SimPFs) which use simple non-parametric pooling operations to reduce the redundant information.
SimPFs can cut the number of floating point operations of off-the-shelf audio neural networks by more than half (a minimal pooling sketch follows this entry).
arXiv Detail & Related papers (2022-10-03T14:00:41Z)
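A minimal sketch of the pooling idea above: a non-parametric mean-pool over time, which shrinks the frame count (and hence the backbone's floating point operations) roughly by the pooling stride. The stride and spectrogram shape here are arbitrary; the paper studies a family of such front-ends.

```python
import numpy as np

def simple_pooling_frontend(spec, stride=2):
    """Non-parametric temporal pooling: average each group of `stride`
    consecutive frames, so a stride of 2 halves the frames (and thus
    roughly the FLOPs) the downstream classifier must process."""
    mels, frames = spec.shape
    frames -= frames % stride  # drop the ragged tail
    pooled = spec[:, :frames].reshape(mels, frames // stride, stride)
    return pooled.mean(axis=2)

rng = np.random.default_rng(0)
mel = rng.random((64, 101))                          # 64 mel bins x 101 frames
print(simple_pooling_frontend(mel, stride=2).shape)  # (64, 50)
```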
- Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge [3.612475016403612]
Bunched LPCNet2 delivers efficient performance, with high quality for cloud servers and low complexity for low-resource edge devices.
Experiments demonstrate that Bunched LPCNet2 generates satisfactory speech quality with a model footprint of 1.1MB while operating faster than real-time on an RPi 3B.
arXiv Detail & Related papers (2022-03-27T23:56:52Z)
- NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which retains high speech quality while achieving high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
It also synthesizes 28% faster than WaveGAN on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z)
- Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet [35.44634252321666]
We improve the efficiency of LPCNet to make it usable on a wide variety of devices.
We demonstrate an improvement in synthesis quality while operating 2.5x faster.
The resulting open-source LPCNet algorithm can perform real-time neural synthesis on most existing phones and is even usable in some embedded devices.
arXiv Detail & Related papers (2022-02-22T20:42:00Z)
- A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning [57.28467469709369]
We investigate designing a compact audio-visual wake word spotting (WWS) system by utilizing visual information.
We introduce a neural network pruning strategy via the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF), sketched after this entry.
The proposed audio-visual system achieves significant performance improvements over the single-modality (audio-only or video-only) system under different noisy conditions.
arXiv Detail & Related papers (2022-02-17T08:26:25Z)
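A toy sketch of lottery-ticket-style iterative pruning with fine-tuning, in the spirit of the LTH-IF strategy above. The prune fraction, round count, and the `fine_tune` stand-in are assumptions; the paper applies such a loop to an audio-visual WWS network.

```python
import numpy as np

rng = np.random.default_rng(0)

def lth_iterative_pruning(weights, init_weights, rounds=3, prune_frac=0.2,
                          fine_tune=None):
    """Each round: (1) prune the smallest-magnitude surviving weights,
    (2) rewind survivors to their initial values (lottery ticket),
    (3) fine-tune. `fine_tune` stands in for real training."""
    mask = np.ones_like(weights, dtype=bool)
    for _ in range(rounds):
        cut = np.quantile(np.abs(weights[mask]), prune_frac)
        mask &= np.abs(weights) > cut                 # prune 20% of survivors
        weights = np.where(mask, init_weights, 0.0)   # rewind to init
        if fine_tune is not None:
            weights = np.where(mask, fine_tune(weights), 0.0)
    return weights, mask

init = rng.standard_normal((8, 8))
trained = init + 0.1 * rng.standard_normal((8, 8))
w, m = lth_iterative_pruning(
    trained, init,
    fine_tune=lambda w: w + 0.01 * rng.standard_normal(w.shape))
print(f"final sparsity: {1 - m.mean():.2f}")  # about 1 - 0.8**3
```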
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds (a simplified sketch follows this entry).
arXiv Detail & Related papers (2022-01-16T07:22:47Z)
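A simplified two-dimensional sketch of the multiresolution hash encoding described above; the paper's version handles higher dimensions and optimizes the table entries by gradient descent as part of training. Resolutions, table size, and feature width here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

LEVELS = 4            # resolutions, coarse to fine
TABLE_SIZE = 2 ** 14  # entries per level; hash collisions are tolerated
FEAT = 2              # trainable feature dimensions per entry
PRIMES = np.array([1, 2_654_435_761], dtype=np.uint64)  # spatial-hash primes

# In training, these tables hold the learned parameters.
tables = [1e-4 * rng.standard_normal((TABLE_SIZE, FEAT)) for _ in range(LEVELS)]

def hash_grid(corner):
    """Spatial hash of non-negative integer grid coordinates."""
    return int((corner.astype(np.uint64) * PRIMES).sum() % np.uint64(TABLE_SIZE))

def encode(x):
    """Concatenate bilinearly interpolated features from every level."""
    feats = []
    for level in range(LEVELS):
        res = 16 * 2 ** level              # grid resolution at this level
        p = x * res
        lo = np.floor(p).astype(np.int64)
        w = p - lo                         # bilinear interpolation weights
        f = np.zeros(FEAT)
        for dx in (0, 1):
            for dy in (0, 1):
                corner = lo + np.array([dx, dy])
                weight = (w[0] if dx else 1 - w[0]) * (w[1] if dy else 1 - w[1])
                f += weight * tables[level][hash_grid(corner)]
        feats.append(f)
    return np.concatenate(feats)           # input to a small MLP

print(encode(np.array([0.3, 0.7])).shape)  # (LEVELS * FEAT,) = (8,)
```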
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and the compressing bits via soft policy iterations.
With a latency- and accuracy-aware reward design, the computation adapts well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Scalable and Efficient Neural Speech Coding [24.959825692325445]
This work presents a scalable and efficient neural waveform codec (NWC) for speech compression.
The proposed CNN autoencoder also defines quantization and coding as a trainable module (one common formulation is sketched after this entry).
Compared to other autoregressive decoder-based neural speech codecs, our decoder has a significantly smaller architecture.
arXiv Detail & Related papers (2021-03-27T00:10:16Z)
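The "quantization as a trainable module" point above can be illustrated with soft-to-hard quantization, one common way to make a quantizer differentiable; whether this work uses exactly this scheme is an assumption, and the centroid count and temperatures below are arbitrary.

```python
import numpy as np

def soft_quantize(z, centroids, alpha):
    """Soft-to-hard scalar quantization: each code is a softmax-weighted
    mix of learned centroids. Small alpha keeps the mapping soft and
    differentiable for training; large alpha approaches hard rounding."""
    d = -alpha * (z[:, None] - centroids[None, :]) ** 2
    w = np.exp(d - d.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ centroids

rng = np.random.default_rng(0)
centroids = np.linspace(-1.0, 1.0, 16)  # 16 levels -> 4 bits per code
z = rng.uniform(-1, 1, size=8)          # toy encoder output

print(soft_quantize(z, centroids, alpha=2.0))    # soft, trainable regime
print(soft_quantize(z, centroids, alpha=500.0))  # near-hard quantization
```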