Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet
- URL: http://arxiv.org/abs/2202.11169v1
- Date: Tue, 22 Feb 2022 20:42:00 GMT
- Title: Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet
- Authors: Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy
- Abstract summary: We improve the efficiency of LPCNet to make it usable on a wide variety of devices.
We demonstrate an improvement in synthesis quality while operating 2.5x faster.
The resulting open-source LPCNet algorithm can perform real-time neural synthesis on most existing phones and is even usable in some embedded devices.
- Score: 35.44634252321666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural speech synthesis models can synthesize high quality speech but
typically require a high computational complexity to do so. In previous work,
we introduced LPCNet, which uses linear prediction to significantly reduce the
complexity of neural synthesis. In this work, we further improve the efficiency
of LPCNet -- targeting both algorithmic and computational improvements -- to
make it usable on a wide variety of devices. We demonstrate an improvement in
synthesis quality while operating 2.5x faster. The resulting open-source LPCNet
algorithm can perform real-time neural synthesis on most existing phones and is
even usable in some embedded devices.
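The abstract credits linear prediction for LPCNet's reduced complexity. As a rough illustration only, and not the authors' implementation, the sketch below shows the general idea: each sample is predicted as a weighted sum of recent samples, so only the prediction error (the excitation) remains to be modeled. The function names, the second-order coefficients, and the toy signal are all assumptions made for this example.

```python
# Minimal sketch of linear prediction (illustrative; not the LPCNet code).
# All names and values here are hypothetical and chosen for the example.

def lpc_predict(history, coeffs):
    """Predict the next sample from past samples (history is oldest first).

    coeffs[k] weights the sample that lies k + 1 steps in the past.
    """
    return sum(a * history[-(k + 1)] for k, a in enumerate(coeffs))


def excitation(signal, coeffs):
    """Return the prediction error for every sample after the first len(coeffs)."""
    order = len(coeffs)
    return [signal[t] - lpc_predict(signal[:t], coeffs)
            for t in range(order, len(signal))]


def synthesize(seed, coeffs, residual):
    """Rebuild a signal from its first samples, the coefficients, and the excitation."""
    out = list(seed)
    for e in residual:
        out.append(lpc_predict(out, coeffs) + e)
    return out


# Toy usage: an arbitrary short signal and assumed second-order coefficients.
toy_signal = [0.3, -0.1, 0.4, 0.2, -0.5, 0.1]
toy_coeffs = [1.5, -0.7]
toy_residual = excitation(toy_signal, toy_coeffs)
toy_rebuilt = synthesize(toy_signal[:2], toy_coeffs, toy_residual)
print(toy_residual)
print(toy_rebuilt)
```

When the coefficients fit the signal well, the excitation has a much smaller range than the signal itself, which is what makes it the cheaper quantity to model. In a real vocoder the coefficients would be derived from the acoustic features for each frame rather than fixed as they are here.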
Related papers
- CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models [74.80386066714229]
We present an improved streaming speech synthesis model, CosyVoice 2.
Specifically, we introduce finite-scalar quantization to improve codebook utilization of speech tokens.
We develop a chunk-aware causal flow matching model to support various synthesis scenarios.
arXiv Detail & Related papers (2024-12-13T12:59:39Z)
- Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
Neuromorphic computing uses spiking neural networks (SNNs) to perform inference tasks.
Embedding a small payload within each spike exchanged between spiking neurons can enhance inference accuracy without increasing energy consumption.
Split computing, where an SNN is partitioned across two devices, is a promising solution.
This paper presents the first comprehensive study of a neuromorphic wireless split computing architecture that employs multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z)
- COOL: Efficient and Reliable Chain-Oriented Objective Logic with Neural Networks Feedback Control for Program Synthesis [0.0]
Chain of Logic (CoL) organizes the synthesis process into an activity flow and provides control to guide the process.
Our approach modularizes synthesis and mitigates the impact of neural network mispredictions.
Experiments on relational and symbolic synthesis tasks show that CoL significantly enhances the efficiency and reliability of DSL program synthesis.
arXiv Detail & Related papers (2024-10-02T13:02:17Z)
- INVICTUS: Optimizing Boolean Logic Circuit Synthesis via Synergistic Learning and Search [18.558280701880136]
State-of-the-art logic synthesis algorithms have a large number of logic minimizations.
INVICTUS generates a sequence of logic minimizations based on a training dataset of previously seen designs.
arXiv Detail & Related papers (2023-05-22T15:50:42Z)
- AISYN: AI-driven Reinforcement Learning-Based Logic Synthesis Framework [0.8356765961526955]
We believe that Artificial Intelligence (AI) and Reinforcement Learning (RL) algorithms can help in solving this problem.
Our experiments on both open source and industrial benchmark circuits show that significant improvements on important metrics such as area, delay, and power can be achieved by making logic synthesis optimization functions AI-driven.
arXiv Detail & Related papers (2023-02-08T00:55:24Z)
- Machine-Learning-Optimized Perovskite Nanoplatelet Synthesis [55.41644538483948]
We develop an algorithm to improve the quality of CsPbBr3 nanoplatelets (NPLs) using only 200 total syntheses.
The algorithm can predict the resulting PL emission maxima of the NPL dispersions based on the precursor ratios.
arXiv Detail & Related papers (2022-10-18T11:54:11Z)
- NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which can retain high speech quality and acquire high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
Its synthesis is also 28% faster than WaveGAN's on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical computation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point, exit point, and compressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- PAC-learning gains of Turing machines over circuits and neural networks [1.4502611532302039]
We study the potential gains in sample efficiency that the principle of minimum description length can bring.
We use Turing machines to represent universal models and circuits.
We highlight close relationships between classical open problems in Circuit Complexity and the tightness of these.
arXiv Detail & Related papers (2021-03-23T17:03:10Z)
- Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems [18.480490920718367]
LPCNet is an efficient vocoder that combines linear prediction and deep neural network modules to keep the computational complexity low.
We present two techniques to further reduce its complexity, aiming for a low-cost LPCNet vocoder-based neural Text-to-Speech (TTS) system.
arXiv Detail & Related papers (2020-08-11T08:15:45Z)