Related papers: Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet

URL: http://arxiv.org/abs/2202.11169v1
Date: Tue, 22 Feb 2022 20:42:00 GMT
Title: Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet
Authors: Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy
Abstract summary: We improve the efficiency of LPCNet to make it usable on a wide variety of devices. We demonstrate an improvement in synthesis quality while operating 2.5x faster. The resulting open-source LPCNet algorithm can perform real-time neural synthesis on most existing phones and is even usable in some embedded devices.
Score: 35.44634252321666
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neural speech synthesis models can synthesize high quality speech but typically require a high computational complexity to do so. In previous work, we introduced LPCNet, which uses linear prediction to significantly reduce the complexity of neural synthesis. In this work, we further improve the efficiency of LPCNet -- targeting both algorithmic and computational improvements -- to make it usable on a wide variety of devices. We demonstrate an improvement in synthesis quality while operating 2.5x faster. The resulting open-source LPCNet algorithm can perform real-time neural synthesis on most existing phones and is even usable in some embedded devices.

Related papers

Model-free front-to-end training of a large high performance laser neural network [0.0]
We demonstrate a fully autonomous and parallel optical neural network (ONN) using off-the-shelf components. Our ONN is highly efficient and is scalable both in network size and inference bandwidth towards the GHz range. We show that our ONN can achieve high accuracy and convergence efficiency, even under limited hardware resources.
arXiv Detail & Related papers (2025-03-21T08:43:02Z)
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models [74.80386066714229]
We present an improved streaming speech synthesis model, CosyVoice 2. Specifically, we introduce finite-scalar quantization to improve codebook utilization of speech tokens. We develop a chunk-aware causal flow matching model to support various synthesis scenarios.
arXiv Detail & Related papers (2024-12-13T12:59:39Z)
Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
Neuromorphic computing uses spiking neural networks (SNNs) to perform inference tasks. Embedding a small payload within each spike exchanged between spiking neurons can enhance inference accuracy without increasing energy consumption. Split computing, where an SNN is partitioned across two devices, is a promising solution. This paper presents the first comprehensive study of a neuromorphic wireless split computing architecture that employs multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z)
COOL: Efficient and Reliable Chain-Oriented Objective Logic with Neural Networks Feedback Control for Program Synthesis [0.0]
Chain of Logic (CoL) organizes synthesis stages into a chain and provides precise control to guide the synthesis process. Our approach modularizes synthesis and mitigates the impact of neural network mispredictions.
arXiv Detail & Related papers (2024-10-02T13:02:17Z)
INVICTUS: Optimizing Boolean Logic Circuit Synthesis via Synergistic Learning and Search [18.558280701880136]
State-of-the-art logic synthesis algorithms have a large number of logic minimizations. INVICTUS generates a sequence of logic minimizations based on a training dataset of previously seen designs.
arXiv Detail & Related papers (2023-05-22T15:50:42Z)
AISYN: AI-driven Reinforcement Learning-Based Logic Synthesis Framework [0.8356765961526955]
We believe that Artificial Intelligence (AI) and Reinforcement Learning (RL) algorithms can help in solving this problem. Our experiments on both open source and industrial benchmark circuits show that significant improvements on important metrics such as area, delay, and power can be achieved by making logic synthesis optimization functions AI-driven.
arXiv Detail & Related papers (2023-02-08T00:55:24Z)
Machine-Learning-Optimized Perovskite Nanoplatelet Synthesis [55.41644538483948]
We develop an algorithm to improve the quality of CsPbBr3 nanoplatelets (NPLs) using only 200 total syntheses. The algorithm can predict the resulting PL emission maxima of the NPL dispersions based on the precursor ratios.
arXiv Detail & Related papers (2022-10-18T11:54:11Z)
NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which can retain high speech quality and acquire high synthesis efficiency and noise controllability. It generates waveforms at least 280 times faster than the WaveNet vocoder. Its synthesis is also 28% faster than WaveGAN's on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations. Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
PAC-learning gains of Turing machines over circuits and neural networks [1.4502611532302039]
We study the potential gains in sample efficiency that the principle of minimum description length can bring. We use Turing machines to represent universal models and circuits. We highlight close relationships between classical open problems in Circuit Complexity and the tightness of these.
arXiv Detail & Related papers (2021-03-23T17:03:10Z)
Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems [18.480490920718367]
LPCNet is an efficient vocoder that combines linear prediction and deep neural network modules to keep the computational complexity low. We present two techniques to further reduce its complexity, aiming for a low-cost LPCNet vocoder-based neural Text-to-Speech (TTS) system.
arXiv Detail & Related papers (2020-08-11T08:15:45Z)
PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives. We develop novel data reuse analysis algorithms using the polyhedral model. We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits [99.59941892183454]
We propose Einsum Networks (EiNets), a novel implementation design for PCs. At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum-operation. We show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation.
arXiv Detail & Related papers (2020-04-13T23:09:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.