MelGlow: Efficient Waveform Generative Network Based on
Location-Variable Convolution
- URL: http://arxiv.org/abs/2012.01684v1
- Date: Thu, 3 Dec 2020 03:43:22 GMT
- Title: MelGlow: Efficient Waveform Generative Network Based on
Location-Variable Convolution
- Authors: Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao
- Abstract summary: An efficient network, named location-variable convolution, is proposed to model the dependencies of waveforms.
Experiments on the LJSpeech dataset show that MelGlow achieves better performance than WaveGlow at small model sizes.
- Score: 28.073277485158737
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent neural vocoders usually use a WaveNet-like network to capture the
long-term dependencies of the waveform, but a large number of parameters are
required to obtain good modeling capabilities. In this paper, an efficient
network, named location-variable convolution, is proposed to model the
dependencies of waveforms. Unlike WaveNet, which uses unified convolution
kernels to capture the dependencies of arbitrary waveforms, location-variable
convolution utilizes a kernel predictor to generate multiple sets of
convolution kernels from the mel-spectrogram; each set of kernels then
performs convolution operations on its associated waveform interval.
Combining WaveGlow and location-variable convolutions, an
efficient vocoder, named MelGlow, is designed. Experiments on the LJSpeech
dataset show that MelGlow achieves better performance than WaveGlow at small
model sizes, which verifies the effectiveness and potential optimization space
of location-variable convolutions.
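The abstract's core mechanism can be sketched as follows: a kernel predictor maps each mel-spectrogram frame to its own set of convolution kernels, and each kernel set is applied only to the waveform interval aligned with that frame. This is a minimal illustrative sketch, not the paper's implementation; the function names, shapes, and the `hop` parameter are assumptions for illustration.

```python
import numpy as np

def location_variable_convolution(waveform, mel, kernel_predictor,
                                  kernel_size=3, hop=256):
    """Sketch of location-variable convolution (illustrative, not the
    paper's exact API): each mel frame yields its own convolution kernel,
    which is applied only to the hop-length waveform interval it aligns with.
    """
    pad = kernel_size // 2
    padded = np.pad(waveform, pad)          # zero-pad so output length matches input
    out = np.zeros_like(waveform)
    for i, frame in enumerate(mel):         # one kernel set per mel frame
        kernel = kernel_predictor(frame)    # hypothetical predictor, shape (kernel_size,)
        start = i * hop
        end = min((i + 1) * hop, len(waveform))
        for t in range(start, end):         # convolve only the local interval
            out[t] = padded[t:t + kernel_size] @ kernel
    return out

# Toy usage: a constant averaging "predictor" stands in for the learned network.
wav = np.arange(8.0)
mel = np.zeros((2, 80))                     # 2 frames of an 80-bin mel-spectrogram
smoothed = location_variable_convolution(
    wav, mel, lambda frame: np.ones(3) / 3, kernel_size=3, hop=4)
```

In the real vocoder the predictor is a learned network conditioned on the mel-spectrum, so different acoustic contexts produce different local filters; the fixed averaging kernel here only demonstrates the interval-wise application pattern.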
Related papers
- WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration [68.25711405944239]
Deep image registration has demonstrated exceptional accuracy and fast inference.
Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner.
We introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales.
arXiv Detail & Related papers (2024-07-18T11:51:01Z) - Advancing Graph Convolutional Networks via General Spectral Wavelets [41.41593198072709]
We present a novel wavelet-based graph convolution network, namely WaveGC, which integrates multi-resolution spectral bases and a matrix-valued filter kernel.
Theoretically, we establish that WaveGC can effectively capture and decouple short-range and long-range information, providing superior filtering flexibility.
arXiv Detail & Related papers (2024-05-22T16:32:27Z) - Wav-KAN: Wavelet Kolmogorov-Arnold Networks [3.38220960870904]
Wav-KAN is an innovative neural network architecture that incorporates wavelet functions into the Kolmogorov-Arnold Network (KAN) framework to enhance interpretability and performance.
Our results highlight the potential of Wav-KAN as a powerful tool for developing interpretable and high-performance neural networks.
arXiv Detail & Related papers (2024-05-21T14:36:16Z) - Dynamic Frame Interpolation in Wavelet Domain [57.25341639095404]
Video frame interpolation is an important low-level computer vision task that can increase the frame rate for a more fluent visual experience.
Existing methods have achieved great success by employing advanced motion models and synthesis networks.
WaveletVFI reduces computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-07T06:41:15Z) - Waveflow: boundary-conditioned normalizing flows applied to fermionic wavefunctions [3.7135179920970534]
We introduce Waveflow, a framework for learning fermionic wavefunctions using boundary-conditioned normalizing flows.
We show that Waveflow can effectively resolve topological mismatches and faithfully learn the ground-state wavefunction.
arXiv Detail & Related papers (2022-11-27T14:32:09Z) - NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband
Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which can retain high speech quality and acquire high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
Its synthesis is also 28% faster than WaveGAN's on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z) - Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution.
We gather homogeneous points that have identical semantic categories and close votes for the geometric centroids.
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
arXiv Detail & Related papers (2021-07-18T09:05:16Z) - DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic
Convolution [136.7261709896713]
We propose a data-driven approach that generates the appropriate convolution kernels to apply in response to the nature of the instances.
The proposed method achieves promising results on both ScanNetV2 and S3DIS.
It also improves inference speed by more than 25% over the current state-of-the-art.
arXiv Detail & Related papers (2020-11-26T14:56:57Z) - Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU.
The proposed model is based on an encoder-decoder architecture with skip-connections.
It is capable of removing various kinds of background noise including stationary and non-stationary noises.
arXiv Detail & Related papers (2020-06-23T09:19:13Z) - Wavelet Networks: Scale-Translation Equivariant Learning From Raw
Time-Series [31.73386289965465]
We find that scale-translation equivariant mappings share strong resemblance with the wavelet transform.
Inspired by this resemblance, we term our networks Wavelet Networks, and show that they perform nested non-linear wavelet-like time-frequency transforms.
arXiv Detail & Related papers (2020-06-09T13:50:34Z) - WaveNODE: A Continuous Normalizing Flow for Speech Synthesis [15.051929807285847]
We propose a novel generative model called WaveNODE which exploits a continuous normalizing flow for speech synthesis.
WaveNODE places no constraint on the function used for flow operation, thus allowing the usage of more flexible and complex functions.
We experimentally show that WaveNODE achieves comparable performance with fewer parameters compared to the conventional flow-based vocoders.
arXiv Detail & Related papers (2020-06-08T13:49:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.