MelGlow: Efficient Waveform Generative Network Based on
Location-Variable Convolution
- URL: http://arxiv.org/abs/2012.01684v1
- Date: Thu, 3 Dec 2020 03:43:22 GMT
- Title: MelGlow: Efficient Waveform Generative Network Based on
Location-Variable Convolution
- Authors: Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao
- Abstract summary: An efficient network, named location-variable convolution, is proposed to model the dependencies of waveforms.
Experiments on the LJSpeech dataset show that MelGlow achieves better performance than WaveGlow at small model sizes.
- Score: 28.073277485158737
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent neural vocoders usually use a WaveNet-like network to capture the
long-term dependencies of the waveform, but a large number of parameters are
required to obtain good modeling capabilities. In this paper, an efficient
network, named location-variable convolution, is proposed to model the
dependencies of waveforms. Unlike WaveNet, which uses unified convolution
kernels to capture the dependencies of arbitrary waveforms, location-variable
convolution utilizes a kernel predictor to generate multiple sets of
convolution kernels from the mel-spectrogram; each set of kernels then
performs convolution operations on its associated waveform interval.
Combining WaveGlow and location-variable convolutions, an
efficient vocoder, named MelGlow, is designed. Experiments on the LJSpeech
dataset show that MelGlow achieves better performance than WaveGlow at small
model sizes, which verifies the effectiveness and potential optimization space
of location-variable convolutions.
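The abstract's core mechanism can be sketched as follows: a kernel predictor maps each mel-spectrogram frame to its own set of convolution kernels, and each kernel set is applied only to the waveform interval aligned with that frame. This is a minimal illustrative sketch, not the paper's implementation; the function names, shapes, and the `hop` parameter are assumptions for illustration.

```python
import numpy as np

def location_variable_convolution(waveform, mel, kernel_predictor,
                                  kernel_size=3, hop=256):
    """Sketch of location-variable convolution (illustrative, not the
    paper's exact API): each mel frame yields its own convolution kernel,
    which is applied only to the hop-length waveform interval it aligns with.
    """
    pad = kernel_size // 2
    padded = np.pad(waveform, pad)          # zero-pad so output length matches input
    out = np.zeros_like(waveform)
    for i, frame in enumerate(mel):         # one kernel set per mel frame
        kernel = kernel_predictor(frame)    # hypothetical predictor, shape (kernel_size,)
        start = i * hop
        end = min((i + 1) * hop, len(waveform))
        for t in range(start, end):         # convolve only the local interval
            out[t] = padded[t:t + kernel_size] @ kernel
    return out

# Toy usage: a constant averaging "predictor" stands in for the learned network.
wav = np.arange(8.0)
mel = np.zeros((2, 80))                     # 2 frames of an 80-bin mel-spectrogram
smoothed = location_variable_convolution(
    wav, mel, lambda frame: np.ones(3) / 3, kernel_size=3, hop=4)
```

In the real vocoder the predictor is a learned network conditioned on the mel-spectrum, so different acoustic contexts produce different local filters; the fixed averaging kernel here only demonstrates the interval-wise application pattern.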
Related papers
- WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration [68.25711405944239]
Deep image registration has demonstrated exceptional accuracy and fast inference.
Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner.
We introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales.
arXiv Detail & Related papers (2024-07-18T11:51:01Z) - Advancing Graph Convolutional Networks via General Spectral Wavelets [41.41593198072709]
We present a novel wavelet-based graph convolution network, namely WaveGC, which integrates multi-resolution spectral bases and a matrix-valued filter kernel.
Theoretically, we establish that WaveGC can effectively capture and decouple short-range and long-range information, providing superior filtering flexibility.
arXiv Detail & Related papers (2024-05-22T16:32:27Z) - Wav-KAN: Wavelet Kolmogorov-Arnold Networks [3.38220960870904]
Wav-KAN is an innovative neural network architecture that incorporates wavelet functions into the Kolmogorov-Arnold Network (KAN) framework to enhance interpretability and performance.
Our results highlight the potential of Wav-KAN as a powerful tool for developing interpretable and high-performance neural networks.
arXiv Detail & Related papers (2024-05-21T14:36:16Z) - Dynamic Frame Interpolation in Wavelet Domain [57.25341639095404]
Video frame interpolation is an important low-level computer vision task that can increase the frame rate for a more fluent visual experience.
Existing methods have achieved great success by employing advanced motion models and synthesis networks.
WaveletVFI reduces computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-07T06:41:15Z) - Waveflow: boundary-conditioned normalizing flows applied to fermionic wavefunctions [3.7135179920970534]
We introduce Waveflow, a framework for learning fermionic wavefunctions using boundary-conditioned normalizing flows.
We show that Waveflow can effectively resolve topological mismatches and faithfully learn the ground-state wavefunction.
arXiv Detail & Related papers (2022-11-27T14:32:09Z) - NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband
Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which can retain high speech quality and acquire high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
Its synthesis is also 28% faster than WaveGAN's on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z) - Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution.
We gather homogeneous points that have identical semantic categories and close votes for the geometric centroids.
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
arXiv Detail & Related papers (2021-07-18T09:05:16Z) - DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic
Convolution [136.7261709896713]
We propose a data-driven approach that generates the appropriate convolution kernels to apply in response to the nature of the instances.
The proposed method achieves promising results on both ScanNetV2 and S3DIS.
It also improves inference speed by more than 25% over the current state-of-the-art.
arXiv Detail & Related papers (2020-11-26T14:56:57Z) - Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU.
The proposed model is based on an encoder-decoder architecture with skip-connections.
It is capable of removing various kinds of background noise including stationary and non-stationary noises.
arXiv Detail & Related papers (2020-06-23T09:19:13Z) - Wavelet Networks: Scale-Translation Equivariant Learning From Raw
Time-Series [31.73386289965465]
We find that scale-translation equivariant mappings share strong resemblance with the wavelet transform.
Inspired by this resemblance, we term our networks Wavelet Networks, and show that they perform nested non-linear wavelet-like time-frequency transforms.
arXiv Detail & Related papers (2020-06-09T13:50:34Z) - WaveNODE: A Continuous Normalizing Flow for Speech Synthesis [15.051929807285847]
We propose a novel generative model called WaveNODE which exploits a continuous normalizing flow for speech synthesis.
WaveNODE places no constraint on the function used for flow operation, thus allowing the usage of more flexible and complex functions.
We experimentally show that WaveNODE achieves comparable performance with fewer parameters compared to the conventional flow-based vocoders.
arXiv Detail & Related papers (2020-06-08T13:49:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.