WaveNODE: A Continuous Normalizing Flow for Speech Synthesis
- URL: http://arxiv.org/abs/2006.04598v4
- Date: Thu, 2 Jul 2020 23:12:56 GMT
- Title: WaveNODE: A Continuous Normalizing Flow for Speech Synthesis
- Authors: Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Byoung
Jin Choi, Nam Soo Kim
- Abstract summary: We propose a novel generative model called WaveNODE which exploits a continuous normalizing flow for speech synthesis.
WaveNODE places no constraint on the function used for flow operation, thus allowing the usage of more flexible and complex functions.
We experimentally show that WaveNODE achieves performance comparable to conventional flow-based vocoders while using fewer parameters.
- Score: 15.051929807285847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, various flow-based generative models have been proposed to
generate high-fidelity waveforms in real-time. However, these models require
either a well-trained teacher network or a large number of flow steps, making them
memory-inefficient. In this paper, we propose a novel generative model called
WaveNODE which exploits a continuous normalizing flow for speech synthesis.
Unlike conventional models, WaveNODE places no constraint on the function
used for the flow operation, allowing the use of more flexible and complex
functions. Moreover, WaveNODE can be optimized to maximize the likelihood
without requiring any teacher network or auxiliary loss terms. We
experimentally show that WaveNODE achieves performance comparable to that of
conventional flow-based vocoders while using fewer parameters.
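The mechanism the abstract describes is the continuous normalizing flow's instantaneous change of variables: data is transported by a learned ODE, and the log-density correction is the integrated trace of the Jacobian of the dynamics, so any free-form network can serve as the flow function. Below is a minimal, illustrative PyTorch sketch of that likelihood computation under assumptions not taken from the paper: it is unconditional (WaveNODE conditions on mel-spectrograms), uses a fixed-step Euler solver, and computes the exact trace; all names and sizes are hypothetical.

```python
import math
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Free-form dynamics f(z, t): a CNF needs no invertible-by-design layers."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, t, z):
        t_col = t.expand(z.shape[0], 1)              # broadcast scalar time to the batch
        return self.net(torch.cat([z, t_col], dim=1))

def cnf_log_likelihood(f, x, n_steps=32):
    """log p(x) = log N(z(1); 0, I) + integral_0^1 tr(df/dz) dt, via Euler steps.

    Exact Jacobian trace via autograd (O(dim) backward passes); practical CNFs
    typically use a Hutchinson trace estimator and an adaptive solver instead."""
    with torch.enable_grad():
        z = x.detach().clone().requires_grad_(True)
        delta_logp = torch.zeros(x.shape[0], device=x.device)
        dt = 1.0 / n_steps
        for k in range(n_steps):
            t = torch.full((1, 1), k * dt, device=x.device)
            dz = f(t, z)
            trace = torch.zeros(x.shape[0], device=x.device)
            for i in range(z.shape[1]):              # tr(df/dz), one output dim at a time
                trace = trace + torch.autograd.grad(
                    dz[:, i].sum(), z, create_graph=True)[0][:, i]
            z = z + dt * dz                          # Euler step for dz/dt = f(z, t)
            delta_logp = delta_logp + dt * trace     # instantaneous change of variables
    logp_base = (-0.5 * z**2 - 0.5 * math.log(2 * math.pi)).sum(dim=1)
    return logp_base + delta_logp                    # maximize directly: no teacher, no aux loss

# Toy usage on 2-D data standing in for waveform frames:
f = ODEFunc(dim=2)
x = torch.randn(8, 2)
loss = -cnf_log_likelihood(f, x).mean()              # negative log-likelihood
loss.backward()
```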
Related papers
- Trajectory Flow Matching with Applications to Clinical Time Series Modeling [77.58277281319253]
Trajectory Flow Matching (TFM) trains a Neural SDE in a simulation-free manner, bypassing backpropagation through the dynamics.
We demonstrate improved performance on three clinical time series datasets in terms of absolute performance and uncertainty prediction.
arXiv Detail & Related papers (2024-10-28T15:54:50Z)
- PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation [37.35829410807451]
We propose PeriodWave, a novel universal waveform generation model.
We introduce a period-aware flow matching estimator that can capture the periodic features of the waveform signal.
We also propose a single period-conditional universal estimator that performs feed-forward inference in parallel via period-wise batching.
arXiv Detail & Related papers (2024-08-14T13:36:17Z)
- Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improves the sample quality in conditional image generation and zero-shot text-to-speech synthesis.
Notably, we are the first to apply flow models for plan generation in the offline reinforcement learning setting, achieving a significant speedup in computation compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z)
- Variational waveguide QED simulators [58.720142291102135]
Waveguide QED simulators consist of quantum emitters interacting with one-dimensional photonic band-gap materials.
Here, we demonstrate how these interactions can be a resource to develop more efficient variational quantum algorithms.
arXiv Detail & Related papers (2023-02-03T18:55:08Z)
- Incremental Spatial and Spectral Learning of Neural Operators for Solving Large-Scale PDEs [86.35471039808023]
We introduce the Incremental Fourier Neural Operator (iFNO), which progressively increases the number of frequency modes used by the model (a minimal sketch of this spectral truncation appears after this list).
We show that iFNO reduces total training time while maintaining or improving generalization performance across various datasets.
Our method achieves a 10% lower testing error using 20% fewer frequency modes than the existing Fourier Neural Operator, while also training 30% faster.
arXiv Detail & Related papers (2022-11-28T09:57:15Z)
- Waveflow: boundary-conditioned normalizing flows applied to fermionic wavefunctions [3.7135179920970534]
We introduce Waveflow, a framework for learning fermionic wavefunctions using boundary-conditioned normalizing flows.
We show that Waveflow can effectively resolve topological mismatches and faithfully learn the ground-state wavefunction.
arXiv Detail & Related papers (2022-11-27T14:32:09Z)
- Solving Seismic Wave Equations on Variable Velocity Models with Fourier Neural Operator [3.2307366446033945]
We propose a new framework, the paralleled Fourier neural operator (PFNO), for efficiently training FNO-based solvers.
Numerical experiments demonstrate the high accuracy of both FNO and PFNO with complicated velocity models.
PFNO achieves higher computational efficiency on large-scale testing datasets than the traditional finite-difference method.
arXiv Detail & Related papers (2022-09-25T22:25:57Z)
- Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows [22.78165635389179]
Pitch information is not only low-dimensional, but also discontinuous, making it particularly difficult to model in a generative setting.
We find this problem to be very well suited to Neural Spline Flows, a highly expressive alternative to the more common affine coupling mechanism in normalizing flows (see the coupling-layer sketch after this list).
arXiv Detail & Related papers (2022-03-03T15:58:08Z)
- Wavelet Flow: Fast Training of High Resolution Normalizing Flows [27.661467862732792]
This paper introduces Wavelet Flow, a multi-scale normalizing flow architecture based on wavelets.
A major advantage of Wavelet Flow is the ability to construct generative models for high-resolution data that are impractical with previous models (a one-level Haar decomposition is sketched after this list).
arXiv Detail & Related papers (2020-10-26T18:13:43Z)
- Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model that operates on the raw waveform and runs in real time on a laptop CPU.
The proposed model is based on an encoder-decoder architecture with skip connections (a toy version is sketched after this list).
It is capable of removing various kinds of background noise, including stationary and non-stationary noise.
arXiv Detail & Related papers (2020-06-23T09:19:13Z)
- STEER: Simple Temporal Regularization For Neural ODEs [80.80350769936383]
We propose a new regularization technique: randomly sampling the end time of the ODE during training (sketched below).
The proposed regularization is simple to implement, has negligible overhead and is effective across a wide variety of tasks.
We show through experiments on normalizing flows, time series models and image recognition that the proposed regularization can significantly decrease training time and even improve performance over baseline models.
arXiv Detail & Related papers (2020-06-18T17:44:50Z)
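The STEER trick just described is compact enough to sketch directly. A minimal illustration with assumed names: `b` is the sampling half-width (the paper keeps it below the nominal end time `t1`), and the commented usage follows the torchdiffeq package's `odeint` interface.

```python
import torch

def steer_end_time(t1=1.0, b=0.5):
    """STEER: sample the ODE end time uniformly from (t1 - b, t1 + b) each step."""
    return t1 + b * (2.0 * torch.rand(()) - 1.0)

# Inside a training step (illustrative; `func` and `z0` are assumed to exist):
# from torchdiffeq import odeint
# t_span = torch.stack([torch.zeros(()), steer_end_time()])
# z1 = odeint(func, z0, t_span)[-1]   # randomized end time regularizes the dynamics
```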
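A few earlier entries also reduce to short sketches. For the iFNO entry, the quantity grown during training is the set of retained Fourier modes; here is a minimal 1-D illustration of that spectral truncation (the real operator learns complex-valued weights on the retained modes, which this sketch omits):

```python
import torch

def truncate_modes(x, k):
    """Zero out all but the k lowest frequency modes of a real 1-D signal."""
    X = torch.fft.rfft(x)                      # real FFT: len(x)//2 + 1 modes
    X[..., k:] = 0                             # spectral truncation
    return torch.fft.irfft(X, n=x.shape[-1])   # back to the spatial domain

x = torch.randn(4, 128)                        # batch of 1-D signals
for k in (8, 16, 32):                          # iFNO-style schedule: grow k over training
    x_lowpass = truncate_modes(x, k)
```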
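For the Neural Spline Flows entry, it helps to see the affine coupling baseline that splines replace: half the dimensions parameterize an elementwise affine map of the other half, so the Jacobian is triangular and its log-determinant is just the sum of the log-scales. A minimal, illustrative layer; spline flows keep this coupling structure but swap the affine map for a monotonic rational-quadratic spline.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """y1 = x1; y2 = x2 * exp(s(x1)) + t(x1): invertible, with a cheap log-det."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.d = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.d, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.d)))

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                       # bound the scale for stability
        y2 = x2 * torch.exp(s) + t
        log_det = s.sum(dim=1)                  # log|det J| of the transform
        return torch.cat([x1, y2], dim=1), log_det

    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        s, t = self.net(y1).chunk(2, dim=1)
        s = torch.tanh(s)
        x2 = (y2 - t) * torch.exp(-s)
        return torch.cat([y1, x2], dim=1)

x = torch.randn(3, 4)
layer = AffineCoupling(dim=4)
y, log_det = layer(x)
assert torch.allclose(layer.inverse(y), x, atol=1e-5)
```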
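For the Wavelet Flow entry, the multi-scale structure comes from a standard wavelet decomposition: a flow models the coarse signal and conditional flows model each band of detail coefficients, so sampling proceeds coarse-to-fine. Below is a one-level Haar split in 1-D (the paper works on images; this shows only the decomposition and its exact inverse, not the flows):

```python
import torch

def haar_split(x):
    """One level of an orthonormal 1-D Haar transform: a half-resolution
    coarse signal plus the detail coefficients needed to invert it."""
    even, odd = x[..., 0::2], x[..., 1::2]
    coarse = (even + odd) / 2 ** 0.5
    detail = (even - odd) / 2 ** 0.5
    return coarse, detail

def haar_merge(coarse, detail):
    """Exact inverse of haar_split."""
    even = (coarse + detail) / 2 ** 0.5
    odd = (coarse - detail) / 2 ** 0.5
    out = torch.stack([even, odd], dim=-1)      # re-interleave even/odd samples
    return out.flatten(start_dim=-2)

x = torch.randn(2, 16)
c, d = haar_split(x)
assert torch.allclose(haar_merge(c, d), x, atol=1e-6)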
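Finally, for the waveform-domain speech enhancement entry, a toy version of the encoder-decoder-with-skip-connections idea (non-causal and far smaller than the paper's model; all layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class TinyEnhancer(nn.Module):
    """Toy encoder-decoder on raw waveforms with one skip connection."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv1d(1, ch, 8, stride=4), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv1d(ch, 2 * ch, 8, stride=4), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose1d(2 * ch, ch, 8, stride=4), nn.ReLU())
        self.dec1 = nn.ConvTranspose1d(ch, 1, 8, stride=4)

    def forward(self, x):                       # x: (batch, 1, samples)
        h1 = self.enc1(x)
        h2 = self.enc2(h1)
        u2 = self.dec2(h2)
        n = min(u2.shape[-1], h1.shape[-1])     # strided convs may change lengths
        u2 = u2[..., :n] + h1[..., :n]          # skip connection from the encoder
        return self.dec1(u2)                    # output is approximately input length

noisy = torch.randn(2, 1, 16000)                # one second of audio at 16 kHz
clean_est = TinyEnhancer()(noisy)               # train with e.g. an L1 waveform loss
```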