Stage-Wise and Prior-Aware Neural Speech Phase Prediction
- URL: http://arxiv.org/abs/2410.04990v1
- Date: Mon, 7 Oct 2024 12:45:20 GMT
- Title: Stage-Wise and Prior-Aware Neural Speech Phase Prediction
- Authors: Fei Liu, Yang Ai, Hui-Peng Du, Ye-Xin Lu, Rui-Chen Zheng, Zhen-Hua Ling
- Abstract summary: This paper proposes a novel Stage-wise and Prior-aware Neural Speech Phase Prediction (SP-NSPP) model.
In the initial prior-construction stage, we preliminarily predict a rough prior phase spectrum from the amplitude spectrum.
The subsequent refinement stage transforms the amplitude spectrum into a refined high-quality phase spectrum conditioned on the prior phase.
- Score: 28.422370098313788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel Stage-wise and Prior-aware Neural Speech Phase Prediction (SP-NSPP) model, which predicts the phase spectrum from input amplitude spectrum by two-stage neural networks. In the initial prior-construction stage, we preliminarily predict a rough prior phase spectrum from the amplitude spectrum. The subsequent refinement stage transforms the amplitude spectrum into a refined high-quality phase spectrum conditioned on the prior phase. Networks in both stages use ConvNeXt v2 blocks as the backbone and adopt adversarial training by innovatively introducing a phase spectrum discriminator (PSD). To further improve the continuity of the refined phase, we also incorporate a time-frequency integrated difference (TFID) loss in the refinement stage. Experimental results confirm that, compared to neural network-based no-prior phase prediction methods, the proposed SP-NSPP achieves higher phase prediction accuracy, thanks to introducing the coarse phase priors and diverse training criteria. Compared to iterative phase estimation algorithms, our proposed SP-NSPP does not require multiple rounds of staged iterations, resulting in higher generation efficiency.
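For readers unfamiliar with the problem setting, the following minimal sketch shows what "amplitude spectrum" and "phase spectrum" mean for a single frame, and why a phase error has to be measured modulo 2π. It is not the SP-NSPP model or any of its losses: the function names (`dft`, `split_amplitude_phase`, `wrapped_error`, `recombine`) and the 16-sample test frame are assumptions chosen purely for illustration, and a naive DFT stands in for a real STFT.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def split_amplitude_phase(spectrum):
    """Return (amplitudes, wrapped phases) for each frequency bin."""
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def wrapped_error(predicted, target):
    """Smallest angular distance between two phases, in [0, pi]."""
    d = predicted - target
    return abs(d - 2 * math.pi * round(d / (2 * math.pi)))


def recombine(amplitude, phase):
    """Rebuild the complex spectrum from amplitude and phase."""
    return [cmath.rect(a, p) for a, p in zip(amplitude, phase)]


# Illustrative frame: 3 cycles of a sinusoid over 16 samples (hypothetical data).
frame = [math.sin(2 * math.pi * 3 * t / 16 + 0.5) for t in range(16)]
spectrum = dft(frame)
amplitude, phase = split_amplitude_phase(spectrum)

# With the true phase, the complex spectrum is recovered exactly
# (up to floating-point error); a phase predictor aims to approximate it.
rebuilt = recombine(amplitude, phase)
max_rebuild_error = max(abs(a - b) for a, b in zip(spectrum, rebuilt))

# Two phases just either side of the +/-pi boundary are close, not ~2*pi apart.
boundary_error = wrapped_error(math.pi - 0.01, -math.pi + 0.01)
```

The wrapped distance is the point of the example: a plain difference would report a predicted phase of π − 0.01 against a target of −π + 0.01 as almost 2π wrong, although the two angles are nearly identical.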
Related papers
- Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion [61.03681839276652]
Diffusion Forcing is a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels.
We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens.
arXiv Detail & Related papers (2024-07-01T15:43:25Z)
- Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization [41.20978920228298]
We show that the second phase begins once the empirical risk falls below a certain threshold, dependent on the stepsize.
We also show that the normalized margin grows nearly monotonically in the second phase, demonstrating an implicit bias of GD in training non-homogeneous predictors.
Our analysis applies to networks of any width, beyond the well-known neural tangent kernel and mean-field regimes.
arXiv Detail & Related papers (2024-06-12T21:33:22Z)
- PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition [22.322528334591134]
We propose a novel speech data augmentation method called PhasePerturbation.
PhasePerturbation operates dynamically on the phase spectrum of speech.
arXiv Detail & Related papers (2023-12-13T23:46:26Z)
- SurgPLAN: Surgical Phase Localization Network for Phase Recognition [14.857715124466594]
We propose a Surgical Phase LocAlization Network, named SurgPLAN, to facilitate a more accurate and stable surgical phase recognition.
We first devise a Pyramid SlowFast (PSF) architecture to serve as the visual backbone to capture multi-scale spatial and temporal features by two branches with different frame sampling rates.
arXiv Detail & Related papers (2023-11-16T15:39:01Z)
- Discriminating the Phase of a Coherent Tone with a Flux-Switchable Superconducting Circuit [50.591267188664666]
We propose a new phase detection technique based on a flux-switchable superconducting circuit.
The Josephson digital phase detector (JDPD) is capable of discriminating between two phase values of a coherent input tone.
arXiv Detail & Related papers (2023-06-20T08:09:37Z)
- Exact Phase Transitions in Deep Learning [5.33024001730262]
We prove that the competition between prediction error and model complexity in the training loss leads to the second-order phase transition for nets with one hidden layer and the first-order phase transition for nets with more than one hidden layer.
The proposed theory is directly relevant to the optimization of neural networks and points to an origin of the posterior collapse problem in Bayesian deep learning.
arXiv Detail & Related papers (2022-05-25T06:00:34Z)
- SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders.
arXiv Detail & Related papers (2022-03-31T02:08:27Z)
- Proximal Policy Optimization-based Transmit Beamforming and Phase-shift Design in an IRS-aided ISAC System for the THz Band [90.45915557253385]
An IRS-aided integrated sensing and communications (ISAC) system operating in the terahertz (THz) band is proposed to maximize the system capacity.
Transmit beamforming and phase-shift design are transformed into a universal optimization problem with ergodic constraints.
arXiv Detail & Related papers (2022-03-21T09:15:18Z)
- Dual-Frequency Quantum Phase Estimation Mitigates the Spectral Leakage of Quantum Algorithms [76.15799379604898]
Quantum phase estimation suffers from spectral leakage when the reciprocal of the record length is not an integer multiple of the unknown phase.
We propose a dual-frequency estimator that approaches the Cramér-Rao bound when multiple samples are available.
arXiv Detail & Related papers (2022-01-23T17:20:34Z)
- Squeezing as a resource to counteract phase diffusion in optical phase estimation [0.0]
We analyze situations in which the noise occurs before encoding phase information.
We show that squeezing the probe after the noise greatly enhances the sensitivity of the estimation scheme.
arXiv Detail & Related papers (2020-08-07T13:08:23Z)
- Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.