Causal-Anticausal Decomposition of Speech using Complex Cepstrum for
Glottal Source Estimation
- URL: http://arxiv.org/abs/1912.12843v1
- Date: Mon, 30 Dec 2019 08:12:03 GMT
- Authors: Thomas Drugman, Baris Bozkurt, Thierry Dutoit
- Abstract summary: We show that complex cepstrum causal-anticausal decomposition can be effectively used for glottal flow estimation.
The proposed method has the potential to be used for voice quality analysis.
- Score: 11.481208551940998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Complex cepstrum is known in the literature for linearly separating causal
and anticausal components. Relying on advances achieved by the Zeros of the
Z-Transform (ZZT) technique, we here investigate the possibility of using
complex cepstrum for glottal flow estimation on a large-scale database. Via a
systematic study of the windowing effects on the deconvolution quality, we show
that the complex cepstrum causal-anticausal decomposition can be effectively
used for glottal flow estimation when specific windowing criteria are met. It
is also shown that this complex cepstral decomposition gives glottal
estimates similar to those obtained with the ZZT method. However, as complex cepstrum uses
FFT operations instead of requiring the factoring of high-degree polynomials,
the method is much faster. Finally, in our tests on a large
corpus of real expressive speech, we show that the proposed method has the
potential to be used for voice quality analysis.
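
The separation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `complex_cepstrum` and `causal_anticausal_split`, the FFT size, and the convention of assigning quefrency 0 to the causal part are all assumptions made for illustration. It also assumes the frame has a positive DC gain and no zeros on the unit circle, and it omits the windowing criteria that the abstract says are required for real speech.

```python
import numpy as np


def complex_cepstrum(x, n_fft=1024):
    """Complex cepstrum of a real frame, with the linear-phase term removed."""
    spectrum = np.fft.fft(x, n_fft)
    log_magnitude = np.log(np.abs(spectrum))
    phase = np.unwrap(np.angle(spectrum))
    # Under the stated assumptions, the unwrapped phase at the Nyquist bin is
    # -pi times an integer delay; removing that ramp makes the phase periodic.
    delay = int(np.round(-phase[n_fft // 2] / np.pi))
    omega = 2.0 * np.pi * np.arange(n_fft) / n_fft
    phase = phase + delay * omega
    return np.fft.ifft(log_magnitude + 1j * phase).real


def causal_anticausal_split(x, n_fft=1024):
    """Split a frame into causal and anticausal components via its cepstrum."""
    ceps = complex_cepstrum(x, n_fft)
    half = n_fft // 2
    causal_ceps = np.zeros(n_fft)
    anticausal_ceps = np.zeros(n_fft)
    causal_ceps[:half] = ceps[:half]      # quefrencies 0 .. half-1 (gain term included)
    anticausal_ceps[half:] = ceps[half:]  # negative quefrencies in FFT layout
    # Map each half of the cepstrum back to the time domain: FFT, exp, inverse FFT.
    causal = np.fft.ifft(np.exp(np.fft.fft(causal_ceps))).real
    anticausal = np.fft.ifft(np.exp(np.fft.fft(anticausal_ceps))).real
    return causal, anticausal
```

As a toy check, the frame `np.array([-0.4, 1.24, -0.6])` is the convolution of a minimum-phase sequence `[1, -0.6]` with a delayed maximum-phase sequence `[-0.4, 1]`. Calling `causal_anticausal_split` on it should return a causal part close to `[1, -0.6, 0, ...]` and an anticausal part whose nonzero samples sit at index 0 and at the last index, which is time -1 in FFT layout. Only FFTs are involved, which is the source of the speed advantage over polynomial root finding that the abstract mentions.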
Related papers
- Hyperbolic Fine-tuning for Large Language Models [56.54715487997674]
This study investigates the non-Euclidean characteristics of large language models (LLMs).
We show that token embeddings exhibit a high degree of hyperbolicity, indicating a latent tree-like structure in the embedding space.
We introduce a new method called hyperbolic low-rank efficient fine-tuning, HypLoRA, that performs low-rank adaptation directly on the hyperbolic manifold.
arXiv Detail & Related papers (2024-10-05T02:58:25Z)
- RhythmFormer: Extracting rPPG Signals Based on Hierarchical Temporal Periodic Transformer [17.751885452773983]
We propose a fully end-to-end transformer-based method for extracting rPPG signals by explicitly leveraging the quasi-periodic nature of rPPG.
A fusion stem is proposed to guide self-attention to rPPG features effectively, and it can be easily transferred to existing methods to enhance their performance significantly.
arXiv Detail & Related papers (2024-02-20T07:56:02Z) - Exploiting Causal Graph Priors with Posterior Sampling for Reinforcement Learning [86.22660674919746]
Posterior sampling allows exploitation of prior knowledge on the environment's transition dynamics.
We propose a novel posterior sampling approach in which the prior is given as a causal graph over the environment's variables.
arXiv Detail & Related papers (2023-10-11T14:16:04Z) - Analysis and Detection of Pathological Voice using Glottal Source
Features [18.80191660913831]
Glottal source features are extracted using glottal flows estimated with the quasi-closed phase (QCP) glottal inverse filtering method.
We derive mel-frequency cepstral coefficients (MFCCs) from the glottal source waveforms computed by QCP and ZFF.
Analysis of features revealed that the glottal source contains information that discriminates normal and pathological voice.
arXiv Detail & Related papers (2023-09-25T12:14:25Z) - Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse
Problems [64.29491112653905]
We propose a novel and efficient diffusion sampling strategy that synergistically combines the diffusion sampling and Krylov subspace methods.
Specifically, we prove that if the tangent space at a sample denoised by Tweedie's formula forms a Krylov subspace, then CG applied to the denoised data ensures that the data consistency update remains in the tangent space.
Our proposed method achieves inference more than 80 times faster than the previous state-of-the-art method.
arXiv Detail & Related papers (2023-03-10T07:42:49Z) - Discretization and Re-synthesis: an alternative method to solve the
Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
After the discrete symbol sequence is predicted, each target speech signal can be re-synthesized by feeding the symbols to the synthesis model.
arXiv Detail & Related papers (2021-12-17T08:35:40Z) - Regularization by Denoising Sub-sampled Newton Method for Spectral CT
Multi-Material Decomposition [78.37855832568569]
We propose to solve a model-based maximum a posteriori problem to reconstruct multi-material images with application to spectral CT.
In particular, we propose to solve a regularized optimization problem based on a plug-in image-denoising function.
We show numerical and experimental results for spectral CT materials decomposition.
arXiv Detail & Related papers (2021-03-25T15:20:10Z) - Chirp Complex Cepstrum-based Decomposition for Asynchronous Glottal
Analysis [13.563526970105988]
This paper proposes an extension of the complex cepstrum-based decomposition by incorporating a chirp analysis.
The resulting method is shown to give a reliable estimation of the glottal flow wherever the window is located.
arXiv Detail & Related papers (2020-05-10T17:33:48Z) - Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z) - Glottal Source Processing: from Analysis to Applications [35.80742217666323]
Glottal analysis from speech recordings requires specific and more complex processing operations.
This review gives a general overview of techniques which have been designed for glottal source processing.
arXiv Detail & Related papers (2019-12-29T08:13:58Z) - Complex Cepstrum-based Decomposition of Speech for Glottal Source
Estimation [11.481208551940998]
We show that complex cepstrum can be effectively used for glottal flow estimation.
Based on exactly the same principles presented for ZZT decomposition, windowing should be applied.
arXiv Detail & Related papers (2019-12-29T07:58:18Z)