Causal-Anticausal Decomposition of Speech using Complex Cepstrum for
Glottal Source Estimation
- URL: http://arxiv.org/abs/1912.12843v1
- Date: Mon, 30 Dec 2019 08:12:03 GMT
- Title: Causal-Anticausal Decomposition of Speech using Complex Cepstrum for
Glottal Source Estimation
- Authors: Thomas Drugman, Baris Bozkurt, Thierry Dutoit
- Abstract summary: We show that complex cepstrum causal-anticausal decomposition can be effectively used for glottal flow estimation.
The proposed method has the potential to be used for voice quality analysis.
- Score: 11.481208551940998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Complex cepstrum is known in the literature for linearly separating causal
and anticausal components. Relying on advances achieved by the Zeros of the
Z-Transform (ZZT) technique, we here investigate the possibility of using
complex cepstrum for glottal flow estimation on a large-scale database. Via a
systematic study of the windowing effects on the deconvolution quality, we show
that the complex cepstrum causal-anticausal decomposition can be effectively
used for glottal flow estimation when specific windowing criteria are met. It
is also shown that this complex cepstral decomposition gives glottal estimates
similar to those obtained with the ZZT method. However, because the complex
cepstrum relies on FFT operations rather than on factoring high-degree
polynomials, the method is much faster. Finally, in our tests on a large
corpus of real expressive speech, we show that the proposed method has the
potential to be used for voice quality analysis.
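The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the FFT size `nfft`, the choice to keep the zero-quefrency (gain) term in the causal part, and the toy three-sample signal are all assumptions made for illustration, and the windowing criteria the paper studies are not modelled here. The sketch computes the complex cepstrum with FFTs, keeps the positive quefrencies for the causal component and the negative quefrencies for the anticausal component, and maps each back to the time domain.

```python
import numpy as np


def complex_cepstrum(x, nfft):
    """Complex cepstrum of a real frame via FFT (nfft assumed even)."""
    spectrum = np.fft.fft(x, nfft)
    # Unwrap the phase and remove its linear component (a pure delay),
    # so that the log spectrum is continuous around the unit circle.
    phase = np.unwrap(np.angle(spectrum))
    half = nfft // 2
    ndelay = np.round(phase[half] / np.pi)
    phase = phase - np.pi * ndelay * np.arange(nfft) / half
    # Assumes no spectral zero lies exactly on the unit circle.
    log_spectrum = np.log(np.abs(spectrum)) + 1j * phase
    return np.fft.ifft(log_spectrum).real


def causal_anticausal_split(x, nfft=1024):
    """Split a frame into causal and anticausal time-domain components."""
    ceps = complex_cepstrum(x, nfft)
    half = nfft // 2
    causal = np.zeros(nfft)
    anticausal = np.zeros(nfft)
    causal[:half] = ceps[:half]      # quefrencies n >= 0 (gain term included)
    anticausal[half:] = ceps[half:]  # quefrencies n < 0 (stored circularly)
    x_causal = np.fft.ifft(np.exp(np.fft.fft(causal))).real
    x_anticausal = np.fft.ifft(np.exp(np.fft.fft(anticausal))).real
    return x_causal, x_anticausal


# Toy signal: a minimum-phase factor (zero inside the unit circle)
# convolved with a maximum-phase factor (zero outside the unit circle).
x = np.convolve([1.0, 0.5], [-0.5, 1.0])
x_causal, x_anticausal = causal_anticausal_split(x)
print(np.round(x_causal[:3], 6))        # causal part starts at index 0
print(np.round(x_anticausal[-3:], 6))   # anticausal part wraps to the end
```

In this toy case the causal output should recover the minimum-phase factor and the anticausal output the maximum-phase factor, with its sample at time -1 stored at the last index of the circular buffer. Only FFTs are needed; no polynomial roots are computed, which is the speed advantage over ZZT that the abstract points to.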
Related papers
- LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning [47.77830360814755]
Location-aware Cosine Adaptation (LoCA) is a novel frequency-domain parameter-efficient fine-tuning method based on the inverse Discrete Cosine Transform (iDCT).
Our analysis reveals that frequency-domain approximation with carefully selected frequency components can surpass the expressivity of traditional low-rank-based methods.
Experiments on diverse language and vision fine-tuning tasks demonstrate that LoCA offers enhanced parameter efficiency while maintaining computational feasibility comparable to low-rank-based methods.
arXiv Detail & Related papers (2025-02-05T04:14:34Z)
- RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention [18.412642801957197]
rPPG is a non-contact method for detecting physiological signals from videos.
This paper proposes a periodic attention mechanism based on temporal attention sparsity induced by periodicity.
It achieves state-of-the-art performance in both intra-dataset and cross-dataset evaluations.
arXiv Detail & Related papers (2024-02-20T07:56:02Z)
- Exploiting Causal Graph Priors with Posterior Sampling for Reinforcement Learning [86.22660674919746]
Posterior sampling allows exploitation of prior knowledge on the environment's transition dynamics.
We propose a novel posterior sampling approach in which the prior is given as a causal graph over the environment's variables.
arXiv Detail & Related papers (2023-10-11T14:16:04Z)
- Analysis and Detection of Pathological Voice using Glottal Source Features [18.80191660913831]
Glottal source features are extracted using glottal flows estimated with the quasi-closed phase (QCP) glottal inverse filtering method.
We derive mel-frequency cepstral coefficients (MFCCs) from the glottal source waveforms computed by QCP and ZFF.
Analysis of features revealed that the glottal source contains information that discriminates normal and pathological voice.
arXiv Detail & Related papers (2023-09-25T12:14:25Z)
- Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse Problems [64.29491112653905]
We propose a novel and efficient diffusion sampling strategy that synergistically combines the diffusion sampling and Krylov subspace methods.
Specifically, we prove that if the tangent space at a sample denoised by Tweedie's formula forms a Krylov subspace, then CG with the denoised data ensures that the data consistency update remains in the tangent space.
Our proposed method achieves inference more than 80 times faster than the previous state-of-the-art method.
arXiv Detail & Related papers (2023-03-10T07:42:49Z)
- Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
By utilizing the synthesis model with the input of discrete symbols, after the prediction of discrete symbol sequence, each target speech could be re-synthesized.
arXiv Detail & Related papers (2021-12-17T08:35:40Z)
- Regularization by Denoising Sub-sampled Newton Method for Spectral CT Multi-Material Decomposition [78.37855832568569]
We propose to solve a model-based maximum-a-posterior problem to reconstruct multi-materials images with application to spectral CT.
In particular, we propose to solve a regularized optimization problem based on a plug-in image-denoising function.
We show numerical and experimental results for spectral CT materials decomposition.
arXiv Detail & Related papers (2021-03-25T15:20:10Z)
- Chirp Complex Cepstrum-based Decomposition for Asynchronous Glottal Analysis [13.563526970105988]
This paper proposes an extension of the complex cepstrum-based decomposition by incorporating a chirp analysis.
The resulting method is shown to give a reliable estimation of the glottal flow wherever the window is located.
arXiv Detail & Related papers (2020-05-10T17:33:48Z)
- Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z)
- Glottal Source Processing: from Analysis to Applications [35.80742217666323]
Glottal analysis from speech recordings requires specific and more complex processing operations.
This review gives a general overview of techniques which have been designed for glottal source processing.
arXiv Detail & Related papers (2019-12-29T08:13:58Z)
- Complex Cepstrum-based Decomposition of Speech for Glottal Source Estimation [11.481208551940998]
We show that complex cepstrum can be effectively used for glottal flow estimation.
Based on exactly the same principles presented for ZZT decomposition, windowing should be applied.
arXiv Detail & Related papers (2019-12-29T07:58:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.