Related papers: Causal-Anticausal Decomposition of Speech using Complex Cepstrum for Glottal Source Estimation

URL: http://arxiv.org/abs/1912.12843v1
Date: Mon, 30 Dec 2019 08:12:03 GMT
Title: Causal-Anticausal Decomposition of Speech using Complex Cepstrum for Glottal Source Estimation
Authors: Thomas Drugman, Baris Bozkurt, Thierry Dutoit
Abstract summary: We show that complex cepstrum causal-anticausal decomposition can be effectively used for glottal flow estimation. The proposed method has the potential to be used for voice quality analysis.
Score: 11.481208551940998
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Complex cepstrum is known in the literature for linearly separating causal and anticausal components. Relying on advances achieved by the Zeros of the Z-Transform (ZZT) technique, we here investigate the possibility of using complex cepstrum for glottal flow estimation on a large-scale database. Via a systematic study of the windowing effects on the deconvolution quality, we show that the complex cepstrum causal-anticausal decomposition can be effectively used for glottal flow estimation when specific windowing criteria are met. It is also shown that this complex cepstral decomposition gives glottal estimates similar to those obtained with the ZZT method. However, as complex cepstrum uses FFT operations instead of requiring the factoring of high-degree polynomials, the method benefits from a much higher speed. Finally, in our tests on a large corpus of real expressive speech, we show that the proposed method has the potential to be used for voice quality analysis.
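
The FFT-based separation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`complex_cepstrum`, `split_causal_anticausal`, `cepstrum_to_signal`) and the FFT size are hypothetical, the windowing criteria mentioned in the abstract are not reproduced, and the input is a toy mixed-phase sequence rather than speech. It only shows the general idea: positive quefrencies of the complex cepstrum carry the causal (minimum-phase) part and negative quefrencies carry the anticausal (maximum-phase) part.

```python
import numpy as np

def complex_cepstrum(frame, nfft=1024):
    """Complex cepstrum of a frame, with the linear phase term removed."""
    spectrum = np.fft.fft(frame, nfft)
    log_mag = np.log(np.abs(spectrum))
    phase = np.unwrap(np.angle(spectrum))
    half = nfft // 2
    # Remove the linear phase (an integer sample shift) so that the
    # log-spectrum is periodic before the inverse FFT.
    shift = np.round(phase[half] / np.pi)
    phase = phase - np.pi * shift * np.arange(nfft) / half
    return np.fft.ifft(log_mag + 1j * phase).real

def split_causal_anticausal(cepstrum):
    """Split a cepstrum into positive- and negative-quefrency halves."""
    nfft = len(cepstrum)
    half = nfft // 2
    causal = np.zeros(nfft)
    anticausal = np.zeros(nfft)
    # Assumption: the gain term c[0] is assigned to the causal part.
    causal[:half] = cepstrum[:half]
    # Negative quefrencies are stored wrapped around at the end of the array.
    anticausal[half:] = cepstrum[half:]
    return causal, anticausal

def cepstrum_to_signal(cepstrum):
    """Map a cepstrum back to the time domain (inverse of log-spectrum)."""
    return np.fft.ifft(np.exp(np.fft.fft(cepstrum))).real

# Toy mixed-phase sequence: (1 + a z^-1) has its zero inside the unit
# circle (minimum phase); (b + z^-1) has its zero outside (maximum phase).
a, b = 0.5, 0.6
frame = np.convolve([1.0, a], [b, 1.0])

c = complex_cepstrum(frame)
causal_cep, anticausal_cep = split_causal_anticausal(c)
min_phase = cepstrum_to_signal(causal_cep)      # lives at times n >= 0
max_phase = cepstrum_to_signal(anticausal_cep)  # lives at times n <= 0 (wrapped)
```

In this toy case the causal component recovers the minimum-phase factor at the start of `min_phase`, while the anticausal component appears at index 0 and at the wrapped end of `max_phase`. Everything is done with FFTs, which is the speed advantage the abstract contrasts with factoring high-degree polynomials in ZZT.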

Related papers

LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning [47.77830360814755]
Location-aware Cosine Adaptation (LoCA) is a novel frequency-domain parameter-efficient fine-tuning method based on the inverse Discrete Cosine Transform (iDCT). Our analysis reveals that frequency-domain decomposition with carefully selected frequency components can surpass the expressivity of traditional low-rank-based methods. Experiments on diverse language and vision fine-tuning tasks demonstrate that LoCA offers enhanced parameter efficiency while maintaining computational feasibility comparable to low-rank-based methods.
arXiv Detail & Related papers (2025-02-05T04:14:34Z)
Hyperbolic Fine-tuning for Large Language Models [56.54715487997674]
This study investigates the non-Euclidean characteristics of large language models (LLMs). We show that token embeddings exhibit a high degree of hyperbolicity, indicating a latent tree-like structure in the embedding space. We introduce a new method called hyperbolic low-rank efficient fine-tuning, HypLoRA, that performs low-rank adaptation directly on the hyperbolic manifold.
arXiv Detail & Related papers (2024-10-05T02:58:25Z)
RhythmFormer: Extracting rPPG Signals Based on Hierarchical Temporal Periodic Transformer [17.751885452773983]
We propose a fully end-to-end transformer-based method for extracting rPPG signals by explicitly leveraging the quasi-periodic nature of rPPG. A fusion stem is proposed to guide self-attention to rPPG features effectively, and it can be easily transferred to existing methods to enhance their performance significantly.
arXiv Detail & Related papers (2024-02-20T07:56:02Z)
Exploiting Causal Graph Priors with Posterior Sampling for Reinforcement Learning [86.22660674919746]
Posterior sampling allows exploitation of prior knowledge on the environment's transition dynamics. We propose a novel posterior sampling approach in which the prior is given as a causal graph over the environment's variables.
arXiv Detail & Related papers (2023-10-11T14:16:04Z)
Analysis and Detection of Pathological Voice using Glottal Source Features [18.80191660913831]
Glottal source features are extracted using glottal flows estimated with the quasi-closed phase (QCP) glottal inverse filtering method. We derive mel-frequency cepstral coefficients (MFCCs) from the glottal source waveforms computed by QCP and zero frequency filtering (ZFF). Analysis of the features revealed that the glottal source contains information that discriminates between normal and pathological voice.
arXiv Detail & Related papers (2023-09-25T12:14:25Z)
Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse Problems [64.29491112653905]
We propose a novel and efficient diffusion sampling strategy that synergistically combines diffusion sampling and Krylov subspace methods. Specifically, we prove that if the tangent space at a sample denoised by Tweedie's formula forms a Krylov subspace, then conjugate gradient (CG) with the denoised data ensures that the data consistency update remains in the tangent space. Our proposed method achieves more than 80 times faster inference than the previous state-of-the-art method.
arXiv Detail & Related papers (2023-03-10T07:42:49Z)
Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on the cocktail party problem. Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols. Once the discrete symbol sequence has been predicted, each target speech signal can be re-synthesized by feeding those symbols to the synthesis model.
arXiv Detail & Related papers (2021-12-17T08:35:40Z)
Regularization by Denoising Sub-sampled Newton Method for Spectral CT Multi-Material Decomposition [78.37855832568569]
We propose to solve a model-based maximum a posteriori problem to reconstruct multi-material images with application to spectral CT. In particular, we propose to solve a regularized optimization problem based on a plug-in image-denoising function. We show numerical and experimental results for spectral CT material decomposition.
arXiv Detail & Related papers (2021-03-25T15:20:10Z)
Chirp Complex Cepstrum-based Decomposition for Asynchronous Glottal Analysis [13.563526970105988]
This paper proposes an extension of the complex cepstrum-based decomposition by incorporating a chirp analysis. The resulting method is shown to give a reliable estimation of the glottal flow wherever the window is located.
arXiv Detail & Related papers (2020-05-10T17:33:48Z)
Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features. At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features. At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z)
Glottal Source Processing: from Analysis to Applications [35.80742217666323]
Glottal analysis from speech recordings requires specific and more complex processing operations. This review gives a general overview of techniques which have been designed for glottal source processing.
arXiv Detail & Related papers (2019-12-29T08:13:58Z)
Complex Cepstrum-based Decomposition of Speech for Glottal Source Estimation [11.481208551940998]
We show that complex cepstrum can be effectively used for glottal flow estimation. Windowing should be applied according to exactly the same principles as those presented for ZZT decomposition.
arXiv Detail & Related papers (2019-12-29T07:58:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.