Glottal Source Processing: from Analysis to Applications
- URL: http://arxiv.org/abs/1912.12604v1
- Date: Sun, 29 Dec 2019 08:13:58 GMT
- Title: Glottal Source Processing: from Analysis to Applications
- Authors: Thomas Drugman, Paavo Alku, Abeer Alwan, Bayya Yegnanarayana
- Abstract summary: Glottal analysis from speech recordings requires specific and more complex processing operations.
This review gives a general overview of techniques which have been designed for glottal source processing.
- Score: 35.80742217666323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The great majority of current voice technology applications rely on
acoustic features characterizing the vocal tract response, such as the widely
used MFCC or LPC parameters. Nonetheless, the airflow passing through the vocal
folds, called the glottal flow, is expected to provide relevant complementary
information. Unfortunately, glottal analysis from speech recordings
requires specific and more complex processing operations, which explains why it
has been generally avoided. This review gives a general overview of techniques
which have been designed for glottal source processing. Starting from
fundamental analysis tools of pitch tracking, glottal closure instant
detection, glottal flow estimation and modelling, this paper then highlights
how these solutions can be properly integrated within various voice technology
applications.
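The abstract names pitch tracking as the first of the fundamental analysis tools. As a rough illustration only, the sketch below estimates the fundamental frequency of a single voiced frame with a plain autocorrelation peak search. This is a textbook baseline chosen for brevity, not a method taken from the review; the function name `estimate_f0_autocorr`, the 50-500 Hz search range, and the synthetic 200 Hz test tone are all assumptions made for the example.

```python
import numpy as np


def estimate_f0_autocorr(frame, fs, f0_min=50.0, f0_max=500.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Hypothetical helper for illustration: picks the autocorrelation
    peak inside the lag range that corresponds to [f0_min, f0_max].
    """
    frame = frame - np.mean(frame)  # remove any DC offset
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)                      # shortest period searched
    lag_max = min(int(fs / f0_min), len(acf) - 1)   # longest period searched
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return fs / best_lag


# Synthetic test signal (assumption): a 200 Hz sine, 640 samples at 16 kHz.
fs = 16000
t = np.arange(640) / fs
frame = np.sin(2 * np.pi * 200.0 * t)
f0 = estimate_f0_autocorr(frame, fs)
print(f"Estimated F0: {f0:.1f} Hz")
```

On this clean synthetic tone the estimate should land close to 200 Hz. Real speech would additionally need a voicing decision and some protection against octave errors, which this sketch leaves out.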
Related papers
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Voice Signal Processing for Machine Learning. The Case of Speaker Isolation [0.0]
This paper provides a comparative analysis of Fourier and Wavelet transforms that are most commonly used as signal decomposition methods for audio processing tasks.
The level of detail in the exposition is meant to be sufficient for an ML engineer to make informed decisions when choosing, fine-tuning, and evaluating a decomposition method for a specific ML model.
arXiv Detail & Related papers (2024-03-29T14:31:36Z)
- Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms [19.122454483635615]
The study draws upon the Deep Noise Suppression (DNS) 2020 dataset, ensuring a structured examination tailored to various denoising settings and receiver interfaces.
A methodological novelty is introduced via Blinder-Oaxaca decomposition, traditionally an econometric tool, repurposed herein to analyze acoustic-phonetic perturbations within VoIP systems.
In addition to the primary findings, a multitude of metrics are reported, extending the research purview.
arXiv Detail & Related papers (2023-10-11T03:19:22Z)
- Analysis and Detection of Pathological Voice using Glottal Source Features [18.80191660913831]
Glottal source features are extracted using glottal flows estimated with the quasi-closed phase (QCP) glottal inverse filtering method.
We derive mel-frequency cepstral coefficients (MFCCs) from the glottal source waveforms computed by QCP and ZFF.
Analysis of features revealed that the glottal source contains information that discriminates normal and pathological voice.
arXiv Detail & Related papers (2023-09-25T12:14:25Z)
- DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion [51.83469048737548]
We propose DiffSVC, an SVC system based on a denoising diffusion probabilistic model.
A denoising module is trained in DiffSVC, which takes the destroyed mel spectrogram and its corresponding step information as input to predict the added Gaussian noise.
Experiments show that DiffSVC can achieve superior conversion performance in terms of naturalness and voice similarity to current state-of-the-art SVC approaches.
arXiv Detail & Related papers (2021-05-28T14:26:40Z)
- FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention [66.77490220410249]
We propose FragmentVC, in which the latent phonetic structure of the utterance from the source speaker is obtained from Wav2Vec 2.0.
FragmentVC is able to extract fine-grained voice fragments from the target speaker utterance(s) and fuse them into the desired utterance.
This approach is trained with reconstruction loss only without any disentanglement considerations between content and speaker information.
arXiv Detail & Related papers (2020-10-27T09:21:03Z)
- Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling [61.351967629600594]
This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq), non-parallel voice conversion approach.
In this approach, we combine a bottle-neck feature extractor (BNE) with a seq2seq synthesis module.
Objective and subjective evaluations show that the proposed any-to-many approach has superior voice conversion performance in terms of both naturalness and speaker similarity.
arXiv Detail & Related papers (2020-09-06T13:01:06Z)
- Chirp Complex Cepstrum-based Decomposition for Asynchronous Glottal Analysis [13.563526970105988]
This paper proposes an extension of the complex cepstrum-based decomposition by incorporating a chirp analysis.
The resulting method is shown to give a reliable estimation of the glottal flow wherever the window is located.
arXiv Detail & Related papers (2020-05-10T17:33:48Z)
- Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from a multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
- Causal-Anticausal Decomposition of Speech using Complex Cepstrum for Glottal Source Estimation [11.481208551940998]
We show that complex cepstrum causal-anticausal decomposition can be effectively used for glottal flow estimation.
The proposed method has the potential to be used for voice quality analysis.
arXiv Detail & Related papers (2019-12-30T08:12:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.