Analysis and Detection of Pathological Voice using Glottal Source
Features
- URL: http://arxiv.org/abs/2309.14080v2
- Date: Tue, 17 Oct 2023 13:36:36 GMT
- Title: Analysis and Detection of Pathological Voice using Glottal Source
Features
- Authors: Sudarsana Reddy Kadiri and Paavo Alku
- Abstract summary: Glottal source features are extracted using glottal flows estimated with the quasi-closed phase (QCP) glottal inverse filtering method.
We derive mel-frequency cepstral coefficients (MFCCs) from the glottal source waveforms computed by QCP and ZFF.
Analysis of features revealed that the glottal source contains information that discriminates normal and pathological voice.
- Score: 18.80191660913831
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic detection of voice pathology enables objective assessment and
earlier intervention for the diagnosis. This study provides a systematic
analysis of glottal source features and investigates their effectiveness in
voice pathology detection. Glottal source features are extracted using glottal
flows estimated with the quasi-closed phase (QCP) glottal inverse filtering
method, using approximate glottal source signals computed with the zero
frequency filtering (ZFF) method, and using acoustic voice signals directly. In
addition, we propose to derive mel-frequency cepstral coefficients (MFCCs) from
the glottal source waveforms computed by QCP and ZFF to effectively capture the
variations in glottal source spectra of pathological voice. Experiments were
carried out using two databases, the Hospital Universitario Principe de
Asturias (HUPA) database and the Saarbrucken Voice Disorders (SVD) database.
Analysis of features revealed that the glottal source contains information that
discriminates normal and pathological voice. Pathology detection experiments
were carried out using a support vector machine (SVM). From the detection
experiments it was observed that the performance achieved with the studied
glottal source features is comparable to or better than that of conventional MFCCs
and perceptual linear prediction (PLP) features. The best detection performance
was achieved when the glottal source features were combined with the
conventional MFCCs and PLP features, which indicates the complementary nature
of the features.
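As an illustration of the MFCC step mentioned in the abstract, the following is a minimal sketch of computing MFCCs for a single frame of a waveform. It is not the authors' implementation: the function names, the Hamming window, the 20 mel filters, the 13 coefficients, and the 8 kHz sampling rate are all assumptions chosen for the example, and the input is a synthetic sawtooth-like pulse train standing in for a glottal source waveform (no QCP or ZFF processing is performed here).

```python
import math

def hz_to_mel(f):
    # Common mel-scale formula (one of several conventions).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(frame, win)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = -sum(x[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to fs/2."""
    mel_max = hz_to_mel(fs / 2.0)
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / fs)) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising slope
            filt[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling slope
            filt[k] = (hi - k) / (hi - mid)
        bank.append(filt)
    return bank

def dct_ii(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n)
                for i, v in enumerate(values))
            for c in range(n_out)]

def mfcc_frame(frame, fs, n_filters=20, n_ceps=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), fs)
    # Floor the energies to avoid log(0) for empty or silent bands.
    log_e = [math.log(max(sum(f * p for f, p in zip(filt, spec)), 1e-12))
             for filt in bank]
    return dct_ii(log_e, n_ceps)

# Synthetic stand-in for a glottal source frame (assumption, not real data):
# a sawtooth-like pulse train with an 80-sample period (100 Hz at 8 kHz).
fs = 8000
period = 80
frame = [1.0 - (i % period) / period for i in range(256)]
coeffs = mfcc_frame(frame, fs)
print(len(coeffs))
```

In a pipeline of the kind the abstract describes, a function like `mfcc_frame` would be applied frame by frame to the estimated glottal source waveform rather than to the acoustic speech signal, and the resulting vectors would be passed to a classifier such as an SVM.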
Related papers
- Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Cord Paralysis [7.583632364503357]
The Multimodal Analyzing System for Laryngoscope (MASL) combines audio and video data to automatically extract key segments and metrics from laryngeal videostroboscopic videos for clinical assessment.
MASL integrates glottis detection with keyword spotting to analyze patient vocalizations and refine video highlights for better inspection of vocal cord movements.
arXiv Detail & Related papers (2024-09-05T14:56:38Z)
- Enhancing dysarthria speech feature representation with empirical mode decomposition and Walsh-Hadamard transform [8.032273183441921]
We propose a feature enhancement for dysarthria speech called WHFEMD.
It combines empirical mode decomposition (EMD) and fast Walsh-Hadamard transform (FWHT) to enhance features.
arXiv Detail & Related papers (2023-12-30T13:25:26Z)
- Score-based Source Separation with Applications to Digital Communication Signals [72.6570125649502]
We propose a new method for separating superimposed sources using diffusion-based generative models.
Motivated by applications in radio-frequency (RF) systems, we are interested in sources with underlying discrete nature.
Our method can be viewed as a multi-source extension to the recently proposed score distillation sampling scheme.
arXiv Detail & Related papers (2023-06-26T04:12:40Z)
- Deep Spectro-temporal Artifacts for Detecting Synthesized Speech [57.42110898920759]
This paper provides an overall assessment of track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio Detection)
In this paper, spectro-temporal artifacts were detected using raw temporal signals, spectral features, as well as deep embedding features.
We ranked 4th and 5th in track 1 and track 2, respectively.
arXiv Detail & Related papers (2022-10-11T08:31:30Z)
- Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features [51.924340387119415]
Experimental results on the ASVspoof 2019 LA dataset show that our proposed system is very effective for the audio deepfake detection task, achieving an equal error rate (EER) of 0.43%, which surpasses almost all systems.
arXiv Detail & Related papers (2022-08-02T02:46:16Z)
- Iterative Sound Source Localization for Unknown Number of Sources [57.006589498243336]
We propose an iterative sound source localization approach called ISSL, which can iteratively extract each source's DOA without a threshold until the termination criterion is met.
Our ISSL achieves significant performance improvements in both DOA estimation and source number detection compared with the existing threshold-based algorithms.
arXiv Detail & Related papers (2022-06-24T13:19:44Z)
- SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as that of conventional DDPM-based neural vocoders.
arXiv Detail & Related papers (2022-03-31T02:08:27Z)
- Glottal source estimation robustness: A comparison of sensitivity of voice source estimation techniques [11.97036509133719]
This paper addresses the problem of estimating the voice source directly from speech waveforms.
A novel principle based on Anticausality Dominated Regions (ACDR) is used to estimate the glottal open phase.
arXiv Detail & Related papers (2020-05-24T08:13:47Z)
- Bulbar ALS Detection Based on Analysis of Voice Perturbation and Vibrato [68.97335984455059]
The purpose of this work was to verify the suitability of the sustained vowel phonation test for automatic detection of patients with ALS.
We propose an enhanced procedure for separating the voice signal into fundamental periods, which is required for calculating the measurements.
arXiv Detail & Related papers (2020-03-24T12:49:25Z)
- On the Mutual Information between Source and Filter Contributions for Voice Pathology Detection [11.481208551940998]
This paper addresses the problem of automatic detection of voice pathologies directly from the speech signal.
Three sets of features are proposed, depending on whether they are related to the speech or the glottal signal, or to prosody.
arXiv Detail & Related papers (2020-01-02T10:04:37Z)
- Glottal Source Processing: from Analysis to Applications [35.80742217666323]
Glottal analysis from speech recordings requires specific and more complex processing operations.
This review gives a general overview of techniques which have been designed for glottal source processing.
arXiv Detail & Related papers (2019-12-29T08:13:58Z)