Detection of Glottal Closure Instants from Speech Signals: a
Quantitative Review
- URL: http://arxiv.org/abs/2001.00473v1
- Date: Sat, 28 Dec 2019 14:12:16 GMT
- Title: Detection of Glottal Closure Instants from Speech Signals: a
Quantitative Review
- Authors: Thomas Drugman, Mark Thomas, Jon Gudnason, Patrick Naylor, Thierry
Dutoit
- Abstract summary: Five state-of-the-art GCI detection algorithms are compared using six different databases.
The efficacy of these methods is first evaluated on clean speech, both in terms of reliability and accuracy.
It is shown that for clean speech, SEDREAMS and YAGA are the best performing techniques, both in terms of identification rate and accuracy.
- Score: 9.351195374919365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The pseudo-periodicity of voiced speech can be exploited in several speech
processing applications. This requires, however, that the precise locations of
the Glottal Closure Instants (GCIs) are available. The focus of this paper is
the evaluation of automatic methods for the detection of GCIs directly from the
speech waveform. Five state-of-the-art GCI detection algorithms are compared
using six different databases with contemporaneous electroglottographic
recordings as ground truth, and containing many hours of speech by multiple
speakers. The five techniques compared are the Hilbert Envelope-based detection
(HE), the Zero Frequency Resonator-based method (ZFR), the Dynamic Programming
Phase Slope Algorithm (DYPSA), the Speech Event Detection using the Residual
Excitation And a Mean-based Signal (SEDREAMS) and the Yet Another GCI Algorithm
(YAGA). The efficacy of these methods is first evaluated on clean speech, both
in terms of reliability and accuracy. Their robustness to additive noise and
to reverberation is also assessed. A further contribution of the paper is the
evaluation of their performance on a concrete application of speech processing:
the causal-anticausal decomposition of speech. It is shown that for clean
speech, SEDREAMS and YAGA are the best performing techniques, both in terms of
identification rate and accuracy. ZFR and SEDREAMS also show a superior
robustness to additive noise and reverberation.
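The abstract reports results as identification rate and accuracy but does not define them here. The sketch below shows one common way such scores can be computed against EGG-derived reference GCIs: each reference GCI owns a "larynx cycle" bounded by the midpoints to its neighbours, and a cycle counts as identified if it contains exactly one detection, missed if it contains none, and a false alarm if it contains several. The function name `gci_metrics`, the dictionary keys, and the 0.25 ms tolerance are assumptions made for illustration, not details taken from this page.

```python
from bisect import bisect_left


def gci_metrics(reference, detected, tolerance=0.25e-3):
    """Score detected GCIs against reference GCIs.

    Both inputs are sorted lists of times in seconds. The tolerance
    (an assumed value) is the timing error allowed for a hit to count
    as accurate.
    """
    identified = missed = false_alarm = 0
    errors = []
    # The first and last reference GCIs lack a neighbour on one side,
    # so only interior reference GCIs define a full cycle.
    for i in range(1, len(reference) - 1):
        lo = (reference[i - 1] + reference[i]) / 2
        hi = (reference[i] + reference[i + 1]) / 2
        start = bisect_left(detected, lo)
        end = bisect_left(detected, hi)
        count = end - start  # detections falling in [lo, hi)
        if count == 1:
            identified += 1
            errors.append(detected[start] - reference[i])
        elif count == 0:
            missed += 1
        else:
            false_alarm += 1

    cycles = identified + missed + false_alarm
    if cycles == 0:
        return {"identification_rate": 0.0, "miss_rate": 0.0,
                "false_alarm_rate": 0.0, "accuracy": 0.0}
    accurate = sum(1 for e in errors if abs(e) <= tolerance)
    return {
        "identification_rate": identified / cycles,
        "miss_rate": missed / cycles,
        "false_alarm_rate": false_alarm / cycles,
        # Share of identified cycles whose timing error is within tolerance.
        "accuracy": accurate / identified if identified else 0.0,
    }


if __name__ == "__main__":
    ref = [0.010, 0.020, 0.030, 0.040, 0.050]
    det = [0.0201, 0.029, 0.031, 0.052]
    print(gci_metrics(ref, det))
```

In the toy call, the three interior cycles receive one, two, and zero detections respectively, so each of the three rates is one third. Under this definition the three rates always sum to one, and accuracy is measured only over identified cycles, which keeps reliability and timing precision separate.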
Related papers
- MaskCycleGAN-based Whisper to Normal Speech Conversion [0.0]
We present a MaskCycleGAN approach for the conversion of whispered speech to normal speech.
We find that tuning the mask parameters and pre-processing the signal with a voice activity detector provide superior performance.
arXiv Detail & Related papers (2024-08-27T06:07:18Z)
- Speech Enhancement and Dereverberation with Diffusion-based Generative Models [14.734454356396157]
We present a detailed overview of the diffusion process that is based on a differential equation.
We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates.
In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models.
arXiv Detail & Related papers (2022-08-11T13:55:12Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Speech Enhancement for Wake-Up-Word detection in Voice Assistants [60.103753056973815]
Keyword spotting, and in particular Wake-Up-Word (WUW) detection, is a very important task for voice assistants.
This paper proposes a Speech Enhancement model adapted to the task of WUW detection.
It aims at increasing the recognition rate and reducing the false alarms in the presence of these types of noises.
arXiv Detail & Related papers (2021-01-29T18:44:05Z)
- Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition [64.9317368575585]
This paper proposes a gated recurrent fusion (GRF) method with joint training framework for robust end-to-end ASR.
The GRF algorithm is used to dynamically combine the noisy and enhanced features.
The proposed method achieves a relative character error rate (CER) reduction of 10.04% over the conventional joint enhancement and transformer method.
arXiv Detail & Related papers (2020-11-09T08:52:05Z)
- Glottal source estimation robustness: A comparison of sensitivity of voice source estimation techniques [11.97036509133719]
This paper addresses the problem of estimating the voice source directly from speech waveforms.
A novel principle based on Anticausality Dominated Regions (ACDR) is used to estimate the glottal open phase.
arXiv Detail & Related papers (2020-05-24T08:13:47Z)
- Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z)
- End-to-End Automatic Speech Recognition Integrated With CTC-Based Voice Activity Detection [48.80449801938696]
This paper integrates a voice activity detection function with end-to-end automatic speech recognition.
We focus on connectionist temporal classification (CTC) and its extension of synchronous/attention.
We use the labels as a cue for detecting speech segments with simple thresholding.
arXiv Detail & Related papers (2020-02-03T03:36:34Z)
- Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z)
- Glottal Closure and Opening Instant Detection from Speech Signals [13.563526970105988]
This paper proposes a new procedure to detect Glottal Closure and Opening Instants (GCIs and GOIs) directly from speech waveforms.
The proposed method is compared to the DYPSA algorithm on the CMU ARCTIC database.
arXiv Detail & Related papers (2019-12-28T19:27:45Z)