Glottal Closure and Opening Instant Detection from Speech Signals
- URL: http://arxiv.org/abs/2001.00841v1
- Date: Sat, 28 Dec 2019 19:27:45 GMT
- Title: Glottal Closure and Opening Instant Detection from Speech Signals
- Authors: Thomas Drugman, Thierry Dutoit
- Abstract summary: This paper proposes a new procedure to detect Glottal Closure and Opening Instants (GCIs and GOIs) directly from speech waveforms.
The proposed method is compared to the DYPSA algorithm on the CMU ARCTIC database.
- Score: 13.563526970105988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a new procedure to detect Glottal Closure and Opening
Instants (GCIs and GOIs) directly from speech waveforms. The procedure is
divided into two successive steps. First, a mean-based signal is computed, and
intervals where speech events are expected to occur are extracted from it.
Second, within each interval, a precise position of the speech event is assigned
by locating a discontinuity in the Linear Prediction residual. The proposed
method is compared to the DYPSA algorithm on the CMU ARCTIC database. A
significant improvement and better noise robustness are reported.
In addition, the GOI identification accuracy results are promising for glottal
source characterization.
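The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the window, the interval rule, or the LP order, so the Blackman window, the window length of 1.75 times the mean pitch period, the use of consecutive local minima as interval boundaries, the order-12 predictor, and all function names below are assumptions made for illustration. The toy signal generator exists only to exercise the code.

```python
import numpy as np


def mean_based_signal(x, win_len):
    """Step 1a: windowed local mean of the waveform.

    The Blackman window is an assumption of this sketch.
    """
    w = np.blackman(win_len)
    return np.convolve(x, w / w.sum(), mode="same")


def candidate_intervals(y):
    """Step 1b: one search interval per cycle of the mean-based signal.

    Assumption: intervals run between consecutive local minima of y.
    """
    d = np.diff(y)
    minima = np.flatnonzero((d[:-1] < 0) & (d[1:] >= 0)) + 1
    return list(zip(minima[:-1], minima[1:]))


def lp_residual(x, order=12):
    """Linear Prediction residual (autocorrelation method, whole signal)."""
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    idx = np.arange(order)
    R = r[np.abs(idx[:, None] - idx[None, :])]  # Toeplitz matrix of lags
    R = R + 1e-9 * r[0] * np.eye(order)         # small ridge for stability
    a = np.linalg.solve(R, r[1:])               # predictor coefficients
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(x, inverse_filter)[: len(x)]


def detect_instants(x, mean_period, order=12):
    """Step 2: pick the strongest residual sample inside each interval."""
    win_len = int(round(1.75 * mean_period))    # assumed window length
    y = mean_based_signal(x, win_len)
    res = np.abs(lp_residual(x, order))
    return np.array(
        [lo + int(np.argmax(res[lo:hi])) for lo, hi in candidate_intervals(y)],
        dtype=int,
    )


def synthetic_voiced(n=4000, period=80, first=40):
    """Toy signal: impulse train through a two-pole resonator (hypothetical)."""
    e = np.zeros(n)
    e[first::period] = 1.0
    a1, a2 = 1.663, -0.81                       # pole radius 0.9, angle pi/8
    x = np.zeros(n)
    for i in range(n):
        prev1 = x[i - 1] if i >= 1 else 0.0
        prev2 = x[i - 2] if i >= 2 else 0.0
        x[i] = e[i] + a1 * prev1 + a2 * prev2
    return x, np.flatnonzero(e)


x, true_instants = synthetic_voiced()
detected = detect_instants(x, mean_period=80)
```

On this toy signal the excitation impulses stand in for glottal closure instants. Real speech would additionally need a pitch estimate to set `mean_period`, a voicing decision, and a polarity check, none of which are covered here.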
Related papers
- End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations [13.020158123538138]
Speech separation guided diarization (SSGD) performs diarization by first separating the speakers and then applying voice activity detection (VAD) on each separated stream.
We consider three state-of-the-art speech separation (SSep) algorithms and study their performance in online and offline scenarios.
We show that our best model achieves 8.8% DER on CALLHOME, which outperforms the current state-of-the-art end-to-end neural diarization model.
arXiv Detail & Related papers (2023-03-21T16:33:56Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- SoundDet: Polyphonic Sound Event Detection and Localization from Raw Waveform [48.68714598985078]
SoundDet is an end-to-end trainable and light-weight framework for polyphonic moving sound event detection and localization.
SoundDet directly consumes the raw, multichannel waveform and treats the temporal sound event as a complete "sound-object" to be detected.
A dense sound proposal event map is then constructed to handle the challenges of predicting events with large varying temporal duration.
arXiv Detail & Related papers (2021-06-13T11:43:41Z)
- Composably secure data processing for Gaussian-modulated continuous variable quantum key distribution [58.720142291102135]
Continuous-variable quantum key distribution (QKD) employs the quadratures of a bosonic mode to establish a secret key between two remote parties.
We consider a protocol with homodyne detection in the general setting of composable finite-size security.
In particular, we analyze the high signal-to-noise regime which requires the use of high-rate (non-binary) low-density parity check codes.
arXiv Detail & Related papers (2021-03-30T18:02:55Z)
- Multi-Discriminator Sobolev Defense-GAN Against Adversarial Attacks for End-to-End Speech Systems [78.5097679815944]
This paper introduces a defense approach against end-to-end adversarial attacks developed for cutting-edge speech-to-text systems.
First, we represent speech signals with 2D spectrograms using the short-time Fourier transform.
Second, we iteratively find a safe vector using a spectrogram subspace projection operation.
Third, we synthesize a spectrogram with such a safe vector using a novel GAN architecture trained with Sobolev integral probability metric.
arXiv Detail & Related papers (2021-03-15T01:11:13Z)
- Optimal Sequential Detection of Signals with Unknown Appearance and Disappearance Points in Time [64.26593350748401]
The paper addresses a sequential changepoint detection problem, assuming that the duration of change may be finite and unknown.
We focus on a reliable maximin change detection criterion of maximizing the minimal probability of detection in a given time (or space) window.
The FMA algorithm is applied to detecting faint streaks of satellites in optical images.
arXiv Detail & Related papers (2021-02-02T04:58:57Z)
- Glottal source estimation robustness: A comparison of sensitivity of voice source estimation techniques [11.97036509133719]
This paper addresses the problem of estimating the voice source directly from speech waveforms.
A novel principle based on Anticausality Dominated Regions (ACDR) is used to estimate the glottal open phase.
arXiv Detail & Related papers (2020-05-24T08:13:47Z)
- End-to-End Automatic Speech Recognition Integrated With CTC-Based Voice Activity Detection [48.80449801938696]
This paper integrates a voice activity detection function with end-to-end automatic speech recognition.
We focus on connectionist temporal classification (CTC) and its extension of synchronous/attention.
We use the labels as a cue for detecting speech segments with simple thresholding.
arXiv Detail & Related papers (2020-02-03T03:36:34Z)
- Detection of Glottal Closure Instants from Speech Signals: a Quantitative Review [9.351195374919365]
Five state-of-the-art GCI detection algorithms are compared using six different databases.
The efficacy of these methods is first evaluated on clean speech, both in terms of reliability and accuracy.
It is shown that for clean speech, SEDREAMS and YAGA are the best performing techniques, both in terms of identification rate and accuracy.
arXiv Detail & Related papers (2019-12-28T14:12:16Z)