Glottal source estimation robustness: A comparison of sensitivity of
voice source estimation techniques
- URL: http://arxiv.org/abs/2005.11682v1
- Date: Sun, 24 May 2020 08:13:47 GMT
- Title: Glottal source estimation robustness: A comparison of sensitivity of
voice source estimation techniques
- Authors: Thomas Drugman, Thomas Dubuisson, Alexis Moinet, Nicolas D'Alessandro,
Thierry Dutoit
- Abstract summary: This paper addresses the problem of estimating the voice source directly from speech waveforms.
A novel principle based on Anticausality Dominated Regions (ACDR) is used to estimate the glottal open phase.
- Score: 11.97036509133719
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of estimating the voice source directly from
speech waveforms. A novel principle based on Anticausality Dominated Regions
(ACDR) is used to estimate the glottal open phase. This technique is compared
to two other well-known state-of-the-art methods, namely the Zeros of the
Z-Transform (ZZT) and the Iterative Adaptive Inverse Filtering (IAIF)
algorithms. Decomposition quality is assessed on synthetic signals through two
objective measures: the spectral distortion and a glottal formant determination
rate. Technique robustness is tested by analyzing the influence of noise and
Glottal Closure Instant (GCI) location errors. In addition, the impacts of the
fundamental frequency and the first formant on the performance are evaluated.
Our proposed approach shows a significant improvement in robustness, which could
be of great interest when decomposing real speech.
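The abstract does not detail the ACDR estimator itself, but the ZZT baseline and a log-spectral distortion measure follow well-documented definitions. The NumPy sketch below is illustrative only: the function names, the windowing choice, and the exact distortion formula are assumptions made here, not the authors' code.

```python
import numpy as np

def zzt_split(frame, nfft=4096):
    """Illustrative Zeros-of-the-Z-Transform (ZZT) decomposition of a
    GCI-centred speech frame: zeros outside the unit circle are attributed
    to the anticausal (glottal open phase) contribution, zeros inside to
    the causal (vocal tract / return phase) contribution.
    Returns log10-magnitude spectra, up to a constant gain term."""
    x = frame * np.blackman(len(frame))   # GCI-centred window (a common choice)
    x = x[1:-1]                           # drop (numerically) zero endpoints for np.roots
    zeros = np.roots(x)                   # zeros of X(z) = sum_n x[n] z^{-n}
    anticausal = zeros[np.abs(zeros) > 1.0]
    causal = zeros[np.abs(zeros) <= 1.0]

    # Evaluate each group's contribution on the unit circle; summing
    # log-magnitudes avoids overflow when many factors are multiplied.
    w = np.exp(2j * np.pi * np.arange(nfft // 2 + 1) / nfft)
    def log_mag(group):
        if group.size == 0:
            return np.zeros(w.shape[0])
        return np.sum(np.log10(np.abs(w[:, None] - group[None, :]) + 1e-12), axis=1)
    return log_mag(anticausal), log_mag(causal)

def spectral_distortion(mag_est, mag_ref):
    """RMS log-spectral distortion in dB between two linear-magnitude
    spectra (a common definition; the paper's exact measure may differ,
    e.g. in the frequency band or weighting used). If using zzt_split,
    exponentiate its log10 outputs (10**) before calling this."""
    eps = 1e-12
    diff_db = 20.0 * np.log10((np.abs(mag_est) + eps) / (np.abs(mag_ref) + eps))
    return float(np.sqrt(np.mean(diff_db ** 2)))
```

On synthetic signals of the kind described above, a lower distortion between the estimated anticausal (glottal) spectrum and the reference glottal spectrum would indicate a better source-tract decomposition.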
Related papers
- Quantifying Noise of Dynamic Vision Sensor [49.665407116447454]
Dynamic visual sensors (DVS) are characterised by a large amount of background activity (BA) noise.
It is difficult to distinguish between noise and the cleaned sensor signals using standard image processing techniques.
A new technique is presented to characterise BA noise, derived from Detrended Fluctuation Analysis (DFA).
arXiv Detail & Related papers (2024-04-02T13:43:08Z)
- Enhancing dysarthria speech feature representation with empirical mode decomposition and Walsh-Hadamard transform [8.032273183441921]
We propose a feature enhancement for dysarthria speech called WHFEMD.
It combines empirical mode decomposition (EMD) and fast Walsh-Hadamard transform (FWHT) to enhance features.
arXiv Detail & Related papers (2023-12-30T13:25:26Z)
- Partial Identification with Noisy Covariates: A Robust Optimization Approach [94.10051154390237]
Causal inference from observational datasets often relies on measuring and adjusting for covariates.
We show that this robust optimization approach can extend a wide range of causal adjustment methods to perform partial identification.
Across synthetic and real datasets, we find that this approach provides ATE bounds with a higher coverage probability than existing methods.
arXiv Detail & Related papers (2022-02-22T04:24:26Z)
- Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN)
TSEGAN is an extension of the generative adversarial network (GAN) in time-domain with metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
arXiv Detail & Related papers (2021-03-30T08:09:49Z)
- Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition [64.9317368575585]
This paper proposes a gated recurrent fusion (GRF) method with joint training framework for robust end-to-end ASR.
The GRF algorithm is used to dynamically combine the noisy and enhanced features.
The proposed method achieves the relative character error rate (CER) reduction of 10.04% over the conventional joint enhancement and transformer method.
arXiv Detail & Related papers (2020-11-09T08:52:05Z)
- Class-Conditional Defense GAN Against End-to-End Speech Attacks [82.21746840893658]
We propose a novel approach against end-to-end adversarial attacks developed to fool advanced speech-to-text systems such as DeepSpeech and Lingvo.
Unlike conventional defense approaches, the proposed approach does not directly employ low-level transformations such as autoencoding a given input signal.
Our defense-GAN considerably outperforms conventional defense algorithms in terms of word error rate and sentence level recognition accuracy.
arXiv Detail & Related papers (2020-10-22T00:02:02Z)
- Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU.
The proposed model is based on an encoder-decoder architecture with skip-connections.
It is capable of removing various kinds of background noise including stationary and non-stationary noises.
arXiv Detail & Related papers (2020-06-23T09:19:13Z)
- A Comparative Study of Glottal Source Estimation Techniques [11.481208551940998]
Source-tract decomposition (or glottal flow estimation) is one of the basic problems of speech processing.
In this study we compare three of the main representative state-of-the-art methods of glottal flow estimation.
arXiv Detail & Related papers (2019-12-28T20:40:08Z)
- Detection of Glottal Closure Instants from Speech Signals: a Quantitative Review [9.351195374919365]
Five state-of-the-art GCI detection algorithms are compared using six different databases.
The efficacy of these methods is first evaluated on clean speech, both in terms of reliability and accuracy.
It is shown that for clean speech, SEDREAMS and YAGA are the best performing techniques, both in terms of identification rate and accuracy.
arXiv Detail & Related papers (2019-12-28T14:12:16Z)
- Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics [23.523461173865737]
The proposed criterion is used both for pitch estimation and for determining the voiced segments of speech.
The technique is shown to be particularly robust to additive noise, leading to a significant improvement in adverse conditions (a sketch of such a residual-harmonics criterion follows this list).
arXiv Detail & Related papers (2019-12-28T13:45:29Z)
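For the last related entry above (pitch estimation and voicing detection from residual harmonics), the summary gives no detail of the criterion itself. The sketch below shows a plausible summation-of-residual-harmonics score: harmonics of a candidate f0 are rewarded and inter-harmonic frequencies penalised on the LP-residual spectrum. The LPC order, harmonic count, candidate grid, and voicing threshold are placeholders chosen here, not the published implementation.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lp_residual(x, order=12):
    """Whiten a (non-silent) speech frame with linear prediction
    (autocorrelation method)."""
    x = x * np.hamming(len(x))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz(r[:-1], r[1:])                 # predictor coefficients
    return lfilter(np.concatenate(([1.0], -a)), [1.0], x)

def residual_harmonics_criterion(frame, fs, f0_min=50.0, f0_max=400.0,
                                 nharm=5, nfft=8192):
    """Score pitch candidates on the amplitude spectrum of the LP residual:
    add energy at harmonics k*f0 and subtract energy at (k-0.5)*f0, which
    discourages octave errors. Constants are illustrative assumptions."""
    e = np.abs(np.fft.rfft(lp_residual(frame), nfft))
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    def amp(f):                                       # nearest-bin lookup
        return e[np.argmin(np.abs(freqs - f))]
    candidates = np.arange(f0_min, f0_max, 1.0)
    scores = np.array([
        amp(f) + sum(amp(k * f) - amp((k - 0.5) * f) for k in range(2, nharm + 1))
        for f in candidates
    ])
    best = int(np.argmax(scores))
    # A frame can be declared voiced when the best score exceeds a threshold;
    # the threshold value and normalisation are implementation choices.
    return candidates[best], scores[best]
```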
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.