A Comparative Study of Glottal Source Estimation Techniques
- URL: http://arxiv.org/abs/2001.00840v1
- Date: Sat, 28 Dec 2019 20:40:08 GMT
- Title: A Comparative Study of Glottal Source Estimation Techniques
- Authors: Thomas Drugman, Baris Bozkurt, Thierry Dutoit
- Abstract summary: Source-tract decomposition (or glottal flow estimation) is one of the basic problems of speech processing.
In this study we compare three of the main representative state-of-the-art methods of glottal flow estimation.
- Score: 11.481208551940998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Source-tract decomposition (or glottal flow estimation) is one of the basic
problems of speech processing. For this, several techniques have been proposed
in the literature. However, studies comparing different approaches are almost
nonexistent. Moreover, experiments have been systematically performed either on
synthetic speech or on sustained vowels. In this study we compare three of the
main representative state-of-the-art methods of glottal flow estimation:
closed-phase inverse filtering, iterative and adaptive inverse filtering, and
mixed-phase decomposition. These techniques are first submitted to an objective
assessment test on synthetic speech signals. Their sensitivity to various
factors affecting the estimation quality, as well as their robustness to noise
are studied. In a second experiment, their ability to label voice quality
(tensed, modal, soft) is studied on a large corpus of real connected speech. It
is shown that changes of voice quality are reflected by significant
modifications in glottal feature distributions. Techniques based on the
mixed-phase decomposition and on a closed-phase inverse filtering process turn
out to give the best results on both clean synthetic and real speech signals.
On the other hand, iterative and adaptive inverse filtering is recommended in
noisy environments for its high robustness.
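All three methods compared in the paper share the same inverse-filtering idea: estimate the vocal-tract filter from the speech frame, then remove it to expose the glottal source. The following is a minimal sketch of that principle only, not any of the paper's actual algorithms (closed-phase, IAIF, or mixed-phase decomposition); the `lpc` helper, LPC order, frame length, and synthetic formant values are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(signal, order):
    # Autocorrelation-method LPC: solve the Yule-Walker equations.
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))  # A(z) = 1 - sum a_k z^-k

def inverse_filter(frame, order=8):
    # Estimate the vocal-tract filter on a windowed copy of the frame,
    # then apply A(z) to the raw frame to approximate the glottal residual.
    a = lpc(frame * np.hanning(len(frame)), order)
    return lfilter(a, [1.0], frame)

# Synthetic voiced-like frame: impulse train through a two-formant filter
# (pole frequencies 500 Hz and 1500 Hz are arbitrary illustrative values).
fs, f0 = 8000, 100
excitation = np.zeros(400)
excitation[::fs // f0] = 1.0
poles = [0.97 * np.exp(1j * 2 * np.pi * 500 / fs),
         0.97 * np.exp(-1j * 2 * np.pi * 500 / fs),
         0.95 * np.exp(1j * 2 * np.pi * 1500 / fs),
         0.95 * np.exp(-1j * 2 * np.pi * 1500 / fs)]
frame = lfilter([1.0], np.real(np.poly(poles)), excitation)
residual = inverse_filter(frame)
print(residual.shape)  # (400,)
```

The paper's methods differ mainly in how the filter is estimated: over the glottal closed phase only, iteratively with adaptive pre-emphasis, or via the mixed-phase (causal/anticausal) decomposition of the signal.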
Related papers
- ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws [67.59263833387536]
ScalingFilter is a novel approach that evaluates text quality based on the perplexity difference between two language models trained on the same data.
To assess the bias introduced by quality filtering, we introduce semantic diversity, a metric that uses text embedding models to obtain semantic representations.
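The perplexity-difference idea from the ScalingFilter summary can be sketched as follows. This is an illustrative toy, not the paper's implementation: the per-document perplexity values and the log-difference scoring rule are assumptions made for the example.

```python
import math

def quality_score(ppl_small, ppl_large):
    # ScalingFilter-style signal: text on which a larger model improves
    # more over a smaller one is presumed higher quality. Scored here as
    # the difference of log-perplexities (hypothetical formulation).
    return math.log(ppl_small) - math.log(ppl_large)

# Hypothetical perplexities (small model, large model) per document.
docs = {"clean prose": (45.0, 20.0), "boilerplate": (30.0, 28.0)}
ranked = sorted(docs, key=lambda d: quality_score(*docs[d]), reverse=True)
print(ranked)  # ['clean prose', 'boilerplate']
```

Boilerplate text is easy for both models, so the gap between them is small and it ranks low; the larger model's bigger gain on the prose document marks it as higher quality.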
arXiv Detail & Related papers (2024-08-15T17:59:30Z)
- Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion [85.54515118077825]
This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality.
To reduce computational complexity, LinDiff employs a patch-based processing approach that partitions the input signal into small patches.
Our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed.
arXiv Detail & Related papers (2023-06-09T07:02:43Z)
- Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection [22.413475757518682]
We propose a deep learning framework for generating acoustic feature embeddings sensitive to vocal quality.
A contrastive loss is combined with a classification loss to train our deep learning model jointly.
Empirical results demonstrate that our method achieves high in-corpus and cross-corpus classification accuracy.
arXiv Detail & Related papers (2022-11-17T19:34:59Z)
- TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation [61.564874831498145]
TranSpeech is a speech-to-speech translation model with bilateral perturbation.
We establish a non-autoregressive S2ST technique, which repeatedly masks and predicts unit choices.
TranSpeech shows a significant improvement in inference latency, achieving a speedup of up to 21.4x over the autoregressive technique.
arXiv Detail & Related papers (2022-05-25T06:34:14Z)
- Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
By feeding the predicted discrete symbol sequence into the synthesis model, each target speech signal can be re-synthesized.
arXiv Detail & Related papers (2021-12-17T08:35:40Z)
- Investigations on Audiovisual Emotion Recognition in Noisy Conditions [43.40644186593322]
We present an investigation on two emotion datasets with superimposed noise at different signal-to-noise ratios.
The results show a significant performance decrease when a model trained on clean audio is applied to noisy data.
arXiv Detail & Related papers (2021-03-02T17:45:16Z)
- WaveTransform: Crafting Adversarial Examples via Input Decomposition [69.01794414018603]
We introduce WaveTransform, which creates adversarial noise corresponding to low-frequency and high-frequency subbands, separately or in combination.
Experiments show that the proposed attack is effective against the defense algorithm and is also transferable across CNNs.
arXiv Detail & Related papers (2020-10-29T17:16:59Z)
- Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU.
The proposed model is based on an encoder-decoder architecture with skip-connections.
It is capable of removing various kinds of background noise including stationary and non-stationary noises.
arXiv Detail & Related papers (2020-06-23T09:19:13Z)
- Parametric Representation for Singing Voice Synthesis: a Comparative Evaluation [10.37199090634032]
The goal of this paper is twofold. First, a comparative subjective evaluation is performed across four existing techniques suitable for statistical parametric synthesis.
The artifacts occurring in high-pitched voices are discussed and possible approaches to overcome them are suggested.
arXiv Detail & Related papers (2020-06-07T13:06:30Z)
- Maximum Voiced Frequency Estimation: Exploiting Amplitude and Phase Spectra [22.675699190161417]
This paper proposes a new approach for MVF estimation which exploits both amplitude and phase spectra.
It is shown that phase conveys relevant information about the harmonicity of the voice signal, and that it can be jointly used with features derived from the amplitude spectrum.
The proposed technique is compared to two state-of-the-art methods, and shows a superior performance in both objective and subjective evaluations.
arXiv Detail & Related papers (2020-05-31T13:40:46Z)
- Glottal source estimation robustness: A comparison of sensitivity of voice source estimation techniques [11.97036509133719]
This paper addresses the problem of estimating the voice source directly from speech waveforms.
A novel principle based on Anticausality Dominated Regions (ACDR) is used to estimate the glottal open phase.
arXiv Detail & Related papers (2020-05-24T08:13:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.