Automatic Estimation of Intelligibility Measure for Consonants in Speech
- URL: http://arxiv.org/abs/2005.06065v2
- Date: Sun, 28 Jun 2020 21:37:58 GMT
- Title: Automatic Estimation of Intelligibility Measure for Consonants in Speech
- Authors: Ali Abavisani and Mark Hasegawa-Johnson
- Abstract summary: We train regression models based on Convolutional Neural Networks (CNN) for stop consonants.
We estimate the corresponding Signal to Noise Ratio (SNR) at which the Consonant-Vowel (CV) sound becomes intelligible for Normal Hearing (NH) ears.
- Score: 44.02658023314131
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article, we provide a model to estimate a real-valued measure of the
intelligibility of individual speech segments. We trained regression models
based on Convolutional Neural Networks (CNN) for stop consonants
\textipa{/p,t,k,b,d,g/} associated with vowel \textipa{/A/}, to estimate the
corresponding Signal to Noise Ratio (SNR) at which the Consonant-Vowel (CV)
sound becomes intelligible for Normal Hearing (NH) ears. The intelligibility
measure for each sound is called SNR$_{90}$, and is defined to be the SNR level
at which human participants are able to recognize the consonant at least 90\%
correctly, on average, as determined in prior experiments with NH subjects.
Performance of the CNN is compared to a baseline prediction based on automatic
speech recognition (ASR), specifically, a constant offset subtracted from the
SNR at which the ASR becomes capable of correctly labeling the consonant.
Compared to baseline, our models were able to accurately estimate the
SNR$_{90}$~intelligibility measure with less than 2 [dB$^2$] Mean Squared Error
(MSE) on average, while the baseline ASR-defined measure computes
SNR$_{90}$~with a variance of 5.2 to 26.6 [dB$^2$], depending on the consonant.
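For concreteness, below is a minimal sketch (PyTorch) of the kind of CNN regressor the abstract describes: it maps a CV token's log-mel spectrogram to a scalar SNR$_{90}$ estimate in dB and is trained with the MSE criterion reported above. The feature shape, architecture, and hyperparameters are illustrative assumptions, not the authors' exact configuration.
```python
# Hedged sketch of a CNN regressor for SNR_90; all dimensions are assumed.
import torch
import torch.nn as nn

class SNR90Regressor(nn.Module):
    def __init__(self, n_mels=64, n_frames=100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (n_mels // 4) * (n_frames // 4), 64), nn.ReLU(),
            nn.Linear(64, 1),  # scalar SNR_90 estimate in dB
        )

    def forward(self, x):  # x: (batch, 1, n_mels, n_frames)
        return self.head(self.features(x)).squeeze(-1)

model = SNR90Regressor()
loss_fn = nn.MSELoss()                  # paper reports MSE in dB^2
x = torch.randn(8, 1, 64, 100)          # dummy batch of CV spectrograms
y = torch.full((8,), -6.0)              # dummy SNR_90 labels in dB
loss = loss_fn(model(x), y)
loss.backward()
```
The baseline, by contrast, derives SNR$_{90}$ indirectly: it finds the SNR at which the ASR first labels the consonant correctly and subtracts a constant offset.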
Related papers
- HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids [30.305000305766193]
This paper introduces HAAQI-Net, a non-intrusive deep learning-based music audio quality assessment model for hearing aid users.
It can predict HAAQI scores directly from music audio clips and hearing loss patterns.
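An illustrative sketch (PyTorch) of a non-intrusive predictor in this spirit: it consumes audio features plus a hearing-loss descriptor (e.g., an audiogram vector) and regresses a scalar quality score. All names and dimensions here are hypothetical, not HAAQI-Net's actual design.
```python
# Hypothetical two-input quality regressor; not HAAQI-Net's architecture.
import torch
import torch.nn as nn

class QualityNet(nn.Module):
    def __init__(self, feat_dim=128, audiogram_dim=8):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, 64, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(64 + audiogram_dim, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),   # HAAQI scores lie in [0, 1]
        )

    def forward(self, feats, audiogram):
        _, h = self.encoder(feats)            # h: (1, batch, 64)
        z = torch.cat([h.squeeze(0), audiogram], dim=-1)
        return self.head(z).squeeze(-1)

score = QualityNet()(torch.randn(4, 200, 128), torch.randn(4, 8))
```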
arXiv Detail & Related papers (2024-01-02T10:55:01Z)
- Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data [55.84746218227712]
This study assesses the relevance of a signal processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
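As a generic illustration of signal-based fluency measurement (NOT the algorithm from this paper), one simple proxy is the fraction of low-energy frames, i.e., pauses detected with a short-time energy threshold:
```python
# Crude pause-ratio fluency proxy; thresholds and framing are assumptions.
import numpy as np

def pause_ratio(x, sr, frame_ms=25, hop_ms=10, thresh_db=-35.0):
    """Fraction of frames classified as silence (a crude fluency proxy)."""
    frame, hop = int(sr * frame_ms / 1e3), int(sr * hop_ms / 1e3)
    n = 1 + max(0, (len(x) - frame) // hop)
    energy_db = np.array([
        10 * np.log10(np.mean(x[i*hop:i*hop+frame] ** 2) + 1e-12)
        for i in range(n)
    ])
    energy_db -= energy_db.max()            # normalize to 0 dB peak
    return float(np.mean(energy_db < thresh_db))

sr = 16000
x = np.random.randn(sr * 3) * np.repeat([1.0, 0.01, 1.0], sr)  # speech-pause-speech
print(pause_ratio(x, sr))  # ~1/3 of frames flagged as pause
```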
arXiv Detail & Related papers (2023-08-09T07:51:40Z)
- UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures [60.879679764741624]
In reverberant conditions, each microphone acquires a mixture signal of multiple speakers at a different location.
We propose UNSSOR, an algorithm for unsupervised neural speech separation that leverages over-determined training mixtures.
We show that this loss can promote unsupervised separation of speakers.
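A heavily simplified sketch of the mixture-consistency idea behind such an unsupervised loss: on an over-determined array, the per-source estimates should sum (after per-microphone filtering, omitted here) to each microphone's observed mixture. This omits UNSSOR's forward convolutive prediction and STFT-domain details; it is not the paper's exact loss.
```python
# Simplified mixture-consistency loss; NOT UNSSOR's actual formulation.
import torch

def mixture_consistency_loss(src_est, mixtures):
    """src_est: (batch, n_src, n_mics, T) assumed per-mic source images.
    mixtures: (batch, n_mics, T) observed microphone signals."""
    recon = src_est.sum(dim=1)                  # sum of source images per mic
    return torch.mean((recon - mixtures) ** 2)  # penalize inconsistency at every mic

loss = mixture_consistency_loss(torch.randn(2, 2, 6, 8000), torch.randn(2, 6, 8000))
```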
arXiv Detail & Related papers (2023-05-31T17:28:02Z)
- Prediction of speech intelligibility with DNN-based performance measures [9.883633991083789]
This paper presents a speech intelligibility model based on automatic speech recognition (ASR).
It combines phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these probabilities.
The proposed model performs almost as well as the label-based model and produces more accurate predictions than the baseline models.
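One performance measure that can be computed from DNN phoneme posteriors without transcripts is the mean temporal distance (M-measure): the average divergence between posterior vectors a fixed lag apart. Whether this matches the paper's exact measure is an assumption; the intuition is that noisy speech flattens posteriors and lowers the distance.
```python
# Sketch of a posterior-based performance measure (M-measure style).
import numpy as np

def mean_temporal_distance(post, lag=10, eps=1e-10):
    """post: (T, n_phonemes) frame-wise posterior probabilities."""
    p, q = post[:-lag] + eps, post[lag:] + eps
    sym_kl = np.sum(p * np.log(p / q) + q * np.log(q / p), axis=1)
    return float(np.mean(sym_kl))

T, K = 300, 40
logits = np.random.randn(T, K)
post = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(mean_temporal_distance(post))
```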
arXiv Detail & Related papers (2022-03-17T08:05:38Z)
- A Conformer Based Acoustic Model for Robust Automatic Speech Recognition [63.242128956046024]
The proposed model builds on a state-of-the-art recognition system using a bi-directional long short-term memory (BLSTM) model with utterance-wise dropout and iterative speaker adaptation.
The Conformer encoder uses a convolution-augmented attention mechanism for acoustic modeling.
The proposed system is evaluated on the monaural ASR task of the CHiME-4 corpus.
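A compact sketch (PyTorch) of the Conformer convolution module, the component that augments self-attention for acoustic modeling; dimensions are illustrative.
```python
# Conformer-style convolution module: LayerNorm -> pointwise+GLU ->
# depthwise conv -> BatchNorm -> Swish -> pointwise -> residual.
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    def __init__(self, d_model=256, kernel_size=31):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.pointwise1 = nn.Conv1d(d_model, 2 * d_model, 1)   # feeds GLU
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size,
                                   padding=kernel_size // 2, groups=d_model)
        self.bn = nn.BatchNorm1d(d_model)
        self.act = nn.SiLU()                                   # "Swish"
        self.pointwise2 = nn.Conv1d(d_model, d_model, 1)

    def forward(self, x):                 # x: (batch, time, d_model)
        y = self.norm(x).transpose(1, 2)  # -> (batch, d_model, time)
        y = nn.functional.glu(self.pointwise1(y), dim=1)
        y = self.pointwise2(self.act(self.bn(self.depthwise(y))))
        return x + y.transpose(1, 2)      # residual connection

out = ConvModule()(torch.randn(4, 100, 256))
```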
arXiv Detail & Related papers (2022-03-01T20:17:31Z)
- The Performance Evaluation of Attention-Based Neural ASR under Mixed Speech Input [1.776746672434207]
We present mixtures of speech signals to a popular attention-based neural ASR known as Listen, Attend, and Spell (LAS).
In particular, we investigate in detail which phoneme the model predicts when two phonemes are mixed.
Our results show that, when presented with mixed phoneme signals, the model tends to predict the phoneme that has the higher recognition accuracy.
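A minimal sketch of the mixed-input probe: average two phoneme waveforms at equal level and ask the recognizer which phoneme it reports. `asr_model` is a placeholder for a trained LAS-style recognizer, not a real API.
```python
# Equal-level mixing of two waveforms; the ASR call is a placeholder.
import numpy as np

def mix_at_equal_level(x1, x2):
    n = min(len(x1), len(x2))
    x1, x2 = x1[:n], x2[:n]
    # scale each signal to unit RMS so neither dominates the mixture
    x1 = x1 / (np.sqrt(np.mean(x1 ** 2)) + 1e-12)
    x2 = x2 / (np.sqrt(np.mean(x2 ** 2)) + 1e-12)
    return (x1 + x2) / 2

# prediction = asr_model.transcribe(mix_at_equal_level(pa_wave, ta_wave))
```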
arXiv Detail & Related papers (2021-08-03T02:08:22Z)
- Denoising Noisy Neural Networks: A Bayesian Approach with Compensation [36.39188653838991]
Noisy neural networks (NoisyNNs) refer to the inference and training of NNs in the presence of noise.
This paper studies how to estimate the uncontaminated NN weights from their noisy observations or manifestations.
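The underlying estimation problem has a closed form in the simplest case: observe $y = w + n$ with Gaussian weight prior $w \sim \mathcal{N}(\mu, s^2)$ and noise $n \sim \mathcal{N}(0, \sigma^2)$; the posterior-mean (MMSE) estimate shrinks $y$ toward the prior mean. This is the textbook scalar-Gaussian case, shown for intuition; the paper's estimator and compensation scheme are more elaborate.
```python
# MMSE shrinkage of noisy weights in the scalar-Gaussian case.
import numpy as np

def mmse_denoise(y, mu, s2, sigma2):
    """Posterior mean of w given noisy observation y = w + n."""
    gain = s2 / (s2 + sigma2)          # shrinkage factor in [0, 1]
    return mu + gain * (y - mu)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, 10_000)               # "clean" weights
y = w + rng.normal(0.0, 0.5, w.shape)          # noisy observations
w_hat = mmse_denoise(y, mu=0.0, s2=1.0, sigma2=0.25)
print(np.mean((y - w) ** 2), np.mean((w_hat - w) ** 2))  # shrinkage lowers MSE
```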
arXiv Detail & Related papers (2021-05-22T11:51:20Z)
- Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from multi-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
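A sketch of generating low-SNR training snapshots for a uniform linear array (half-wavelength spacing); a CNN would then be trained to map such snapshots, or their sample covariance, to a DoA grid. Details are illustrative, not the paper's setup.
```python
# Synthetic low-SNR snapshots for a uniform linear array.
import numpy as np

def ula_snapshots(theta_deg, n_sensors=8, n_snapshots=200, snr_db=-10.0, seed=0):
    rng = np.random.default_rng(seed)
    n = np.arange(n_sensors)
    a = np.exp(-1j * np.pi * n * np.sin(np.deg2rad(theta_deg)))  # steering vector
    s = (rng.standard_normal(n_snapshots)
         + 1j * rng.standard_normal(n_snapshots)) / np.sqrt(2)
    x = np.outer(a, s)                                           # (n_sensors, n_snapshots)
    noise_power = 10 ** (-snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (rng.standard_normal(x.shape)
                                        + 1j * rng.standard_normal(x.shape))
    return x + noise

X = ula_snapshots(theta_deg=20.0)
R = X @ X.conj().T / X.shape[1]   # sample covariance, a typical CNN input
```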
arXiv Detail & Related papers (2020-11-17T12:52:18Z)
- DNN-Based Semantic Model for Rescoring N-best Speech Recognition List [8.934497552812012]
The word error rate (WER) of an automatic speech recognition (ASR) system increases when a mismatch occurs between training and testing conditions, for example due to noise.
This work aims to improve ASR by modeling long-term semantic relations to compensate for distorted acoustic features.
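The generic N-best rescoring recipe: combine each hypothesis's ASR score with a semantic (language-model) score and re-rank. The log-linear interpolation below is the standard formulation, shown as an illustration rather than this paper's exact model.
```python
# Log-linear N-best rescoring; the semantic scorer here is a toy stand-in.
def rescore_nbest(nbest, semantic_score, lam=0.5):
    """nbest: list of (hypothesis_text, asr_log_score) pairs."""
    return max(nbest, key=lambda h: h[1] + lam * semantic_score(h[0]))

nbest = [("recognize speech", -4.2), ("wreck a nice beach", -4.0)]
toy_lm = lambda text: -1.0 if "wreck" in text else -0.2  # stand-in semantic model
print(rescore_nbest(nbest, toy_lm))  # -> ('recognize speech', -4.2)
```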
arXiv Detail & Related papers (2020-11-02T13:50:59Z)
- Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
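A pipeline sketch: run the enhancement model as a front end so the ASR sees a cleaned signal instead of the adversarial one. `unet_at` and `asr` are placeholders, not real APIs from the paper.
```python
# Enhancement-as-preprocessing pipeline; both callables are placeholders.
def robust_transcribe(adversarial_audio, unet_at, asr):
    enhanced = unet_at(adversarial_audio)  # suppress adversarial perturbation
    return asr(enhanced)
```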
arXiv Detail & Related papers (2020-03-31T02:16:34Z)