STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility
Assessment Model
- URL: http://arxiv.org/abs/2011.04292v1
- Date: Mon, 9 Nov 2020 09:57:10 GMT
- Title: STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility
Assessment Model
- Authors: Ryandhimas E. Zezario, Szu-Wei Fu, Chiou-Shann Fuh, Yu Tsao, Hsin-Min
Wang
- Abstract summary: We propose a deep learning-based non-intrusive speech intelligibility assessment model, namely STOI-Net.
The model is formed by the combination of a convolutional neural network and bidirectional long short-term memory (CNN-BLSTM) architecture with a multiplicative attention mechanism.
Experimental results show that the STOI score estimated by STOI-Net has a good correlation with the actual STOI score when tested with noisy and enhanced speech utterances.
- Score: 24.965732699885262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The calculation of most objective speech intelligibility assessment metrics
requires clean speech as a reference. Such a requirement may limit the
applicability of these metrics in real-world scenarios. To overcome this
limitation, we propose a deep learning-based non-intrusive speech
intelligibility assessment model, namely STOI-Net. The input and output of
STOI-Net are speech spectral features and predicted STOI scores, respectively.
The model is formed by the combination of a convolutional neural network and
bidirectional long short-term memory (CNN-BLSTM) architecture with a
multiplicative attention mechanism. Experimental results show that the STOI
score estimated by STOI-Net has a good correlation with the actual STOI score
when tested with noisy and enhanced speech utterances. The correlation values
are 0.97 and 0.83, respectively, for the seen test condition (test speakers and
noise types included in the training set) and the unseen test condition (test
speakers and noise types not included in the training set). The
results confirm the capability of STOI-Net to accurately predict the STOI
scores without referring to clean speech.
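As an illustration of the architecture described above, the following is a minimal PyTorch sketch of a CNN-BLSTM model with multiplicative (Luong-style) attention that maps spectral features to frame-wise STOI estimates averaged into an utterance-level score. The layer sizes, the 257 frequency bins, and the sigmoid-bounded frame-averaging head are assumptions for illustration only, not the authors' released implementation.
```python
# Minimal sketch (not the authors' code) of a CNN-BLSTM + multiplicative
# attention predictor for non-intrusive STOI estimation.
# Layer sizes, frequency-bin count, and the averaging head are assumptions.
import torch
import torch.nn as nn


class STOIPredictor(nn.Module):
    def __init__(self, n_freq_bins=257, cnn_channels=16, lstm_hidden=128):
        super().__init__()
        # CNN front end over the (time, frequency) spectrogram.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, cnn_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(cnn_channels, cnn_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # BLSTM over time; each frame is flattened to channels * freq bins.
        self.blstm = nn.LSTM(
            input_size=cnn_channels * n_freq_bins,
            hidden_size=lstm_hidden,
            batch_first=True,
            bidirectional=True,
        )
        # Multiplicative attention: score(i, j) = h_i^T W h_j.
        self.att_w = nn.Linear(2 * lstm_hidden, 2 * lstm_hidden, bias=False)
        # Frame-level score head; frame scores are averaged per utterance.
        self.frame_head = nn.Sequential(
            nn.Linear(2 * lstm_hidden, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, spec):  # spec: (batch, time, freq) magnitude spectra
        x = self.cnn(spec.unsqueeze(1))                  # (batch, C, time, freq)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)   # (batch, time, C*freq)
        h, _ = self.blstm(x)                             # (batch, time, 2*hidden)
        scores = torch.bmm(self.att_w(h), h.transpose(1, 2))   # (b, t, t)
        context = torch.bmm(torch.softmax(scores, dim=-1), h)  # (b, t, 2*hidden)
        frame_stoi = torch.sigmoid(self.frame_head(context))   # (b, t, 1)
        return frame_stoi.mean(dim=1).squeeze(-1)               # (batch,)
```
Under this setup, the correlation reported in the paper between predicted and ground-truth STOI scores could be computed with, for example, scipy.stats.pearsonr(predicted_scores, true_scores).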
Related papers
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that human annotators prefer SQC-Score over the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- Continual Learning for On-Device Speech Recognition using Disentangled Conformers [54.32320258055716]
We introduce a continual learning benchmark for speaker-specific domain adaptation derived from LibriVox audiobooks.
We propose a novel compute-efficient continual learning algorithm called DisentangledCL.
Our experiments show that the DisConformer models significantly outperform baselines on general ASR.
arXiv Detail & Related papers (2022-12-02T18:58:51Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for NLP classification tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids [22.736703635666164]
We propose a multi-branched speech intelligibility prediction model (MBI-Net) for predicting subjective intelligibility scores of hearing aid (HA) users.
The outputs of the two branches are fused through a linear layer to obtain predicted speech intelligibility scores.
arXiv Detail & Related papers (2022-04-07T09:13:44Z)
- A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning [12.913738983870621]
We present a canonical correlation based short-time objective intelligibility (CC-STOI) cost function to train a fully convolutional neural network (FCN) model.
We show that our CC-STOI based speech enhancement framework outperforms state-of-the-art DL models trained with conventional distance-based and STOI-based loss functions.
arXiv Detail & Related papers (2022-02-11T16:48:41Z)
- HASA-net: A non-intrusive hearing-aid speech assessment network [52.83357278948373]
We propose a DNN-based hearing aid speech assessment network (HASA-Net) to predict speech quality and intelligibility scores simultaneously.
To the best of our knowledge, HASA-Net is the first work to incorporate quality and intelligibility assessments utilizing a unified DNN-based non-intrusive model for hearing aids.
Experimental results show that the predicted speech quality and intelligibility scores of HASA-Net are highly correlated to two well-known intrusive hearing-aid evaluation metrics.
arXiv Detail & Related papers (2021-11-10T14:10:13Z)
- InQSS: a speech intelligibility assessment model using a multi-task learning network [21.037410575414995]
In this study, we propose InQSS, a speech intelligibility assessment model that uses both spectrogram and scattering coefficients as input features.
The resulting model can predict not only the intelligibility score but also the quality score of a speech utterance.
arXiv Detail & Related papers (2021-11-04T02:01:27Z)
- LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction.
We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z)
- SVSNet: An End-to-end Speaker Voice Similarity Assessment Model [61.3813595968834]
We propose SVSNet, the first end-to-end neural network model to assess the speaker voice similarity between natural speech and synthesized speech.
The experimental results on the Voice Conversion Challenge 2018 and 2020 show that SVSNet notably outperforms well-known baseline systems.
arXiv Detail & Related papers (2021-07-20T10:19:46Z)
- Predicting speech intelligibility from EEG using a dilated convolutional network [17.56832530408592]
We present a deep-learning-based model incorporating dilated convolutions that can be used to predict speech intelligibility without subject-specific training.
Our method is the first to predict the speech reception threshold from EEG for unseen subjects, contributing to objective measures of speech intelligibility.
arXiv Detail & Related papers (2021-05-14T14:12:52Z)