MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility
Prediction Model for Hearing Aids
- URL: http://arxiv.org/abs/2204.03305v1
- Date: Thu, 7 Apr 2022 09:13:44 GMT
- Title: MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility
Prediction Model for Hearing Aids
- Authors: Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu
Tsao
- Abstract summary: We propose a multi-branched speech intelligibility prediction model (MBI-Net) for predicting subjective intelligibility scores of hearing aid (HA) users.
The outputs of the two branches are fused through a linear layer to obtain predicted speech intelligibility scores.
- Score: 22.736703635666164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improving the user's hearing ability to understand speech in noisy
environments is critical to the development of hearing aid (HA) devices. For
this, it is important to derive a metric that can fairly predict speech
intelligibility for HA users. A straightforward approach is to conduct a
subjective listening test and use the test results as an evaluation metric.
However, conducting large-scale listening tests is time-consuming and
expensive. Therefore, several evaluation metrics were derived as surrogates for
subjective listening test results. In this study, we propose a multi-branched
speech intelligibility prediction model (MBI-Net), for predicting the
subjective intelligibility scores of HA users. MBI-Net consists of two branches
of models, with each branch consisting of a hearing loss model, a cross-domain
feature extraction module, and a speech intelligibility prediction model, to
process speech signals from one channel. The outputs of the two branches are
fused through a linear layer to obtain predicted speech intelligibility scores.
Experimental results confirm the effectiveness of MBI-Net, which produces
higher prediction scores than the baseline system in Track 1 and Track 2 on the
Clarity Prediction Challenge 2022 dataset.
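The two-branch layout described in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the flat-gain hearing loss stand-in, the two hand-made features, the linear per-branch predictor, and all names (`apply_hearing_loss`, `extract_features`, `predict_branch`, `run_branch`, `fuse`) are assumptions introduced here. The only part taken from the abstract is the overall structure: one branch per channel, each made of a hearing loss model, a feature extractor, and a predictor, with the two branch outputs combined by a linear layer.

```python
from typing import List, Sequence


def apply_hearing_loss(signal: Sequence[float], gain: float) -> List[float]:
    """Stand-in hearing loss model: a flat attenuation (assumption)."""
    return [gain * x for x in signal]


def extract_features(signal: Sequence[float]) -> List[float]:
    """Stand-in feature extractor: mean absolute amplitude and mean energy.

    The paper uses a cross-domain feature extraction module; these two
    hand-made features are placeholders for illustration only.
    """
    n = len(signal)
    mean_abs = sum(abs(x) for x in signal) / n
    mean_energy = sum(x * x for x in signal) / n
    return [mean_abs, mean_energy]


def predict_branch(features: Sequence[float],
                   weights: Sequence[float],
                   bias: float) -> float:
    """Stand-in per-branch predictor: a single linear unit (assumption)."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def run_branch(signal: Sequence[float], gain: float,
               weights: Sequence[float], bias: float) -> float:
    """One branch: hearing loss model -> features -> branch score."""
    degraded = apply_hearing_loss(signal, gain)
    return predict_branch(extract_features(degraded), weights, bias)


def fuse(left_score: float, right_score: float,
         fusion_weights: Sequence[float], fusion_bias: float) -> float:
    """Linear fusion of the two branch outputs into one score."""
    return (fusion_weights[0] * left_score
            + fusion_weights[1] * right_score
            + fusion_bias)


# Toy two-channel input; every value below is made up for the example.
left_channel = [1.0, -1.0]
right_channel = [2.0, -2.0]

left_score = run_branch(left_channel, gain=0.5, weights=[1.0, 2.0], bias=0.0)
right_score = run_branch(right_channel, gain=0.5, weights=[1.0, 2.0], bias=0.0)
fused_score = fuse(left_score, right_score, fusion_weights=[0.5, 0.5], fusion_bias=0.0)
```

In a trained model the branch and fusion weights would be learned from listening-test scores; here they are fixed by hand so that the data flow is easy to follow.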
Related papers
- DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER metric on CN-Celeb evaluation set.
arXiv Detail & Related papers (2023-10-18T17:07:05Z) - Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata [28.260347585185176]
We present three novel methods to improve intelligibility prediction accuracy.
MBI-Net+ is an enhanced version of MBI-Net, the top-performing system in the 1st Clarity Prediction Challenge.
arXiv Detail & Related papers (2023-09-18T07:51:09Z) - Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z) - HASA-net: A non-intrusive hearing-aid speech assessment network [52.83357278948373]
We propose a DNN-based hearing aid speech assessment network (HASA-Net) to predict speech quality and intelligibility scores simultaneously.
To the best of our knowledge, HASA-Net is the first work to incorporate quality and intelligibility assessments utilizing a unified DNN-based non-intrusive model for hearing aids.
Experimental results show that the predicted speech quality and intelligibility scores of HASA-Net are highly correlated to two well-known intrusive hearing-aid evaluation metrics.
arXiv Detail & Related papers (2021-11-10T14:10:13Z) - Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment
Model with Cross-Domain Features [30.57631206882462]
The MOSA-Net is designed to estimate speech quality, intelligibility, and distortion assessment scores based on a test speech signal as input.
We show that the MOSA-Net can precisely predict perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI) scores when tested on both noisy and enhanced speech utterances.
arXiv Detail & Related papers (2021-11-03T17:30:43Z) - LDNet: Unified Listener Dependent Modeling in MOS Prediction for
Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction.
We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z) - Predicting speech intelligibility from EEG using a dilated convolutional
network [17.56832530408592]
We present a deep-learning-based model incorporating dilated convolutions that can be used to predict speech intelligibility without subject-specific training.
Our method is the first to predict the speech reception threshold from EEG for unseen subjects, contributing to objective measures of speech intelligibility.
arXiv Detail & Related papers (2021-05-14T14:12:52Z) - Characterizing Speech Adversarial Examples Using Self-Attention U-Net
Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z) - Deep Speaker Embeddings for Far-Field Speaker Recognition on Short
Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at achieving two goals: a) improve the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reduce the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)