HASA-net: A non-intrusive hearing-aid speech assessment network
- URL: http://arxiv.org/abs/2111.05691v1
- Date: Wed, 10 Nov 2021 14:10:13 GMT
- Title: HASA-net: A non-intrusive hearing-aid speech assessment network
- Authors: Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang,
Yih-Chun Hu, Yu Tsao
- Abstract summary: We propose a DNN-based hearing aid speech assessment network (HASA-Net) to predict speech quality and intelligibility scores simultaneously.
To the best of our knowledge, HASA-Net is the first work to incorporate quality and intelligibility assessments utilizing a unified DNN-based non-intrusive model for hearing aids.
Experimental results show that the predicted speech quality and intelligibility scores of HASA-Net are highly correlated to two well-known intrusive hearing-aid evaluation metrics.
- Score: 52.83357278948373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Because they need no clean reference, non-intrusive speech assessment
methods have attracted great attention for objective evaluation. Recently, deep
neural network (DNN) models have been applied to build non-intrusive speech
assessment approaches and confirmed to provide promising performance. However,
most DNN-based approaches are designed for normal-hearing listeners without
considering hearing-loss factors. In this study, we propose a DNN-based hearing
aid speech assessment network (HASA-Net), formed by a bidirectional long
short-term memory (BLSTM) model, to predict speech quality and intelligibility
scores simultaneously according to input speech signals and specified
hearing-loss patterns. To the best of our knowledge, HASA-Net is the first work
to incorporate quality and intelligibility assessments utilizing a unified
DNN-based non-intrusive model for hearing aids. Experimental results show that
the predicted speech quality and intelligibility scores of HASA-Net are highly
correlated to two well-known intrusive hearing-aid evaluation metrics, hearing
aid speech quality index (HASQI) and hearing aid speech perception index
(HASPI), respectively.
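The abstract reports that predicted scores are highly correlated with HASQI and HASPI but does not say which correlation measure is used. A minimal sketch, assuming Pearson's linear correlation coefficient; the function name `pearson` and the score lists are hypothetical and made up for illustration, not taken from the paper:

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)

# Made-up numbers for illustration only; these are NOT results from the paper.
predicted_quality = [0.21, 0.35, 0.50, 0.62, 0.80]  # hypothetical model outputs
reference_hasqi = [0.25, 0.30, 0.55, 0.60, 0.85]    # hypothetical intrusive scores
print(round(pearson(predicted_quality, reference_hasqi), 3))
```

A value close to 1 would indicate that the non-intrusive predictions track the intrusive metric closely. A rank-based measure such as Spearman's coefficient could be substituted if only the ordering of utterances matters.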
Related papers
- sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection
with Spiking Neural Networks [51.516451451719654]
Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient.
This paper introduces a novel SNN-based Voice Activity Detection model, referred to as sVAD.
It provides effective auditory feature representation through SincNet and 1D convolution, and improves noise robustness with attention mechanisms.
arXiv Detail & Related papers (2024-03-09T02:55:44Z)
- Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired
Users using Intermediate ASR Features and Human Memory Models [29.511898279006175]
This work combines the use of Whisper ASR decoder layer representations as neural network input features with an exemplar-based, psychologically motivated model of human memory to predict human intelligibility ratings for hearing-aid users.
Substantial performance improvement over an established intrusive HASPI baseline system is found, including on enhancement systems and listeners unseen in the training data, with a root mean squared error of 25.3 compared with the baseline of 28.7.
arXiv Detail & Related papers (2024-01-24T17:31:07Z)
- Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks [53.31894108974566]
Spiking-LEAF is a learnable auditory front-end meticulously designed for SNN-based speech processing.
On keyword spotting and speaker identification tasks, the proposed Spiking-LEAF outperforms SOTA spiking auditory front-ends.
arXiv Detail & Related papers (2023-09-18T04:03:05Z)
- MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility
Prediction Model for Hearing Aids [22.736703635666164]
We propose a multi-branched speech intelligibility prediction model (MBI-Net) for predicting subjective intelligibility scores of hearing aid (HA) users.
The outputs of the two branches are fused through a linear layer to obtain predicted speech intelligibility scores.
arXiv Detail & Related papers (2022-04-07T09:13:44Z)
- Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment
Model with Cross-Domain Features [30.57631206882462]
The MOSA-Net is designed to estimate speech quality, intelligibility, and distortion assessment scores based on a test speech signal as input.
We show that the MOSA-Net can precisely predict perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI) scores when tested on both noisy and enhanced speech utterances.
arXiv Detail & Related papers (2021-11-03T17:30:43Z)
- LDNet: Unified Listener Dependent Modeling in MOS Prediction for
Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction.
We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z)
- Characterizing Speech Adversarial Examples Using Self-Attention U-Net
Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short
Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at achieving two goals: a) improve the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reduce the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
- Stable Training of DNN for Speech Enhancement based on
Perceptually-Motivated Black-Box Cost Function [39.66350526759246]
Methods related to objective sound quality assessment (OSQA) have been proposed, such as PESQ (perceptual evaluation of speech quality).
Direct use of such measures for training a deep neural network (DNN) is not possible in most cases because popular OSQAs are non-differentiable with respect to DNN parameters.
We propose to use stabilization techniques borrowed from reinforcement learning to increase the score of PESQ.
arXiv Detail & Related papers (2020-02-14T05:44:17Z)