Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals
using Self Supervised Speech Representations
- URL: http://arxiv.org/abs/2307.13423v3
- Date: Thu, 7 Dec 2023 11:39:58 GMT
- Authors: George Close, Thomas Hain, Stefan Goetze
- Abstract summary: Techniques for non-intrusive prediction of SQ ratings are extended to the prediction of intelligibility for hearing-impaired users.
It is found that self-supervised representations are useful as input features to non-intrusive prediction models.
- Score: 21.237026538221404
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised speech representations (SSSRs) have been successfully applied
to a number of speech-processing tasks, e.g. as feature extractors for speech
quality (SQ) prediction, which is, in turn, relevant for the assessment and
training of speech enhancement systems for users with normal or impaired hearing.
However, why and how quality-related information is encoded
well in such representations remains poorly understood. In this work,
techniques for non-intrusive prediction of SQ ratings are extended to the
prediction of intelligibility for hearing-impaired users. It is found that
self-supervised representations are useful as input features to non-intrusive
prediction models, achieving performance competitive with more complex systems. A
detailed analysis of the performance depending on Clarity Prediction Challenge
1 listeners and enhancement systems indicates that more data might be needed to
allow generalisation to unknown systems and (hearing-impaired) individuals.
Related papers
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Third-Party Language Model Performance Prediction from Instruction [59.574169249307054]
Language model-based instruction-following systems have lately shown increasing performance on many benchmark tasks.
A user may easily prompt a model with an instruction without any idea of whether the responses should be expected to be accurate.
We propose a third party performance prediction framework, where a separate model is trained to predict the metric resulting from evaluating an instruction-following system on a task.
arXiv Detail & Related papers (2024-03-19T03:53:47Z)
- Self-Supervised Speech Quality Estimation and Enhancement Using Only
Clean Speech [50.95292368372455]
We propose VQScore, a self-supervised metric for evaluating speech based on the quantization error of a vector-quantized-variational autoencoder (VQ-VAE).
The training of VQ-VAE relies on clean speech; hence, large quantization errors can be expected when the speech is distorted.
We found that the vector quantization mechanism could also be used for self-supervised speech enhancement (SE) model training.
arXiv Detail & Related papers (2024-02-26T06:01:38Z)
- Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired
Users using Intermediate ASR Features and Human Memory Models [29.511898279006175]
This work combines the use of Whisper ASR decoder layer representations as neural network input features with an exemplar-based, psychologically motivated model of human memory to predict human intelligibility ratings for hearing-aid users.
Substantial performance improvement over an established intrusive HASPI baseline system is found, including on enhancement systems and listeners unseen in the training data, with a root mean squared error of 25.3 compared with the baseline of 28.7.
arXiv Detail & Related papers (2024-01-24T17:31:07Z)
- Personalized Predictive ASR for Latency Reduction in Voice Assistants [29.237198363254752]
We introduce predictive automatic speech recognition, where we predict the full utterance from a partially observed utterance, and prefetch the response based on the predicted utterance.
We evaluate our methods on an internal voice assistant dataset as well as the public SLURP dataset.
arXiv Detail & Related papers (2023-05-23T08:05:43Z)
- Perceive and predict: self-supervised speech representation based loss
functions for speech enhancement [23.974815078687445]
It is shown that the distance between the feature encodings of clean and noisy speech correlates strongly with psychoacoustically motivated measures of speech quality and intelligibility.
Experiments using this distance as a loss function show improved performance over the use of an STFT spectrogram distance based loss.
arXiv Detail & Related papers (2023-01-11T10:20:56Z)
- MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility
Prediction Model for Hearing Aids [22.736703635666164]
We propose a multi-branched speech intelligibility prediction model (MBI-Net) for predicting subjective intelligibility scores of hearing aid (HA) users.
The outputs of the two branches are fused through a linear layer to obtain predicted speech intelligibility scores.
arXiv Detail & Related papers (2022-04-07T09:13:44Z)
- Towards End-to-end Unsupervised Speech Recognition [120.4915001021405]
We introduce wav2vec-U 2.0, which does away with all audio-side pre-processing and improves accuracy through better architecture.
In addition, we introduce an auxiliary self-supervised objective that ties model predictions back to the input.
Experiments show that wav2vec-U 2.0 improves unsupervised recognition results across different languages while being conceptually simpler.
arXiv Detail & Related papers (2022-04-05T21:22:38Z)
- HASA-net: A non-intrusive hearing-aid speech assessment network [52.83357278948373]
We propose a DNN-based hearing aid speech assessment network (HASA-Net) to predict speech quality and intelligibility scores simultaneously.
To the best of our knowledge, HASA-Net is the first work to incorporate quality and intelligibility assessments utilizing a unified DNN-based non-intrusive model for hearing aids.
Experimental results show that the predicted speech quality and intelligibility scores of HASA-Net are highly correlated to two well-known intrusive hearing-aid evaluation metrics.
arXiv Detail & Related papers (2021-11-10T14:10:13Z)
- InQSS: a speech intelligibility assessment model using a multi-task
learning network [21.037410575414995]
In this study, we propose InQSS, a speech intelligibility assessment model that uses both spectrogram and scattering coefficients as input features.
The resulting model can predict not only the intelligibility scores but also the quality scores of speech.
arXiv Detail & Related papers (2021-11-04T02:01:27Z)
- An Exploration of Self-Supervised Pretrained Representations for
End-to-End Speech Recognition [98.70304981174748]
We focus on the general applications of pretrained speech representations to advanced end-to-end automatic speech recognition (E2E-ASR) models.
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
arXiv Detail & Related papers (2021-10-09T15:06:09Z)