Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals
using Self Supervised Speech Representations
- URL: http://arxiv.org/abs/2307.13423v3
- Date: Thu, 7 Dec 2023 11:39:58 GMT
- Title: Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals
using Self Supervised Speech Representations
- Authors: George Close, Thomas Hain, Stefan Goetze
- Abstract summary: techniques for non-intrusive prediction of SQ ratings are extended to the prediction of intelligibility for hearing-impaired users.
It is found that self-supervised representations are useful as input features to non-intrusive prediction models.
- Score: 21.237026538221404
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised speech representations (SSSRs) have been successfully applied
to a number of speech-processing tasks, e.g. as feature extractor for speech
quality (SQ) prediction, which is, in turn, relevant for assessment and
training speech enhancement systems for users with normal or impaired hearing.
However, exact knowledge of why and how quality-related information is encoded
well in such representations remains poorly understood. In this work,
techniques for non-intrusive prediction of SQ ratings are extended to the
prediction of intelligibility for hearing-impaired users. It is found that
self-supervised representations are useful as input features to non-intrusive
prediction models, achieving competitive performance to more complex systems. A
detailed analysis of the performance depending on Clarity Prediction Challenge
1 listeners and enhancement systems indicates that more data might be needed to
allow generalisation to unknown systems and (hearing-impaired) individuals
Related papers
- Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems [55.99999020778169]
We study a function that can predict the forthcoming words and estimate the time remaining until the end of an utterance.
We develop a cross-attention-based algorithm that incorporates both acoustic and linguistic information.
Results demonstrate the proposed model's ability to predict upcoming words and estimate future EOU events up to 300ms prior to the actual EOU.
arXiv Detail & Related papers (2024-09-30T06:29:58Z) - STAB: Speech Tokenizer Assessment Benchmark [57.45234921100835]
Representing speech as discrete tokens provides a framework for transforming speech into a format that closely resembles text.
We present STAB (Speech Tokenizer Assessment Benchmark), a systematic evaluation framework designed to assess speech tokenizers comprehensively.
We evaluate the STAB metrics and correlate this with downstream task performance across a range of speech tasks and tokenizer choices.
arXiv Detail & Related papers (2024-09-04T02:20:59Z) - Third-Party Language Model Performance Prediction from Instruction [59.574169249307054]
Language model-based instruction-following systems have lately shown increasing performance on many benchmark tasks.
A user may easily prompt a model with an instruction without any idea of whether the responses should be expected to be accurate.
We propose a third party performance prediction framework, where a separate model is trained to predict the metric resulting from evaluating an instruction-following system on a task.
arXiv Detail & Related papers (2024-03-19T03:53:47Z) - Self-Supervised Speech Quality Estimation and Enhancement Using Only
Clean Speech [50.95292368372455]
We propose VQScore, a self-supervised metric for evaluating speech based on the quantization error of a vector-quantized-variational autoencoder (VQ-VAE)
The training of VQ-VAE relies on clean speech; hence, large quantization errors can be expected when the speech is distorted.
We found that the vector quantization mechanism could also be used for self-supervised speech enhancement (SE) model training.
arXiv Detail & Related papers (2024-02-26T06:01:38Z) - Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired
Users using Intermediate ASR Features and Human Memory Models [29.511898279006175]
This work combines the use ofWhisper ASR decoder layer representations as neural network input features with an exemplar-based, psychologically motivated model of human memory to predict human intelligibility ratings for hearing-aid users.
Substantial performance improvement over an established intrusive HASPI baseline system is found, including on enhancement systems and listeners unseen in the training data, with a root mean squared error of 25.3 compared with the baseline of 28.7.
arXiv Detail & Related papers (2024-01-24T17:31:07Z) - Perceive and predict: self-supervised speech representation based loss
functions for speech enhancement [23.974815078687445]
It is shown that the distance between the feature encodings of clean and noisy speech correlate strongly with psychoacoustically motivated measures of speech quality and intelligibility.
Experiments using this distance as a loss function are performed and improved performance over the use of STFT spectrogram distance based loss.
arXiv Detail & Related papers (2023-01-11T10:20:56Z) - MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility
Prediction Model for Hearing Aids [22.736703635666164]
We propose a multi-branched speech intelligibility prediction model (MBI-Net) for predicting subjective intelligibility scores of hearing aid (HA) users.
The outputs of the two branches are fused through a linear layer to obtain predicted speech intelligibility scores.
arXiv Detail & Related papers (2022-04-07T09:13:44Z) - Towards End-to-end Unsupervised Speech Recognition [120.4915001021405]
We introduce wvu which does away with all audio-side pre-processing and improves accuracy through better architecture.
In addition, we introduce an auxiliary self-supervised objective that ties model predictions back to the input.
Experiments show that wvuimproves unsupervised recognition results across different languages while being conceptually simpler.
arXiv Detail & Related papers (2022-04-05T21:22:38Z) - HASA-net: A non-intrusive hearing-aid speech assessment network [52.83357278948373]
We propose a DNN-based hearing aid speech assessment network (HASA-Net) to predict speech quality and intelligibility scores simultaneously.
To the best of our knowledge, HASA-Net is the first work to incorporate quality and intelligibility assessments utilizing a unified DNN-based non-intrusive model for hearing aids.
Experimental results show that the predicted speech quality and intelligibility scores of HASA-Net are highly correlated to two well-known intrusive hearing-aid evaluation metrics.
arXiv Detail & Related papers (2021-11-10T14:10:13Z) - InQSS: a speech intelligibility assessment model using a multi-task
learning network [21.037410575414995]
In this study, we propose InQSS, a speech intelligibility assessment model that uses both spectrogram and scattering coefficients as input features.
The resulting model can predict not only the intelligibility scores but also the quality scores of a speech.
arXiv Detail & Related papers (2021-11-04T02:01:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.