Disentangled Latent Speech Representation for Automatic Pathological
Intelligibility Assessment
- URL: http://arxiv.org/abs/2204.04016v1
- Date: Fri, 8 Apr 2022 12:02:14 GMT
- Title: Disentangled Latent Speech Representation for Automatic Pathological
Intelligibility Assessment
- Authors: Tobias Weise, Philipp Klumpp, Andreas Maier, Elmar Noeth, Bjoern
Heismann, Maria Schuster, Seung Hee Yang
- Abstract summary: Our results are among the first to show that disentangled speech representations can be used for automatic pathological speech intelligibility assessment.
- Score: 10.93598143328628
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech intelligibility assessment plays an important role in the therapy of
patients suffering from pathological speech disorders. Automatic and objective
measures are desirable to assist therapists in their traditionally subjective
and labor-intensive assessments. In this work, we investigate a novel approach
for obtaining such a measure using the divergence in disentangled latent speech
representations of a parallel utterance pair, obtained from a healthy reference
and a pathological speaker. Experiments on an English database of Cerebral
Palsy patients, using all available utterances per speaker, show a high and
significant correlation (R = -0.9) with subjective intelligibility
measures, with only minimal deviation (±0.01) across four different
reference speaker pairs. We also demonstrate the robustness of the proposed
method (R = -0.89, deviating by ±0.02 over 1000 iterations) when considering a
significantly smaller number of utterances per speaker. Our results are among
the first to show that disentangled speech representations can be used for
automatic pathological speech intelligibility assessment, resulting in a
method that is invariant to the reference speaker pair and applicable in
scenarios where only a few utterances are available.
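The method compares disentangled latents of parallel utterances from a healthy reference and a pathological speaker. Below is a minimal sketch of that scoring loop, not the authors' implementation: the encoder is a random-projection stand-in, and the cosine-distance divergence, latent dimensionality, and mean pooling are illustrative assumptions.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
PROJ = rng.standard_normal((80, 64))  # placeholder projection (dims assumed)

def encode(utt: np.ndarray) -> np.ndarray:
    """Stand-in for the paper's disentangled encoder: mean-pooled linear
    projection of log-mel frames into a single latent vector."""
    return (utt @ PROJ).mean(axis=0)

def divergence(z_ref: np.ndarray, z_pat: np.ndarray) -> float:
    """Cosine distance between latents; the paper's divergence may differ."""
    cos = float(z_ref @ z_pat) / (np.linalg.norm(z_ref) * np.linalg.norm(z_pat))
    return 1.0 - cos

def speaker_score(ref_utts, pat_utts) -> float:
    """Average divergence over a speaker's parallel utterance pairs."""
    return float(np.mean([divergence(encode(r), encode(p))
                          for r, p in zip(ref_utts, pat_utts)]))

# Toy data: five pathological speakers, ten parallel utterances each
# (120 frames x 80 log-mel bins), plus subjective intelligibility ratings.
ratings = [0.9, 0.7, 0.5, 0.4, 0.2]
scores = []
for _ in ratings:
    ref_utts = [rng.standard_normal((120, 80)) for _ in range(10)]
    pat_utts = [rng.standard_normal((120, 80)) for _ in range(10)]
    scores.append(speaker_score(ref_utts, pat_utts))

# With real disentangled latents the paper reports R = -0.9; on random toy
# data the correlation is meaningless and only demonstrates the plumbing.
r, p = pearsonr(scores, ratings)
print(f"Pearson R = {r:.2f} (p = {p:.3f})")
```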
Related papers
- Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z)
- Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models [7.774205081900019]
Head and Neck Cancers (HNC) significantly impact patients' ability to speak, affecting their quality of life.
This study proposes a self-supervised Wav2Vec2-based model for phone classification with HNC patients, to enhance accuracy and improve the discrimination of phonetic features for subsequent interpretability purposes; a minimal sketch of such a frame-level classifier follows.
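As a rough illustration of this kind of setup, here is a minimal sketch of frame-level phone classification on top of a pretrained Wav2Vec2 encoder via the HuggingFace transformers API; the checkpoint name, phone-set size, and plain linear head are assumptions, not the study's exact architecture.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class PhoneClassifier(nn.Module):
    def __init__(self, n_phones: int = 40):  # hypothetical phone inventory size
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.head = nn.Linear(self.encoder.config.hidden_size, n_phones)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) at 16 kHz
        hidden = self.encoder(waveform).last_hidden_state  # (batch, frames, 768)
        return self.head(hidden)                           # per-frame phone logits

model = PhoneClassifier()
logits = model(torch.randn(1, 16000))  # one second of dummy audio
phones = logits.argmax(dim=-1)         # predicted phone per ~20 ms frame
```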
arXiv Detail & Related papers (2024-06-07T08:51:52Z)
- A Comprehensive Rubric for Annotating Pathological Speech [0.0]
We introduce a comprehensive rubric based on various dimensions of speech quality, including phonetics, fluency, and prosody.
The objective is to establish standardized criteria for identifying errors within the speech of individuals with Down syndrome.
arXiv Detail & Related papers (2024-04-29T16:44:27Z)
- Non-Invasive Suicide Risk Prediction Through Speech Analysis [74.8396086718266]
We present a non-invasive, speech-based approach for automatic suicide risk assessment.
We extract three feature sets: wav2vec embeddings, interpretable speech and acoustic features, and deep learning-based spectral representations.
Our most effective speech model achieves a balanced accuracy of 66.2% (the metric is illustrated below).
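Balanced accuracy is the mean of per-class recalls, the appropriate headline number when at-risk and control classes are imbalanced. A minimal sketch with scikit-learn on made-up labels:

```python
from sklearn.metrics import balanced_accuracy_score

y_true = [1, 1, 1, 1, 0, 0]  # hypothetical at-risk (1) / control (0) labels
y_pred = [1, 1, 1, 0, 0, 1]
# recall(class 1) = 3/4, recall(class 0) = 1/2 -> balanced accuracy = 0.625
print(balanced_accuracy_score(y_true, y_pred))  # 0.625
```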
arXiv Detail & Related papers (2024-04-18T12:33:57Z)
- Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data [55.84746218227712]
This study aims at assessing the relevance of a signal processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
arXiv Detail & Related papers (2023-08-09T07:51:40Z)
- A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings [53.120885867427305]
Three approaches are evaluated for speaker-attributed automatic speech recognition (SA-ASR) in a meeting scenario.
The WD-SOT approach achieves a 10.7% relative reduction in averaged speaker-dependent character error rate (SD-CER).
The TS-ASR approach also outperforms the FD-SOT approach, bringing a 16.5% relative average SD-CER reduction; the relative-reduction arithmetic is sketched below.
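Note that both figures are relative, not absolute, reductions. A small sketch of the arithmetic, with a made-up baseline SD-CER since the summary does not state one:

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error-rate reduction: (baseline - improved) / baseline."""
    return (baseline - improved) / baseline

baseline_sd_cer = 0.30                  # hypothetical FD-SOT SD-CER
wd_sot = baseline_sd_cer * (1 - 0.107)  # 10.7% relative reduction
ts_asr = baseline_sd_cer * (1 - 0.165)  # 16.5% relative reduction
print(f"{relative_reduction(baseline_sd_cer, wd_sot):.1%}")  # 10.7%
print(f"{relative_reduction(baseline_sd_cer, ts_asr):.1%}")  # 16.5%
```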
arXiv Detail & Related papers (2022-03-31T06:39:14Z)
- A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion [50.040466658605524]
We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC).
The poor quality of dysarthric speech can be greatly improved by statistical VC.
But since the normal speech utterances of a dysarthric patient are nearly impossible to collect, previous work has failed to recover the individuality of the patient.
arXiv Detail & Related papers (2021-06-02T18:41:03Z)
- Automatic Speaker Independent Dysarthric Speech Intelligibility Assessment System [28.01689694536572]
Dysarthria is a condition that hampers an individual's ability to control the muscles that play a major role in speech delivery.
The loss of fine control over the muscles that assist the movement of the lips, vocal cords, tongue, and diaphragm results in abnormal speech delivery.
One can assess the level of dysarthria by analyzing the intelligibility of speech spoken by an individual.
arXiv Detail & Related papers (2021-03-10T16:15:32Z)
- Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews [9.728371067160941]
We train end-to-end neural network architectures to adapt to each task and evaluate each approach under the same metric.
Results do not depend on the demographics of the interviewee, highlighting the clinical relevance of our methods.
arXiv Detail & Related papers (2020-10-30T09:07:37Z)
- Continuous Speech Separation with Conformer [60.938212082732775]
We use Transformer and Conformer layers in lieu of recurrent neural networks in the separation system.
We believe that capturing global information with the self-attention-based method is crucial for speech separation; a minimal mask-estimator sketch follows.
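As a rough illustration, here is a minimal self-attention mask estimator standing in where an RNN would sit in a separation front end; it uses plain PyTorch TransformerEncoder layers rather than the paper's Conformer blocks, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AttentionMaskEstimator(nn.Module):
    def __init__(self, n_mels: int = 80, n_speakers: int = 2):
        super().__init__()
        self.n_speakers = n_speakers
        layer = nn.TransformerEncoderLayer(d_model=n_mels, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.mask = nn.Linear(n_mels, n_mels * n_speakers)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, frames, n_mels). Self-attention attends over all
        # frames at once, the global context an RNN accumulates sequentially.
        h = self.encoder(spec)
        b, t, f = spec.shape
        masks = torch.sigmoid(self.mask(h))
        return masks.view(b, t, self.n_speakers, f)  # one mask per speaker

est = AttentionMaskEstimator()
masks = est(torch.randn(1, 200, 80))  # 200 frames of a mixture spectrogram
```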
arXiv Detail & Related papers (2020-08-13T09:36:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.