Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models
- URL: http://arxiv.org/abs/2406.07576v1
- Date: Fri, 7 Jun 2024 08:51:52 GMT
- Title: Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models
- Authors: Malo Maisonneuve, Corinne Fredouille, Muriel Lalain, Alain Ghio, Virginie Woisard
- Abstract summary: Head and Neck Cancers (HNC) significantly impact patients' ability to speak, affecting their quality of life.
This study proposes a self-supervised Wav2Vec2-based model for phone classification with HNC patients, to enhance accuracy and improve the discrimination of phonetic features for subsequent interpretability purposes.
- Score: 7.774205081900019
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Head and Neck Cancers (HNC) significantly impact patients' ability to speak, affecting their quality of life. Commonly used metrics for assessing pathological speech are subjective, prompting the need for automated and unbiased evaluation methods. This study proposes a self-supervised Wav2Vec2-based model for phone classification with HNC patients, to enhance accuracy and improve the discrimination of phonetic features for subsequent interpretability purposes. The impact of pre-training datasets, model size, and fine-tuning datasets and parameters is explored. Evaluation on diverse corpora reveals the effectiveness of the Wav2Vec2 architecture, which outperforms the CNN-based approach used in previous work. Correlation with perceptual measures also affirms the model's relevance for impaired speech analysis. By leveraging complex self-learnt speech representations, this work paves the way for a better understanding of pathological speech through interpretable approaches for clinicians.
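As a rough illustration of the evaluation idea in the abstract, the sketch below correlates per-speaker phone-classification accuracy with perceptual ratings. All speaker names, phone labels, and scores are hypothetical placeholders; the actual study uses Wav2Vec2 phone predictions on HNC patient corpora.

```python
# Sketch: correlate per-speaker phone classification accuracy with
# perceptual ratings. All data below are hypothetical placeholders.

def phone_accuracy(predicted, reference):
    """Fraction of phone labels where the prediction matches the reference."""
    assert len(predicted) == len(reference)
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx ** 0.5 * vy ** 0.5)

# Hypothetical speakers: (predicted phones, reference phones, perceptual score).
speakers = {
    "spk1": (["a", "t", "a", "s"], ["a", "t", "a", "s"], 9.0),
    "spk2": (["a", "d", "a", "s"], ["a", "t", "a", "s"], 7.0),
    "spk3": (["o", "d", "a", "f"], ["a", "t", "a", "s"], 3.0),
}

accuracies = [phone_accuracy(p, r) for p, r, _ in speakers.values()]
perceptual = [s for _, _, s in speakers.values()]
print(round(pearson_r(accuracies, perceptual), 3))  # toy data built to correlate
```

A high correlation between automatic accuracy and clinician ratings is what motivates using the classifier as an objective proxy for perceptual assessment.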
Related papers
- Exploring ASR-Based Wav2Vec2 for Automated Speech Disorder Assessment: Insights and Analysis [7.567181073057191]
The Wav2Vec2 ASR-based model has been fine-tuned for automated speech disorder quality assessment tasks.
This paper presents the first analysis of this baseline model for speech quality assessment, focusing on intelligibility and severity tasks.
arXiv Detail & Related papers (2024-10-10T13:12:17Z)
- Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z)
- Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data [55.84746218227712]
This study aims at assessing the relevance of a signal-processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
arXiv Detail & Related papers (2023-08-09T07:51:40Z)
- A study on the impact of Self-Supervised Learning on automatic dysarthric speech assessment [6.284142286798582]
We show that HuBERT is the most versatile feature extractor across dysarthria classification, word recognition, and intelligibility classification, achieving respectively +24.7%, +61%, and +7.2% accuracy compared to classical acoustic features.
arXiv Detail & Related papers (2023-06-07T11:04:02Z)
- Ontology-aware Learning and Evaluation for Audio Tagging [56.59107110017436]
Mean average precision (mAP) metric treats different kinds of sound as independent classes without considering their relations.
Ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.
We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP.
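The idea behind ontology-aware evaluation can be shown with a toy sketch (a loose illustration only, not the paper's actual OmAP definition): misclassifications into ontologically close classes are penalised less than those into distant classes. The tiny tree below is a made-up stand-in for the AudioSet ontology.

```python
# Toy illustration of ontology-aware evaluation (NOT the exact OmAP
# definition): close-in-ontology errors are penalised less than distant ones.

ONTOLOGY = {  # child -> parent, a made-up miniature tree
    "dog": "animal", "cat": "animal",
    "guitar": "music", "piano": "music",
    "animal": "sound", "music": "sound",
}

def path_to_root(node):
    path = [node]
    while node in ONTOLOGY:
        node = ONTOLOGY[node]
        path.append(node)
    return path

def ontology_distance(a, b):
    """Tree distance: hops from a and b to their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    ancestors = set(pb)
    for hops_a, node in enumerate(pa):
        if node in ancestors:
            return hops_a + pb.index(node)
    return len(pa) + len(pb)  # disjoint trees; cannot happen for this tree

def ontology_similarity(a, b, max_dist=4):
    return max(0.0, 1.0 - ontology_distance(a, b) / max_dist)

refs = ["dog", "dog", "guitar"]
preds = ["cat", "piano", "guitar"]
flat = sum(p == r for p, r in zip(preds, refs)) / len(refs)
soft = sum(ontology_similarity(p, r) for p, r in zip(preds, refs)) / len(refs)
# flat treats both errors equally; soft gives partial credit for dog -> cat
```

Here the flat score counts both errors as equally wrong, while the ontology-weighted score credits "cat" for a "dog" reference, which is the kind of distinction the human evaluations in the paper found more perceptually consistent.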
arXiv Detail & Related papers (2022-11-22T11:35:14Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
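One common post-hoc feature importance method is permutation importance, sketched below on entirely hypothetical data; the model, features, and labels are toy placeholders, not the paper's setup.

```python
import random

# Sketch of permutation feature importance: a feature matters if
# shuffling its column degrades accuracy. Toy model and data only.

def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        Xp = [row[:] for row in X]          # copy so X stays intact
        col = [row[feature] for row in Xp]
        rng.shuffle(col)
        for row, value in zip(Xp, col):
            row[feature] = value
        drops.append(base - accuracy(model, Xp, y))
    return sum(drops) / n_repeats

# Toy "model": predicts 1 iff feature 0 exceeds 0.5; feature 1 is pure noise.
model = lambda row: int(row[0] > 0.5)
X = [[0.1, 5.0], [0.9, 1.0], [0.2, 9.0], [0.8, 2.0]]
y = [0, 1, 0, 1]

imp_informative = permutation_importance(model, X, y, feature=0)
imp_noise = permutation_importance(model, X, y, feature=1)
# shuffling feature 0 hurts accuracy; shuffling feature 1 changes nothing
```

Because the method only probes the trained model's inputs and outputs, it applies even to the opaque treatment-effect estimators the summary mentions.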
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
- Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
We show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in the human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
- Speech Detection For Child-Clinician Conversations In Danish For Low-Resource In-The-Wild Conditions: A Case Study [6.4461798613033405]
We study the performance of a pre-trained speech model on a dataset comprising child-clinician conversations in Danish.
We learned that the model with the default classification threshold performs worse on children from the patient group.
Our study on few-instance adaptation shows that three minutes of clinician-child conversation are sufficient to obtain the optimal classification threshold.
arXiv Detail & Related papers (2022-04-25T10:51:54Z)
- Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment [10.93598143328628]
Our results are among the first to show that disentangled speech representations can be used for automatic pathological speech intelligibility assessment.
arXiv Detail & Related papers (2022-04-08T12:02:14Z)
- Semi-Supervised Variational Reasoning for Medical Dialogue Generation [70.838542865384]
Two key characteristics are relevant for medical dialogue generation: patient states and physician actions.
We propose an end-to-end variational reasoning approach to medical dialogue generation.
A physician policy network composed of an action-classifier and two reasoning detectors is proposed for augmented reasoning ability.
arXiv Detail & Related papers (2021-05-13T04:14:35Z)
- Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews [9.728371067160941]
We train end-to-end neural network architectures to adapt to each task and evaluate each approach under the same metric.
Results do not depend on the demographics of the interviewee, highlighting the clinical relevance of our methods.
arXiv Detail & Related papers (2020-10-30T09:07:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.