A study on the impact of Self-Supervised Learning on automatic dysarthric speech assessment
- URL: http://arxiv.org/abs/2306.04337v2
- Date: Fri, 22 Mar 2024 18:41:02 GMT
- Title: A study on the impact of Self-Supervised Learning on automatic dysarthric speech assessment
- Authors: Xavier F. Cadet, Ranya Aloufi, Sara Ahmadi-Abhari, Hamed Haddadi
- Abstract summary: We show that HuBERT is the most versatile feature extractor across dysarthria classification, word recognition, and intelligibility classification, achieving $+24.7\%$, $+61\%$, and $+7.2\%$ accuracy, respectively, compared to classical acoustic features.
- Score: 6.284142286798582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automating dysarthria assessments offers the opportunity to develop practical, low-cost tools that address the current limitations of manual and subjective assessments. Nonetheless, the small size of most dysarthria datasets makes it challenging to develop automated assessment methods. Recent research showed that speech representations from models pre-trained on large unlabelled data can enhance Automatic Speech Recognition (ASR) performance for dysarthric speech. We are the first to evaluate the representations from pre-trained state-of-the-art Self-Supervised models across three downstream tasks on dysarthric speech: disease classification, word recognition and intelligibility classification, and under three noise scenarios on the UA-Speech dataset. We show that HuBERT is the most versatile feature extractor across dysarthria classification, word recognition, and intelligibility classification, achieving $+24.7\%$, $+61\%$, and $+7.2\%$ accuracy, respectively, compared to classical acoustic features.
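The paper's own code is not reproduced here, but the recipe it evaluates, a frozen self-supervised encoder used as a feature extractor for a lightweight downstream classifier, can be sketched in a few lines. The HuggingFace checkpoint, mean-pooling over frames, and the logistic-regression probe are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: a frozen HuBERT encoder as a feature extractor feeding a
# simple downstream classifier. Checkpoint, pooling, and probe are assumed.
import torch
from transformers import AutoFeatureExtractor, AutoModel
from sklearn.linear_model import LogisticRegression

extractor = AutoFeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
hubert = AutoModel.from_pretrained("facebook/hubert-base-ls960").eval()

@torch.no_grad()
def embed(waveform_16k):
    """Mean-pool HuBERT's last hidden layer into one utterance vector."""
    inputs = extractor(waveform_16k, sampling_rate=16000, return_tensors="pt")
    hidden = hubert(inputs.input_values).last_hidden_state   # (1, T, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Toy stand-in for UA-Speech: random 1 s "utterances" with binary labels
# (dysarthric vs. control); real use would load the actual audio files
# and repeat the evaluation per task and noise scenario.
waves = [torch.randn(16000).numpy() for _ in range(8)]
labels = [0, 1, 0, 1, 0, 1, 0, 1]
X = [embed(w) for w in waves]
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X[:2]))
```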
Related papers
- Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models [7.774205081900019]
Head and Neck Cancers (HNC) significantly impact patients' ability to speak, affecting their quality of life.
This study proposes a self-supervised Wav2Vec2-based model for phone classification with HNC patients, to enhance accuracy and improve the discrimination of phonetic features for subsequent interpretability purposes.
arXiv Detail & Related papers (2024-06-07T08:51:52Z)
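As a rough illustration of the Wav2Vec2-based phone classification described in the entry above, the sketch below puts a frame-level linear head on a pretrained encoder and trains it with cross-entropy. The checkpoint, the 40-phone inventory, and the head are assumptions; the paper's architecture and training details differ.

```python
# Hedged sketch: a Wav2Vec2 encoder with a linear frame-level head for
# phone classification. Phone-inventory size and head are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel

class PhoneClassifier(nn.Module):
    def __init__(self, num_phones=40):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("facebook/wav2vec2-base")
        self.head = nn.Linear(self.encoder.config.hidden_size, num_phones)

    def forward(self, input_values):
        frames = self.encoder(input_values).last_hidden_state  # (B, T, H)
        return self.head(frames)                               # (B, T, num_phones)

model = PhoneClassifier()
audio = torch.randn(2, 16000)                      # two 1 s dummy clips @ 16 kHz
logits = model(audio)
targets = torch.randint(0, 40, logits.shape[:2])   # dummy frame-level phone labels
loss = nn.CrossEntropyLoss()(logits.transpose(1, 2), targets)
loss.backward()                                    # a fine-tuning step would follow
```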
- Speaker-Independent Dysarthria Severity Classification using Self-Supervised Transformers and Multi-Task Learning [2.7706924578324665]
This study presents a transformer-based framework for automatically assessing dysarthria severity from raw speech data.
We develop a framework, called Speaker-Agnostic Latent Regularisation (SALR), incorporating a multi-task learning objective and contrastive learning for speaker-independent multi-class dysarthria severity classification.
Our model demonstrated superior performance over traditional machine learning approaches, with an accuracy of $70.48\%$ and an F1 score of $59.23\%$.
arXiv Detail & Related papers (2024-02-29T18:30:52Z)
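The summary does not spell out SALR's losses, so the sketch below illustrates the general pattern it names: a severity cross-entropy objective combined with a supervised-contrastive term that pulls same-severity embeddings together regardless of speaker. The loss form, weighting, and embedding dimensions are assumptions.

```python
# Hedged sketch of a multi-task objective in the spirit of the entry above:
# severity cross-entropy plus a supervised-contrastive regulariser. The
# exact SALR losses and weighting are assumptions, not the published method.
import torch
import torch.nn as nn
import torch.nn.functional as F

def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss: same-severity pairs act as positives,
    encouraging clustering by severity rather than by speaker."""
    z = F.normalize(z, dim=1)
    sim = (z @ z.T) / temperature
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))   # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)   # avoid -inf * 0 = nan
    per_anchor = (log_prob * pos_mask.float()).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -per_anchor[pos_mask.any(1)].mean()

# Dummy utterance embeddings with two utterances per severity level.
emb = torch.randn(8, 128, requires_grad=True)
severity = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
head = nn.Linear(128, 4)
loss = F.cross_entropy(head(emb), severity) + 0.5 * supcon_loss(emb, severity)
loss.backward()
```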
- Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how sound quality affects analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
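As a hedged illustration of combining high-level acoustic and linguistic cues for AD detection, the sketch below late-fuses two feature views into a single classifier; the feature sources, dimensions, and classifier are assumptions rather than the paper's methods.

```python
# Hedged sketch: late fusion of high-level acoustic and linguistic features
# for AD detection. All data and dimensions are dummy stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 64
acoustic = rng.normal(size=(n, 768))    # e.g., pooled speech-model embeddings
linguistic = rng.normal(size=(n, 384))  # e.g., pooled text-encoder embeddings
labels = rng.integers(0, 2, size=n)     # AD vs. control (dummy)

X = np.concatenate([acoustic, linguistic], axis=1)  # simple feature fusion
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, labels)
print("train accuracy:", clf.score(X, labels))
```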
- Automatic Severity Classification of Dysarthric speech by using Self-supervised Model with Multi-task Learning [4.947423926765435]
We propose a novel automatic severity assessment method for dysarthric speech using a self-supervised model in conjunction with multi-task learning.
Wav2vec 2.0 XLS-R is trained for two different tasks: severity classification and auxiliary automatic speech recognition (ASR).
Our model outperforms traditional baseline methods, with a relative increase of 1.25% in F1-score.
arXiv Detail & Related papers (2022-10-27T12:48:10Z)
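The multi-task idea above, one shared wav2vec 2.0 XLS-R encoder with a pooled severity head and an auxiliary CTC head for ASR, can be sketched as follows. The vocabulary size, pooling, and loss weighting are illustrative assumptions.

```python
# Hedged sketch: one shared wav2vec 2.0 encoder, a CTC head for auxiliary
# ASR and a pooled head for severity. Vocab size and weighting are assumed.
import torch
import torch.nn as nn
from transformers import AutoModel

encoder = AutoModel.from_pretrained("facebook/wav2vec2-xls-r-300m")
hidden = encoder.config.hidden_size
ctc_head = nn.Linear(hidden, 32)       # dummy character vocabulary (+ blank)
severity_head = nn.Linear(hidden, 4)   # e.g., 4 severity levels

audio = torch.randn(2, 16000)
frames = encoder(audio).last_hidden_state            # (B, T, H)
ctc_logp = ctc_head(frames).log_softmax(-1)          # (B, T, V)

# Dummy ASR targets; real training uses transcripts of each utterance.
targets = torch.randint(1, 32, (2, 10))
in_lens = torch.full((2,), ctc_logp.size(1), dtype=torch.long)
tgt_lens = torch.full((2,), 10, dtype=torch.long)
asr_loss = nn.CTCLoss(blank=0)(ctc_logp.transpose(0, 1), targets, in_lens, tgt_lens)

severity_logits = severity_head(frames.mean(dim=1))  # utterance-level pooling
cls_loss = nn.functional.cross_entropy(severity_logits, torch.tensor([0, 2]))
loss = cls_loss + 0.3 * asr_loss                     # weighting is an assumption
loss.backward()
```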
- Exploring linguistic feature and model combination for speech recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques.
Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems.
This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and RoBERTa pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z)
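A minimal sketch of score-level model combination in the spirit of this work: average the class posteriors of two independently trained classifiers standing in for fine-tuned BERT and RoBERTa encoders. The stand-in features and the averaging rule are assumptions; the paper explores richer feature and model combinations.

```python
# Hedged sketch of score-level model combination: average the predicted
# probabilities of two independently trained classifiers. Real systems
# would fine-tune the actual text encoders on transcripts.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_bert = rng.normal(size=(80, 768))     # stand-in for BERT features
X_roberta = rng.normal(size=(80, 768))  # stand-in for RoBERTa features
y = rng.integers(0, 2, size=80)

m1 = LogisticRegression(max_iter=1000).fit(X_bert, y)
m2 = LogisticRegression(max_iter=1000).fit(X_roberta, y)

# Combination: average class posteriors, then take the argmax.
probs = (m1.predict_proba(X_bert) + m2.predict_proba(X_roberta)) / 2
combined_pred = probs.argmax(axis=1)
print("combined accuracy:", (combined_pred == y).mean())
```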
- Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
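Work in this area typically fits an encoding model, a regularized linear map from a model layer's features to each voxel's response, and scores held-out correlation. The sketch below shows that pattern on synthetic data; the feature layer, regularization strength, and validation scheme are assumptions.

```python
# Hedged sketch of an encoding-model analysis: ridge regression from
# time-averaged model features to per-voxel responses, scored by held-out
# correlation. All data here are synthetic.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_stimuli, n_feats, n_voxels = 200, 768, 50
X = rng.normal(size=(n_stimuli, n_feats))   # SSL features per stimulus
Y = X @ rng.normal(size=(n_feats, n_voxels)) * 0.1 + rng.normal(size=(n_stimuli, n_voxels))

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)
model = Ridge(alpha=100.0).fit(X_tr, Y_tr)
pred = model.predict(X_te)

# Per-voxel correlation between predicted and observed held-out responses.
r = [np.corrcoef(pred[:, v], Y_te[:, v])[0, 1] for v in range(n_voxels)]
print("median voxel correlation:", np.median(r))
```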
- Speech Detection For Child-Clinician Conversations In Danish For Low-Resource In-The-Wild Conditions: A Case Study [6.4461798613033405]
We study the performance of a pre-trained speech model on a dataset comprising child-clinician conversations in Danish.
We learned that the model with the default classification threshold performs worse on children from the patient group.
Our study on few-instance adaptation shows that three minutes of clinician-child conversation is sufficient to obtain the optimum classification threshold.
arXiv Detail & Related papers (2022-04-25T10:51:54Z)
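The few-instance adaptation described above amounts to sweeping the detector's decision threshold on a small labeled snippet and keeping the best one. A hedged sketch with dummy frame-level scores follows; the pretrained detector and the F1 selection criterion are assumptions.

```python
# Hedged sketch of few-instance threshold adaptation: sweep thresholds on
# ~3 minutes of labeled frames and keep the one maximizing F1. Scores are
# dummy stand-ins for a pretrained speech detector's probabilities.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1800)                 # 3 min of 100 ms frames
scores = np.clip(labels * 0.5 + rng.normal(0.3, 0.25, 1800), 0, 1)

thresholds = np.linspace(0.05, 0.95, 19)
f1s = [f1_score(labels, scores >= t) for t in thresholds]
best = thresholds[int(np.argmax(f1s))]
print(f"selected threshold: {best:.2f} (F1={max(f1s):.3f})")
```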
- An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition [98.70304981174748]
We focus on the general application of pretrained speech representations to advanced end-to-end automatic speech recognition (E2E-ASR) models.
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
arXiv Detail & Related papers (2021-10-09T15:06:09Z)
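The benchmarking pattern above, swapping different pretrained frontends in front of one downstream recognizer, can be sketched as follows; the two checkpoints and the toy BiLSTM-CTC recognizer are illustrative assumptions, and the paper covers many more representations and corpora.

```python
# Hedged sketch of the "swap the pretrained frontend" pattern: extract frame
# features from different SSL checkpoints and feed one small CTC recognizer.
import torch
import torch.nn as nn
from transformers import AutoModel

audio = torch.randn(1, 16000)  # dummy 1 s utterance @ 16 kHz

for ckpt in ["facebook/wav2vec2-base", "facebook/hubert-base-ls960"]:
    frontend = AutoModel.from_pretrained(ckpt).eval()
    with torch.no_grad():
        feats = frontend(audio).last_hidden_state        # (1, T, H)
    # Tiny downstream recognizer: BiLSTM + CTC projection over the features.
    rnn = nn.LSTM(feats.size(-1), 128, bidirectional=True, batch_first=True)
    proj = nn.Linear(256, 32)                            # dummy vocab (+ blank)
    logp = proj(rnn(feats)[0]).log_softmax(-1)
    print(ckpt, "->", tuple(logp.shape))
```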
- Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training [66.80558875393565]
We study the problem of training named entity recognition (NER) models using only distantly-labeled data.
We propose a noise-robust learning scheme comprising a new loss function and a noisy label removal step.
Our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
arXiv Detail & Related papers (2021-09-10T17:19:56Z)
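The paper's specific loss and removal step are not given in the summary. As a well-known stand-in, the sketch below uses generalized cross-entropy, a noise-robust loss that interpolates between cross-entropy and MAE, together with a simple confidence-based filter for likely-noisy token labels.

```python
# Hedged sketch of noise-robust training on distant labels. The paper's own
# loss differs; generalized cross-entropy (GCE) is used here as a stand-in,
# plus a simple confidence-based removal of likely-noisy token labels.
import torch
import torch.nn.functional as F

def gce_loss(logits, labels, q=0.7):
    """GCE: (1 - p_y^q) / q, less sensitive to mislabeled tokens than CE."""
    p_y = F.softmax(logits, dim=-1).gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return ((1.0 - p_y.clamp(min=1e-8) ** q) / q).mean()

logits = torch.randn(32, 9, requires_grad=True)  # 32 tokens, 9 BIO tags (dummy)
labels = torch.randint(0, 9, (32,))              # distant (possibly noisy) tags

# Noisy-label removal: drop tokens the current model finds very unlikely.
with torch.no_grad():
    p_y = F.softmax(logits, dim=-1).gather(-1, labels.unsqueeze(-1)).squeeze(-1)
keep = p_y > 0.05
loss = gce_loss(logits[keep], labels[keep])
loss.backward()
```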
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.