Automatic Severity Classification of Dysarthric speech by using
Self-supervised Model with Multi-task Learning
- URL: http://arxiv.org/abs/2210.15387v3
- Date: Fri, 28 Apr 2023 16:41:16 GMT
- Title: Automatic Severity Classification of Dysarthric speech by using
Self-supervised Model with Multi-task Learning
- Authors: Eun Jung Yeo, Kwanghee Choi, Sunhee Kim, Minhwa Chung
- Abstract summary: We propose a novel automatic severity assessment method for dysarthric speech using a self-supervised model in conjunction with multi-task learning.
Wav2vec 2.0 XLS-R is jointly trained for two tasks: severity classification and auxiliary automatic speech recognition (ASR).
Our model outperforms the traditional baseline methods, with a 1.25% relative improvement in F1-score.
- Score: 4.947423926765435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic assessment of dysarthric speech is essential for sustained
treatments and rehabilitation. However, obtaining atypical speech is
challenging, often leading to data scarcity issues. To tackle the problem, we
propose a novel automatic severity assessment method for dysarthric speech,
using a self-supervised model in conjunction with multi-task learning.
Wav2vec 2.0 XLS-R is jointly trained for two different tasks: severity
classification and auxiliary automatic speech recognition (ASR). For the
baseline experiments, we employ hand-crafted acoustic features and machine
learning classifiers such as SVM, MLP, and XGBoost. Explored on the Korean
dysarthric speech QoLT database, our model outperforms the traditional baseline
methods, with a 1.25% relative improvement in F1-score. In addition, the
proposed model surpasses the model trained without the ASR head, achieving a
10.61% relative improvement. Furthermore, we show how multi-task learning
affects severity classification performance by analyzing the latent
representations and the regularization effect.
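A minimal sketch of this setup (PyTorch with HuggingFace transformers) is shown below: one shared XLS-R encoder feeds a severity classification head and an auxiliary CTC-based ASR head, and the two losses are summed. The checkpoint name, head dimensions, mean-pooling, and loss weight alpha are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class MultiTaskXLSR(nn.Module):
    """Shared XLS-R encoder with a severity head and an auxiliary ASR (CTC) head."""
    def __init__(self, num_severity_classes=4, vocab_size=32):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-xls-r-300m")
        hidden = self.encoder.config.hidden_size
        self.severity_head = nn.Linear(hidden, num_severity_classes)
        self.asr_head = nn.Linear(hidden, vocab_size)  # frame-level logits for CTC

    def forward(self, input_values):
        frames = self.encoder(input_values).last_hidden_state     # (B, T, H)
        severity_logits = self.severity_head(frames.mean(dim=1))  # utterance-level pooling
        asr_logits = self.asr_head(frames)                        # (B, T, V)
        return severity_logits, asr_logits

def joint_loss(severity_logits, asr_logits, labels, targets,
               frame_lens, target_lens, alpha=0.3):
    """Severity cross-entropy plus an alpha-weighted auxiliary CTC loss."""
    cls = nn.functional.cross_entropy(severity_logits, labels)
    log_probs = asr_logits.log_softmax(-1).transpose(0, 1)  # (T, B, V) layout for CTC
    ctc = nn.functional.ctc_loss(log_probs, targets, frame_lens, target_lens)
    return cls + alpha * ctc
```

In this sketch the auxiliary CTC term acts purely as a training-time regularizer; at inference only the severity head is used.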
Related papers
- Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models [7.774205081900019]
Head and Neck Cancers (HNC) significantly impact patients' ability to speak, affecting their quality of life.
This study proposes a self-supervised Wav2Vec2-based model for phone classification with HNC patients, to enhance accuracy and improve the discrimination of phonetic features for subsequent interpretability purposes.
arXiv Detail & Related papers (2024-06-07T08:51:52Z)
- Speaker-Independent Dysarthria Severity Classification using Self-Supervised Transformers and Multi-Task Learning [2.7706924578324665]
This study presents a transformer-based framework for automatically assessing dysarthria severity from raw speech data.
We develop a framework, called Speaker-Agnostic Latent Regularisation (SALR), incorporating a multi-task learning objective and contrastive learning for speaker-independent multi-class dysarthria severity classification.
Our model demonstrated superior performance over traditional machine learning approaches, with an accuracy of 70.48% and an F1 score of 59.23%.
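The summary does not spell out the SALR objective; as a hedged sketch, a generic supervised contrastive term that pulls same-severity utterances together regardless of speaker could look like the following (the temperature and batching are assumptions):

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.07):
    """SupCon-style loss: same-severity utterances attract, others repel."""
    z = F.normalize(embeddings, dim=1)                     # (B, D) unit vectors
    sim = z @ z.t() / temperature                          # pairwise similarities
    sim = sim - torch.eye(len(z), device=z.device) * 1e9   # mask self-similarity
    positives = labels.unsqueeze(0).eq(labels.unsqueeze(1))  # same severity label
    positives = positives & ~torch.eye(len(z), dtype=torch.bool, device=z.device)
    log_prob = sim.log_softmax(dim=1)
    pos_counts = positives.sum(dim=1).clamp(min=1)
    return (-(log_prob * positives).sum(dim=1) / pos_counts).mean()
```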
arXiv Detail & Related papers (2024-02-29T18:30:52Z)
- A Few-Shot Approach to Dysarthric Speech Intelligibility Level Classification Using Transformers [0.0]
Dysarthria is a speech disorder that hinders communication due to difficulties in articulating words.
Much of the literature has focused on improving ASR systems for dysarthric speech.
This work aims to develop models that can accurately classify the presence of dysarthria.
arXiv Detail & Related papers (2023-09-17T17:23:41Z)
- A study on the impact of Self-Supervised Learning on automatic dysarthric speech assessment [6.284142286798582]
We show that HuBERT is the most versatile feature extractor across dysarthria classification, word recognition, and intelligibility classification, achieving +24.7%, +61%, and +7.2% accuracy, respectively, compared to classical acoustic features.
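As a rough sketch of how such a comparison is typically set up, frozen HuBERT hidden states can be mean-pooled into utterance-level features for a downstream classifier; the checkpoint name and pooling choice here are assumptions:

```python
import torch
from transformers import HubertModel

hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()

@torch.no_grad()
def utterance_features(waveforms):
    """waveforms: (batch, samples) mono audio at 16 kHz."""
    frames = hubert(waveforms).last_hidden_state  # (batch, T, 768)
    return frames.mean(dim=1)                     # one pooled vector per utterance
```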
arXiv Detail & Related papers (2023-06-07T11:04:02Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- Automated Fidelity Assessment for Strategy Training in Inpatient Rehabilitation using Natural Language Processing [53.096237570992294]
Strategy training is a rehabilitation approach that teaches skills to reduce disability among those with cognitive impairments following a stroke.
Standardized fidelity assessment is used to measure adherence to treatment principles.
We developed a rule-based NLP algorithm, a long short-term memory (LSTM) model, and a Bidirectional Encoder Representations from Transformers (BERT) model for this task.
arXiv Detail & Related papers (2022-09-14T15:33:30Z)
- Performance or Trust? Why Not Both. Deep AUC Maximization with Self-Supervised Learning for COVID-19 Chest X-ray Classifications [72.52228843498193]
In training deep learning models, a compromise often must be made between performance and trust.
In this work, we integrate a new surrogate loss with self-supervised learning for computer-aided screening of COVID-19 patients.
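The summary does not detail the new surrogate loss; a generic pairwise squared-hinge surrogate for AUC, which pushes every positive example to outscore every negative by a margin, is one plausible sketch:

```python
import torch

def pairwise_auc_surrogate(pos_scores, neg_scores, margin=1.0):
    """Squared-hinge AUC surrogate: each positive should beat each
    negative by at least `margin` (the margin value is an assumption)."""
    diffs = pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0)  # all pos/neg pairs
    return torch.relu(margin - diffs).pow(2).mean()
```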
arXiv Detail & Related papers (2021-12-14T21:16:52Z)
- Sequence-level self-learning with multiple hypotheses [53.04725240411895]
We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR).
In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework.
Our experimental results show that our method reduces the WER on the British speech data from 14.55% to 10.36% compared to the baseline model trained with the US English data only.
arXiv Detail & Related papers (2021-12-10T20:47:58Z)
- NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA).
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus 'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA, the system's accuracy ranged from 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z)
- Incremental Learning for End-to-End Automatic Speech Recognition [41.297106772785206]
We propose an incremental learning method for end-to-end Automatic Speech Recognition (ASR).
We design a novel explainability-based knowledge distillation for ASR models, which is combined with a response-based knowledge distillation to maintain the original model's predictions and the "reason" for the predictions.
Results on a multi-stage sequential training task show that our method outperforms existing ones in mitigating forgetting.
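The explainability-based component isn't specified in this summary; the response-based part can be sketched as a standard softened-KL distillation term added to the ASR loss (the temperature and weight are assumptions):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student output distributions."""
    t = F.softmax(teacher_logits / temperature, dim=-1)
    s = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

def incremental_step(asr_loss, student_logits, teacher_logits, lam=0.5):
    """New-data ASR loss plus a penalty for drifting from the frozen original model."""
    return asr_loss + lam * kd_loss(student_logits, teacher_logits)
```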
arXiv Detail & Related papers (2020-05-11T08:18:08Z)
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that, using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
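The improved regularization itself isn't described in this summary; the underlying self-training loop it builds on can be sketched as confidence-thresholded pseudo-labeling (the threshold value is an assumption):

```python
import torch

@torch.no_grad()
def pseudo_label(model, unlabeled_batches, threshold=0.9):
    """Label unlabeled data with the current model, keeping only confident predictions."""
    model.eval()
    kept = []
    for x in unlabeled_batches:
        probs = model(x).softmax(dim=-1)
        conf, pred = probs.max(dim=-1)
        mask = conf >= threshold            # confidence filter
        kept.append((x[mask], pred[mask]))  # retained (input, pseudo-label) pairs
    return kept
```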
arXiv Detail & Related papers (2020-05-03T02:36:00Z)