A Multi-modal Approach to Dysarthria Detection and Severity Assessment Using Speech and Text Information
- URL: http://arxiv.org/abs/2412.16874v1
- Date: Sun, 22 Dec 2024 06:08:35 GMT
- Title: A Multi-modal Approach to Dysarthria Detection and Severity Assessment Using Speech and Text Information
- Authors: Anuprabha M, Krishna Gurugubelli, Kesavaraj V, Anil Kumar Vuppala,
- Abstract summary: This study introduces a novel approach that leverages both speech and text modalities.
By employing cross-attention mechanism, our method learns the acoustic and linguistic similarities between speech and text representations.
- Score: 9.172160338245252
- License:
- Abstract: Automatic detection and severity assessment of dysarthria are crucial for delivering targeted therapeutic interventions to patients. While most existing research focuses primarily on speech modality, this study introduces a novel approach that leverages both speech and text modalities. By employing cross-attention mechanism, our method learns the acoustic and linguistic similarities between speech and text representations. This approach assesses specifically the pronunciation deviations across different severity levels, thereby enhancing the accuracy of dysarthric detection and severity assessment. All the experiments have been performed using UA-Speech dysarthric database. Improved accuracies of 99.53% and 93.20% in detection, and 98.12% and 51.97% for severity assessment have been achieved when speaker-dependent and speaker-independent, unseen and seen words settings are used. These findings suggest that by integrating text information, which provides a reference linguistic knowledge, a more robust framework has been developed for dysarthric detection and assessment, thereby potentially leading to more effective diagnoses.
Related papers
- Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations [74.83732294523402]
We introduce a novel benchmark that simulates real-world diagnostic scenarios, integrating noise and difficulty levels aligned with USMLE standards.
We also explore dialogue-based fine-tuning, which transforms static datasets into conversational formats to better capture iterative reasoning processes.
Experiments show that dialogue-tuned models outperform traditional methods, with improvements of $9.64%$ in multi-round reasoning scenarios and $6.18%$ in accuracy in a noisy environment.
arXiv Detail & Related papers (2025-01-29T18:58:48Z) - Robust Cross-Etiology and Speaker-Independent Dysarthric Speech Recognition [26.26414139359157]
We present a speaker-independent dysarthric speech recognition system, with a focus on evaluating the recently released Speech Accessibility Project (SAP-1005) dataset.
Our primary objective is to develop a robust speaker-independent model capable of accurately recognizing dysarthric speech, irrespective of the speaker.
As a secondary objective, we aim to test the cross-etiology performance of our model by evaluating it on the TORGO dataset.
arXiv Detail & Related papers (2025-01-25T00:02:58Z) - Voice Biomarker Analysis and Automated Severity Classification of Dysarthric Speech in a Multilingual Context [1.4721615285883431]
Dysarthria, a motor speech disorder, severely impacts voice quality, pronunciation, and prosody, leading to diminished speech intelligibility and reduced quality of life.
This thesis proposes a novel multilingual dysarthria severity classification method, by analyzing three languages: English, Korean, and Tamil.
arXiv Detail & Related papers (2024-12-01T00:05:00Z) - A Review of Deep Learning Approaches for Non-Invasive Cognitive Impairment Detection [35.31259047578382]
This review paper explores recent advances in deep learning approaches for non-invasive cognitive impairment detection.
We examine various non-invasive indicators of cognitive decline, including speech and language, facial, and motoric mobility.
Despite significant progress, several challenges remain, including data standardization and accessibility, model explainability, longitudinal analysis limitations, and clinical adaptation.
arXiv Detail & Related papers (2024-10-25T17:44:59Z) - Classification of Dysarthria based on the Levels of Severity. A
Systematic Review [1.7624130429860712]
This systematic review aims to analyze current methodologies for classifying dysarthria based on severity levels.
We will systematically review the literature on the automatic classification of dysarthria severity levels.
arXiv Detail & Related papers (2023-10-11T07:40:46Z) - Leveraging Pretrained Representations with Task-related Keywords for
Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z) - Exploiting prompt learning with pre-trained language models for
Alzheimer's Disease detection [70.86672569101536]
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care and to delay further progression.
This paper investigates the use of prompt-based fine-tuning of PLMs that consistently uses AD classification errors as the training objective function.
arXiv Detail & Related papers (2022-10-29T09:18:41Z) - Exploring linguistic feature and model combination for speech
recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques.
Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems.
This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and Roberta pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z) - Investigation of Data Augmentation Techniques for Disordered Speech
Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker adapted system constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation produced up to 2.92% absolute word error rate (WER)
arXiv Detail & Related papers (2022-01-14T17:09:22Z) - Assessing clinical utility of Machine Learning and Artificial
Intelligence approaches to analyze speech recordings in Multiple Sclerosis: A
Pilot Study [1.6582693134062305]
The aim of this study was to determine the potential clinical utility of machine learning and deep learning/AI approaches for the aiding of diagnosis, biomarker extraction and progression monitoring of multiple sclerosis using speech recordings.
The Random Forest model performed best, achieving an Accuracy of 0.82 on the validation dataset and an area-under-curve of 0.76 across 5 k-fold cycles on the training dataset.
arXiv Detail & Related papers (2021-09-20T21:02:37Z) - NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA)
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA the system's performance accuracy ranged between 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.