Voice Biomarker Analysis and Automated Severity Classification of Dysarthric Speech in a Multilingual Context
- URL: http://arxiv.org/abs/2412.12111v1
- Date: Sun, 01 Dec 2024 00:05:00 GMT
- Title: Voice Biomarker Analysis and Automated Severity Classification of Dysarthric Speech in a Multilingual Context
- Authors: Eunjung Yeo,
- Abstract summary: Dysarthria, a motor speech disorder, severely impacts voice quality, pronunciation, and prosody, leading to diminished speech intelligibility and reduced quality of life.
This thesis proposes a novel multilingual dysarthria severity classification method, by analyzing three languages: English, Korean, and Tamil.
- Score: 1.4721615285883431
- License:
- Abstract: Dysarthria, a motor speech disorder, severely impacts voice quality, pronunciation, and prosody, leading to diminished speech intelligibility and reduced quality of life. Accurate assessment is crucial for effective treatment, but traditional perceptual assessments are limited by their subjectivity and resource intensity. To mitigate the limitations, automatic dysarthric speech assessment methods have been proposed to support clinicians on their decision-making. While these methods have shown promising results, most research has focused on monolingual environments. However, multilingual approaches are necessary to address the global burden of dysarthria and ensure equitable access to accurate diagnosis. This thesis proposes a novel multilingual dysarthria severity classification method, by analyzing three languages: English, Korean, and Tamil.
Related papers
- A Multi-modal Approach to Dysarthria Detection and Severity Assessment Using Speech and Text Information [9.172160338245252]
This study introduces a novel approach that leverages both speech and text modalities.
By employing cross-attention mechanism, our method learns the acoustic and linguistic similarities between speech and text representations.
arXiv Detail & Related papers (2024-12-22T06:08:35Z) - Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z) - Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition [40.44769351506048]
Perceiver-Prompt is a method for speaker adaptation that utilizes P-Tuning on the Whisper large-scale model.
We first fine-tune Whisper using LoRA and then integrate a trainable Perceiver to generate fixed-length speaker prompts from variable-length inputs.
arXiv Detail & Related papers (2024-06-14T09:36:46Z) - Identification of Cognitive Decline from Spoken Language through Feature
Selection and the Bag of Acoustic Words Model [0.0]
The early identification of symptoms of memory disorders plays a significant role in ensuring the well-being of populations.
The lack of standardized speech tests in clinical settings has led to a growing emphasis on developing automatic machine learning techniques for analyzing naturally spoken language.
The work presents an approach related to feature selection, allowing for the automatic selection of the essential features required for diagnosis from the Geneva minimalistic acoustic parameter set and relative speech pauses.
arXiv Detail & Related papers (2024-02-02T17:06:03Z) - Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z) - Spectro-Temporal Deep Features for Disordered Speech Assessment and
Recognition [65.25325641528701]
Motivated by the spectro-temporal level differences between disordered and normal speech that systematically manifest in articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies, novel spectro-temporal subspace basis embedding deep features derived by SVD decomposition of speech spectrum are proposed.
Experiments conducted on the UASpeech corpus suggest the proposed spectro-temporal deep feature adapted systems consistently outperformed baseline i- adaptation by up to 263% absolute (8.6% relative) reduction in word error rate (WER) with or without data augmentation.
arXiv Detail & Related papers (2022-01-14T16:56:43Z) - Cross-lingual Transfer for Speech Processing using Acoustic Language
Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge this digital divide.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z) - AM2iCo: Evaluating Word Meaning in Context across Low-ResourceLanguages
with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z) - Automatic Speaker Independent Dysarthric Speech Intelligibility
Assessment System [28.01689694536572]
Dysarthria is a condition which hampers the ability of an individual to control the muscles that play a major role in speech delivery.
The loss of fine control over muscles that assist the movement of lips, vocal chords, tongue and diaphragm results in abnormal speech delivery.
One can assess the level of dysarthria by analyzing the intelligibility of speech spoken by an individual.
arXiv Detail & Related papers (2021-03-10T16:15:32Z) - NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA)
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA the system's performance accuracy ranged between 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z) - Comparison of Speaker Role Recognition and Speaker Enrollment Protocol
for conversational Clinical Interviews [9.728371067160941]
We train end-to-end neural network architectures to adapt to each task and evaluate each approach under the same metric.
Results do not depend on the demographics of the Interviewee, highlighting the clinical relevance of our methods.
arXiv Detail & Related papers (2020-10-30T09:07:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.