Cross-Lingual Multi-Granularity Framework for Interpretable Parkinson's Disease Diagnosis from Speech
- URL: http://arxiv.org/abs/2510.03758v1
- Date: Sat, 04 Oct 2025 09:51:00 GMT
- Title: Cross-Lingual Multi-Granularity Framework for Interpretable Parkinson's Disease Diagnosis from Speech
- Authors: Ilias Tougui, Mehdi Zakroum, Mounir Ghogho
- Abstract summary: Parkinson's Disease (PD) affects over 10 million people worldwide, with speech impairments in up to 89% of patients. We developed a granularity-aware approach for multilingual PD detection using an automated pipeline that extracts time-aligned phonemes, syllables, and words from recordings. Phoneme-level analysis achieved superior performance with AUROC of 93.78% ± 2.34% and accuracy of 92.17% ± 2.43%.
- Score: 12.214351085553822
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parkinson's Disease (PD) affects over 10 million people worldwide, with speech impairments in up to 89% of patients. Current speech-based detection systems analyze entire utterances, potentially overlooking the diagnostic value of specific phonetic elements. We developed a granularity-aware approach for multilingual PD detection using an automated pipeline that extracts time-aligned phonemes, syllables, and words from recordings. Using Italian, Spanish, and English datasets, we implemented a bidirectional LSTM with multi-head attention to compare diagnostic performance across the different granularity levels. Phoneme-level analysis achieved superior performance with AUROC of 93.78% ± 2.34% and accuracy of 92.17% ± 2.43%. This demonstrates enhanced diagnostic capability for cross-linguistic PD detection. Importantly, attention analysis revealed that the most informative speech features align with those used in established clinical protocols: sustained vowels (/a/, /e/, /o/, /i/) at phoneme level, diadochokinetic syllables (/ta/, /pa/, /la/, /ka/) at syllable level, and /pataka/ sequences at word level. Source code will be available at https://github.com/jetliqs/clearpd.
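The abstract reports AUROC as a mean with a spread (93.78% ± 2.34%). As a rough illustration of how such a figure can be produced, the sketch below computes AUROC from its pairwise-ranking definition and summarizes it over several evaluation folds. It is not the authors' code: the abstract does not say how the spread was obtained, so the use of folds, the sample standard deviation, the toy labels and scores, and the names `auroc` and `summarize` are all assumptions made for this example.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC as the probability that a random positive example is scored
    above a random negative example (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize(values):
    """Return (mean, sample standard deviation), both in percent."""
    return 100 * mean(values), 100 * stdev(values)


# Hypothetical toy folds: (labels, model scores); 1 = PD, 0 = healthy control.
folds = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 1, 0], [0.7, 0.7, 0.6, 0.2]),
]

fold_aurocs = [auroc(labels, scores) for labels, scores in folds]
avg, spread = summarize(fold_aurocs)
print(f"AUROC: {avg:.2f}% ± {spread:.2f}%")
```

The pairwise form is quadratic in the number of examples. It is used here only because it makes the definition explicit; a rank-based implementation would be preferable on real datasets.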
Related papers
- Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs [59.230858581944425]
Two dominant approaches have emerged for speech processing: discrete tokens and continuous features.
We compare self-supervised learning (SSL)-based discrete and continuous features under the same experimental settings.
Our findings reveal that continuous features generally outperform discrete tokens in various tasks.
arXiv Detail & Related papers (2025-08-25T10:16:07Z)
- Does Language Matter for Early Detection of Parkinson's Disease from Speech? [9.968776083852813]
Using speech samples as a biomarker is a promising avenue for detecting and monitoring the progression of Parkinson's disease (PD).
To assess the role of language in PD detection, we tested pretrained models with varying data types and pretraining objectives.
arXiv Detail & Related papers (2025-07-14T19:23:09Z)
- Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data [0.7673339435080445]
Speech impairments are prevalent biomarkers for Parkinson's Disease (PD).
Deep acoustic features have shown promise for PD classification, but their effectiveness often varies due to speaker differences.
This study investigates the effectiveness of three pre-trained audio embeddings for PD classification.
arXiv Detail & Related papers (2025-06-02T09:32:54Z)
- NeuroVoz: a Castillian Spanish corpus of parkinsonian speech [34.916222066004465]
This manuscript presents the NeuroVoz corpus consisting of 112 native Castilian-Spanish speakers, including 58 healthy controls and 54 individuals with PD.
The dataset is also complemented with subjective assessments of voice quality performed by an expert according to the GRBAS scale.
This data set has already supported several studies, achieving a benchmark accuracy of 89% for the screening of PD.
arXiv Detail & Related papers (2024-03-04T16:17:39Z)
- Towards a Deep Understanding of Multilingual End-to-End Speech Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z)
- Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data [55.84746218227712]
This study aims at assessing the relevance of a signal processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
arXiv Detail & Related papers (2023-08-09T07:51:40Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Parkinson's disease diagnostics using AI and natural language knowledge transfer [0.0]
A deep learning approach for classifying raw speech recordings from patients with diagnosed PD was proposed.
The method was tested on a group of 38 PD patients and 10 healthy persons above the age of 50.
arXiv Detail & Related papers (2022-04-26T19:39:29Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- Comparative Study of Speech Analysis Methods to Predict Parkinson's Disease [0.0]
Speech disorders can be used to detect this disease before it degenerates.
This work analyzes speech features and machine learning approaches to predict PD.
Using all the acoustic features and MFCCs together with an SVM produced the highest performance, with an accuracy of 98%.
arXiv Detail & Related papers (2021-11-15T04:29:51Z)
- NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA).
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus 'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA, the system's performance accuracy ranged from 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z)
- Detecting Parkinson's Disease From an Online Speech-task [4.968576908394359]
In this paper, we envision a web-based framework that can help anyone, anywhere around the world, record a short speech task and analyze the recorded data to screen for Parkinson's disease (PD).
We collected data from 726 unique participants (262 PD, 38% female; 464 non-PD, 65% female; average age: 61) from all over the US and beyond.
We extracted both standard acoustic features (MFCC), jitter and shimmer variants, and deep learning based features from the speech data.
Our model performed equally well on data collected in a controlled lab environment and 'in the wild'.
arXiv Detail & Related papers (2020-09-02T21:16:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.