Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data
- URL: http://arxiv.org/abs/2506.02078v1
- Date: Mon, 02 Jun 2025 09:32:54 GMT
- Title: Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data
- Authors: Emmy Postma, Cristian Tejedor-Garcia
- Abstract summary: Speech impairments are prevalent biomarkers for Parkinson's Disease (PD). Deep acoustic features have shown promise for PD classification, but their effectiveness often varies due to speaker differences. This study investigates the effectiveness of three pre-trained audio embeddings for PD classification.
- Score: 0.7673339435080445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech impairments are prevalent biomarkers for Parkinson's Disease (PD), motivating the development of diagnostic techniques using speech data for clinical applications. Although deep acoustic features have shown promise for PD classification, their effectiveness often varies due to individual speaker differences, a factor that has not been thoroughly explored in the existing literature. This study investigates the effectiveness of three pre-trained audio embeddings (OpenL3, VGGish and Wav2Vec2.0 models) for PD classification. Using the NeuroVoz dataset, OpenL3 outperforms the others in diadochokinesis (DDK) and listen and repeat (LR) tasks, capturing critical acoustic features for PD detection. Only Wav2Vec2.0 shows significant gender bias, achieving more favorable results for male speakers in DDK tasks. The misclassified cases reveal challenges with atypical speech patterns, highlighting the need for improved feature extraction and model robustness in PD detection.
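The abstract reports a gender bias for one embedding but does not say how it was measured. As a minimal sketch of one common way to surface such a gap, the snippet below computes classification accuracy separately per speaker group. The function name `accuracy_by_group` and the toy labels are hypothetical and are not taken from the paper or the NeuroVoz dataset.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for parallel lists of labels, predictions, and group tags."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


# Hypothetical toy data (assumption): 1 = PD, 0 = healthy control.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
gender = ["M", "M", "M", "M", "F", "F", "F", "F"]

per_gender = accuracy_by_group(y_true, y_pred, gender)
# A positive gap means the classifier does better on male speakers.
gap = per_gender["M"] - per_gender["F"]
print(per_gender, gap)
```

A raw accuracy gap like this is only descriptive. Deciding whether a difference is significant, as the abstract claims for Wav2Vec2.0, would need a statistical test on the real per-speaker results.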
Related papers
- Does Language Matter for Early Detection of Parkinson's Disease from Speech? [9.968776083852813]
Using speech samples as a biomarker is a promising avenue for detecting and monitoring the progression of Parkinson's disease (PD). To assess the role of language in PD detection, we tested pretrained models with varying data types and pretraining objectives.
arXiv Detail & Related papers (2025-07-14T19:23:09Z)
- Evaluating the Usefulness of Non-Diagnostic Speech Data for Developing Parkinson's Disease Classifiers [5.7624965034085545]
Speech-based Parkinson's disease (PD) detection has gained attention for its automated, cost-effective, and non-intrusive nature. This work explores the feasibility of diagnosing PD on the basis of speech data not originally intended for diagnostic purposes, using the Turn-Taking dataset.
arXiv Detail & Related papers (2025-05-24T14:45:55Z)
- Distinguishing Parkinson's Patients Using Voice-Based Feature Extraction and Classification [0.0]
This study focuses on differentiating individuals with Parkinson's disease from healthy controls through the extraction and classification of speech features. The accuracy of our 3-layer artificial neural network architecture was also compared with classical machine learning algorithms.
arXiv Detail & Related papers (2025-01-24T10:44:16Z)
- Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech [13.700867213652648]
Speech impairments in Parkinson's disease (PD) provide significant early indicators for diagnosis.
Models for speech-based PD detection have shown strong performance, but their interpretability remains underexplored.
This study systematically evaluates several explainability methods to identify PD-specific speech features.
arXiv Detail & Related papers (2024-11-12T18:43:27Z)
- Early Recognition of Parkinson's Disease Through Acoustic Analysis and Machine Learning [0.0]
Parkinson's Disease (PD) is a progressive neurodegenerative disorder that significantly impacts both motor and non-motor functions, including speech.
This paper provides a comprehensive review of methods for PD recognition using speech data, highlighting advances in machine learning and data-driven approaches.
Various classification algorithms are explored, including logistic regression, SVM, and neural networks, with and without feature selection.
Our findings indicate that specific acoustic features and advanced machine-learning techniques can effectively differentiate between individuals with PD and healthy controls.
arXiv Detail & Related papers (2024-07-22T23:24:02Z)
- What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z)
- Deep Feature Learning for Medical Acoustics [78.56998585396421]
The purpose of this paper is to compare different learnable frontends in medical acoustics tasks.
A framework has been implemented to classify human respiratory sounds and heartbeats in two categories, i.e. healthy or affected by pathologies.
arXiv Detail & Related papers (2022-08-05T10:39:37Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Comparative Study of Speech Analysis Methods to Predict Parkinson's Disease [0.0]
Speech disorders can be used to detect Parkinson's disease (PD) before it degenerates.
This work analyzes speech features and machine learning approaches to predict PD.
Using all the acoustic features and MFCC together with an SVM produced the highest performance, with an accuracy of 98%.
arXiv Detail & Related papers (2021-11-15T04:29:51Z)
- A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion [50.040466658605524]
We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC).
The poor quality of dysarthric speech can be greatly improved by statistical VC.
But as the normal speech utterances of a dysarthria patient are nearly impossible to collect, previous work failed to recover the individuality of the patient.
arXiv Detail & Related papers (2021-06-02T18:41:03Z)
- Bayesian Learning for Deep Neural Network Adaptation [57.70991105736059]
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.
Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness.
This paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty.
arXiv Detail & Related papers (2020-12-14T12:30:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.