Comparative Study of Speech Analysis Methods to Predict Parkinson's Disease
- URL: http://arxiv.org/abs/2111.10207v1
- Date: Mon, 15 Nov 2021 04:29:51 GMT
- Title: Comparative Study of Speech Analysis Methods to Predict Parkinson's Disease
- Authors: Adedolapo Aishat Toye and Suryaprakash Kompalli
- Abstract summary: Speech disorders can be used to detect Parkinson's Disease (PD) before it progresses.
This work analyzes speech features and machine learning approaches to predict PD.
Using all acoustic features together with MFCC, an SVM produced the highest performance, with an accuracy of 98%.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the symptoms observed in the early stages of Parkinson's
Disease (PD) is speech impairment. These speech disorders can be used to
detect the disease before it progresses. This work analyzes speech features
and machine learning approaches to predict PD. Acoustic features such as
shimmer and jitter variants, and Mel Frequency Cepstral Coefficients (MFCC),
are extracted from speech signals. We use two datasets in this work: the
MDVR-KCL and the Italian Parkinson's Voice and Speech database. To separate
PD and non-PD speech signals, seven classification models were implemented:
K-Nearest Neighbor, Decision Trees, Support Vector Machines (SVM), Naive
Bayes, Logistic Regression, Gradient Boosting, and Random Forests. Three
feature sets were used for each of the models: (a) acoustic features only,
(b) all acoustic features plus MFCC, and (c) a selected subset of features
from the acoustic features and MFCC. Using all acoustic features together
with MFCC, the SVM produced the highest performance, with an accuracy of 98%
and an F1-score of 99%, improving on results reported in prior work. Our code
and related documentation are available in a public-domain repository.
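For readers who want to reproduce the general shape of this pipeline, the sketch below extracts MFCCs plus jitter and shimmer from each recording and trains an SVM classifier. It is a minimal sketch, not the authors' released code: it assumes the librosa, praat-parselmouth, and scikit-learn packages, and the MFCC count, Praat parameters, and SVM hyperparameters are illustrative defaults rather than the paper's configuration.

```python
import numpy as np
import librosa
import parselmouth
from parselmouth.praat import call
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def extract_features(wav_path):
    """Return one feature vector: time-averaged MFCCs plus jitter and shimmer."""
    y, sr = librosa.load(wav_path, sr=None)
    # 13 MFCCs averaged over time (the paper's exact MFCC setup is not shown here).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

    # Jitter/shimmer via Praat; 75-500 Hz are common pitch floor/ceiling defaults.
    snd = parselmouth.Sound(wav_path)
    points = call(snd, "To PointProcess (periodic, cc)", 75, 500)
    jitter_local = call(points, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    shimmer_local = call([snd, points], "Get shimmer (local)",
                         0, 0, 0.0001, 0.02, 1.3, 1.6)
    return np.concatenate([mfcc, [jitter_local, shimmer_local]])

def evaluate(wav_paths, labels):
    """Cross-validated accuracy of an SVM on PD (1) vs. non-PD (0) recordings."""
    X = np.vstack([extract_features(p) for p in wav_paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    return cross_val_score(clf, X, labels, cv=5, scoring="accuracy").mean()
```

With feature vectors of this form, swapping SVC for any of the other six classifiers named in the abstract is a one-line change in the pipeline.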
Related papers
- The Unreliability of Acoustic Systems in Alzheimer's Speech Datasets with Heterogeneous Recording Conditions [11.00082412847855]
We show that systems based on two acoustic features, MFCCs and Wav2vec 2.0 embeddings, can discriminate AD patients from controls with above-chance performance.
Our results are a warning against the use of acoustic systems for identifying patients based on non-standardized recordings.
arXiv Detail & Related papers (2024-09-11T20:50:45Z)
- Density Adaptive Attention-based Speech Network: Enhancing Feature Understanding for Mental Health Disorders [0.8437187555622164]
We introduce DAAMAudioCNNLSTM and DAAMAudioTransformer, two parameter-efficient and explainable models for audio feature extraction and depression detection.
Both models' significant explainability and efficiency in leveraging speech signals for depression detection represent a leap towards more reliable, clinically useful diagnostic tools.
arXiv Detail & Related papers (2024-08-31T08:50:28Z)
- Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model [4.503292461488901]
We propose a Perceiver-based sequence classifier to detect abnormalities in speech reflective of several neurological disorders.
We combine this classifier with a Universal Speech Model (USM) that is trained (unsupervised) on 12 million hours of diverse audio recordings.
Our model outperforms standard transformer (80.9%) and perceiver (81.8%) models and achieves an average accuracy of 83.1%.
arXiv Detail & Related papers (2023-10-16T21:07:12Z)
- High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models [56.00939852727501]
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
The non-autoregressive framework enhances controllability, and the duration diffusion model enables diversified prosodic expression.
arXiv Detail & Related papers (2023-09-27T09:27:03Z)
- Deep Feature Learning for Medical Acoustics [78.56998585396421]
The purpose of this paper is to compare different learnables in medical acoustics tasks.
A framework has been implemented to classify human respiratory sounds and heartbeats into two categories, i.e., healthy or affected by pathologies.
arXiv Detail & Related papers (2022-08-05T10:39:37Z)
- Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features [51.924340387119415]
Experimental results on the ASVspoof 2019 LA dataset show that our proposed system is very effective for the audio deepfake detection task, achieving an equal error rate (EER) of 0.43%, which surpasses almost all systems.
arXiv Detail & Related papers (2022-08-02T02:46:16Z)
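The entry above names its two input feature types but not how they are computed; the sketch below shows one plausible extraction with librosa. The pitch range, sample rate, and frame settings are assumptions for illustration, and the downstream detector is omitted.

```python
import numpy as np
import librosa

def deepfake_features(wav_path, n_fft=512, hop=160):
    """Extract F0 plus real and imaginary spectrogram features (illustrative)."""
    y, sr = librosa.load(wav_path, sr=16000)

    # Frame-level fundamental frequency; unvoiced frames come back as NaN.
    f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr, hop_length=hop)
    f0 = np.nan_to_num(f0)  # zero out unvoiced frames

    # Complex STFT kept as separate real and imaginary channels,
    # instead of the usual magnitude-only spectrogram.
    spec = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    real_imag = np.stack([spec.real, spec.imag])  # shape: (2, freq, frames)

    return f0, real_imag
```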
- Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs [11.34426502082293]
We present two multimodal fusion-based deep learning models that consume ASR transcribed speech and acoustic data simultaneously to classify whether a speaker has Alzheimer's Disease.
Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and a variety of acoustic features, achieves an accuracy of 84% and an RMSE of 4.26 when predicting MMSE cognitive scores.
arXiv Detail & Related papers (2021-06-29T19:24:29Z)
- DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion [51.83469048737548]
We propose DiffSVC, an SVC system based on a denoising diffusion probabilistic model.
A denoising module is trained in DiffSVC, which takes the destroyed mel spectrogram and its corresponding step information as input to predict the added Gaussian noise.
Experiments show that DiffSVC achieves superior conversion performance in terms of naturalness and voice similarity compared to current state-of-the-art SVC approaches.
arXiv Detail & Related papers (2021-05-28T14:26:40Z)
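The denoising objective described above is the standard noise-prediction training step of diffusion models; the sketch below spells it out in PyTorch. The step count, beta schedule, and the `denoiser` interface are illustrative assumptions, not DiffSVC's published configuration.

```python
import torch

# Linear beta schedule (an assumption; DiffSVC's actual schedule may differ).
T = 100
betas = torch.linspace(1e-4, 0.06, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def diffusion_training_step(denoiser, mel, cond):
    """One training step: corrupt the mel spectrogram, then predict the noise.

    denoiser: any module taking (noisy_mel, t, cond) and returning predicted noise.
    mel:  (batch, frames, n_mels) clean mel spectrogram.
    cond: conditioning features (e.g. content and pitch features in SVC).
    """
    b = mel.shape[0]
    t = torch.randint(0, T, (b,))                      # random diffusion step per sample
    a_bar = alphas_cumprod[t].view(b, 1, 1)            # cumulative signal retention
    eps = torch.randn_like(mel)                        # the Gaussian noise to be predicted
    noisy_mel = a_bar.sqrt() * mel + (1 - a_bar).sqrt() * eps
    eps_hat = denoiser(noisy_mel, t, cond)             # step index conditions the denoiser
    return torch.nn.functional.mse_loss(eps_hat, eps)  # epsilon-prediction loss
```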
- DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis [53.19363127760314]
DiffSinger is a parameterized Markov chain that iteratively converts noise into a mel-spectrogram conditioned on the music score.
Evaluations conducted on a Chinese singing dataset demonstrate that DiffSinger outperforms state-of-the-art SVS work by a notable margin.
arXiv Detail & Related papers (2021-05-06T05:21:42Z)
- Adaptive Multi-View ICA: Estimation of noise levels for optimal inference [65.94843987207445]
Adaptive Multi-View ICA (AVICA) is a noisy ICA model in which each view is a linear mixture of shared independent sources with additive noise on the sources.
On synthetic data, AVICA yields better source estimates than other group ICA methods thanks to its explicit MMSE estimator.
On real magnetoencephalography (MEG) data, we provide evidence that the decomposition is less sensitive to sampling noise and that the noise variance estimates are biologically plausible.
arXiv Detail & Related papers (2021-02-22T13:10:12Z)
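The generative model this entry summarizes, each view being a linear mixture of shared sources with additive noise on the sources, can be written in a few lines; the NumPy sketch below simulates data from it. Dimensions and noise levels are arbitrary, and the MMSE inference step is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_views, n_sources, n_samples = 4, 5, 1000

# Shared independent sources (Laplace draws give the non-Gaussianity ICA needs).
s = rng.laplace(size=(n_sources, n_samples))

views = []
for i in range(n_views):
    sigma_i = 0.1 * (i + 1)                        # per-view noise level to be estimated
    noise = sigma_i * rng.normal(size=s.shape)     # additive noise on the sources
    A_i = rng.normal(size=(n_sources, n_sources))  # view-specific mixing matrix
    views.append(A_i @ (s + noise))                # x_i = A_i (s + n_i)
```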
- Detecting Parkinson's Disease From an Online Speech-task [4.968576908394359]
In this paper, we envision a web-based framework that can help anyone, anywhere around the world, record a short speech task and analyze the recorded data to screen for Parkinson's disease (PD).
We collected data from 726 unique participants (262 PD, 38% female; 464 non-PD, 65% female; average age: 61) from all over the US and beyond.
We extracted both standard acoustic features (MFCC, jitter and shimmer variants) and deep learning based features from the speech data.
Our model performed equally well on data collected in a controlled lab environment and 'in the wild'.
arXiv Detail & Related papers (2020-09-02T21:16:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.