Related papers: BenSParX: A Robust Explainable Machine Learning Framework for Parkinson's Disease Detection from Bengali Conversational Speech

BenSParX: A Robust Explainable Machine Learning Framework for Parkinson's Disease Detection from Bengali Conversational Speech

URL: http://arxiv.org/abs/2505.12192v1
Date: Sun, 18 May 2025 01:58:36 GMT
Title: BenSParX: A Robust Explainable Machine Learning Framework for Parkinson's Disease Detection from Bengali Conversational Speech
Authors: Riad Hossain, Muhammad Ashad Kabir, Arat Ibne Golam Mowla, Animesh Chandra Roy, Ranjit Kumar Ghosh,
Abstract summary: Parkinson's disease (PD) poses a growing global health challenge, with Bangladesh experiencing a notable rise in PD mortality.<n>We present BenSparX, the first Bengali conversational speech dataset for PD detection.<n>We also present a robust and explainable machine learning framework tailored for early diagnosis.
Score: 0.7623426349237178
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Parkinson's disease (PD) poses a growing global health challenge, with Bangladesh experiencing a notable rise in PD-related mortality. Early detection of PD remains particularly challenging in resource-constrained settings, where voice-based analysis has emerged as a promising non-invasive and cost-effective alternative. However, existing studies predominantly focus on English or other major languages; notably, no voice dataset for PD exists for Bengali - posing a significant barrier to culturally inclusive and accessible healthcare solutions. Moreover, most prior studies employed only a narrow set of acoustic features, with limited or no hyperparameter tuning and feature selection strategies, and little attention to model explainability. This restricts the development of a robust and generalizable machine learning model. To address this gap, we present BenSparX, the first Bengali conversational speech dataset for PD detection, along with a robust and explainable machine learning framework tailored for early diagnosis. The proposed framework incorporates diverse acoustic feature categories, systematic feature selection methods, and state-of-the-art machine learning algorithms with extensive hyperparameter optimization. Furthermore, to enhance interpretability and trust in model predictions, the framework incorporates SHAP (SHapley Additive exPlanations) analysis to quantify the contribution of individual acoustic features toward PD detection. Our framework achieves state-of-the-art performance, yielding an accuracy of 95.77%, F1 score of 95.57%, and AUC-ROC of 0.982. We further externally validated our approach by applying the framework to existing PD datasets in other languages, where it consistently outperforms state-of-the-art approaches. To facilitate further research and reproducibility, the dataset has been made publicly available at https://github.com/Riad071/BenSParX.

Related papers

A Multi-Stage Large Language Model Framework for Extracting Suicide-Related Social Determinants of Health [28.066261773189865]
Social determinants of health (SDoH) factors contributing to suicide incidents are crucial for early intervention and prevention.<n>We present a multi-stage large language model framework to enhance SDoH factor extraction from unstructured text.
arXiv Detail & Related papers (2025-08-07T03:36:38Z)
RadFabric: Agentic AI System with Reasoning Capability for Radiology [61.25593938175618]
RadFabric is a multi agent, multimodal reasoning framework that unifies visual and textual analysis for comprehensive CXR interpretation.<n>System employs specialized CXR agents for pathology detection, an Anatomical Interpretation Agent to map visual findings to precise anatomical structures, and a Reasoning Agent powered by large multimodal reasoning models to synthesize visual, anatomical, and clinical data into transparent and evidence based diagnoses.
arXiv Detail & Related papers (2025-06-17T03:10:33Z)
Dementia Insights: A Context-Based MultiModal Approach [0.3749861135832073]
Early detection is crucial for timely interventions that may slow disease progression.<n>Large pre-trained models (LPMs) for text and audio have shown promise in identifying cognitive impairments.<n>This study proposes a context-based multimodal method, integrating both text and audio data using the best-performing LPMs.
arXiv Detail & Related papers (2025-03-03T06:46:26Z)
NeuroXVocal: Detection and Explanation of Alzheimer's Disease through Non-invasive Analysis of Picture-prompted Speech [4.815952991777717]
NeuroXVocal is a novel dual-component system that classifies and explains potential Alzheimer's Disease (AD) cases through speech analysis.<n>The classification component (Neuro) processes three distinct data streams: acoustic features capturing speech patterns and voice characteristics, textual features extracted from speech transcriptions, and precomputed embeddings representing linguistic patterns.<n>The explainability component (XVocal) implements a Retrieval-Augmented Generation (RAG) approach, leveraging Large Language Models combined with a domain-specific knowledge base of AD research literature.
arXiv Detail & Related papers (2025-02-14T12:09:49Z)
Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates.<n>Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information.<n>Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals.<n>Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z)
Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.<n>It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.<n>It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
Differential privacy enables fair and accurate AI-based analysis of speech disorders while protecting patient data [10.6135892856374]
This study is the first to investigate differential privacy (DP)'s impact on pathological speech data.<n>We observed a maximum accuracy reduction of 3.85% when training with DP with high privacy levels.<n>To generalize our findings across languages and disorders, we validated our approach on a dataset of Spanish-speaking Parkinson's disease patients.
arXiv Detail & Related papers (2024-09-27T18:25:54Z)
Early Recognition of Parkinson's Disease Through Acoustic Analysis and Machine Learning [0.0]
Parkinson's Disease (PD) is a progressive neurodegenerative disorder that significantly impacts both motor and non-motor functions, including speech. This paper provides a comprehensive review of methods for PD recognition using speech data, highlighting advances in machine learning and data-driven approaches. Various classification algorithms are explored, including logistic regression, SVM, and neural networks, with and without feature selection. Our findings indicate that specific acoustic features and advanced machine-learning techniques can effectively differentiate between individuals with PD and healthy controls.
arXiv Detail & Related papers (2024-07-22T23:24:02Z)
A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings [8.996456485141069]
We present a framework to recognize Parkinson's disease (PD) through an English pangram utterance speech collected using a web application. Our dataset includes a global cohort of 1306 participants, including 392 diagnosed with PD. We used deep learning embeddings derived from semi-supervised models such as Wav2Vec 2.0, WavLM, and ImageBind representing the speech dynamics associated with PD.
arXiv Detail & Related papers (2024-05-21T16:06:51Z)
DPP-Based Adversarial Prompt Searching for Lanugage Models [56.73828162194457]
Auto-regressive Selective Replacement Ascent (ASRA) is a discrete optimization algorithm that selects prompts based on both quality and similarity with determinantal point process (DPP) Experimental results on six different pre-trained language models demonstrate the efficacy of ASRA for eliciting toxic content.
arXiv Detail & Related papers (2024-03-01T05:28:06Z)
Automatic diagnosis of knee osteoarthritis severity using Swin transformer [55.01037422579516]
Knee osteoarthritis (KOA) is a widespread condition that can cause chronic pain and stiffness in the knee joint. We propose an automated approach that employs the Swin Transformer to predict the severity of KOA.
arXiv Detail & Related papers (2023-07-10T09:49:30Z)
Exploring linguistic feature and model combination for speech recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques. Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems. This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and Roberta pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.