Unveiling Interpretability in Self-Supervised Speech Representations for Parkinson's Diagnosis
- URL: http://arxiv.org/abs/2412.02006v2
- Date: Mon, 10 Feb 2025 13:25:26 GMT
- Title: Unveiling Interpretability in Self-Supervised Speech Representations for Parkinson's Diagnosis
- Authors: David Gimeno-Gómez, Catarina Botelho, Anna Pompili, Alberto Abad, Carlos-D. Martínez-Hinarejos
- Abstract summary: We propose a novel, interpretable framework specifically designed to support Parkinson's Disease diagnosis.
Through the design of simple yet effective cross-attention mechanisms, the proposed framework offers interpretability from two distinct but complementary perspectives.
Our method achieves results competitive with state-of-the-art approaches, while also demonstrating robustness in cross-lingual scenarios.
- Score: 9.91077163490596
- License:
- Abstract: Recent works in pathological speech analysis have increasingly relied on powerful self-supervised speech representations, leading to promising results. However, the complex, black-box nature of these embeddings and the limited research on their interpretability significantly restrict their adoption for clinical diagnosis. To address this gap, we propose a novel, interpretable framework specifically designed to support Parkinson's Disease (PD) diagnosis. Through the design of simple yet effective cross-attention mechanisms for both embedding- and temporal-level analysis, the proposed framework offers interpretability from two distinct but complementary perspectives. Experimental findings across five well-established speech benchmarks for PD detection demonstrate the framework's capability to identify meaningful speech patterns within self-supervised representations for a wide range of assessment tasks. Fine-grained temporal analyses further underscore its potential to enhance the interpretability of deep-learning pathological speech models, paving the way for the development of more transparent, trustworthy, and clinically applicable computer-assisted diagnosis systems in this domain. Moreover, in terms of classification accuracy, our method achieves results competitive with state-of-the-art approaches, while also demonstrating robustness in cross-lingual scenarios when applied to spontaneous speech production.
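The abstract does not spell out how its cross-attention modules are built. The following is a rough illustration of how cross-attention weights can be read as temporal relevance scores: a minimal pure-Python sketch of single-head scaled dot-product cross-attention pooling, in which one query vector attends over a sequence of frame-level embeddings. Every name here (`cross_attention_pool`, the probe `query`, the projections `w_k` and `w_v`) and the random inputs are hypothetical stand-ins, not the authors' framework. The sketch covers only a temporal-level view and does not reproduce the embedding-level mechanism the abstract mentions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix, given as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cross_attention_pool(query, frames, w_k, w_v):
    """Single-head cross-attention: one query vector attends over T frames.

    Hypothetical sketch, not the paper's implementation.
    query:    length-d_k list (e.g. a learnable probe vector)
    frames:   T lists of length d_model (e.g. frame-level SSL features)
    w_k, w_v: key/value projection matrices of shape (d_k x d_model)
    Returns (pooled vector of length d_k, attention weights over the T frames).
    """
    d_k = len(query)
    keys = [matvec(w_k, f) for f in frames]
    values = [matvec(w_v, f) for f in frames]
    # Scaled dot-product score for each time frame.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)
    # Attention-weighted sum of value vectors -> utterance-level vector.
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return pooled, weights


# Usage with random stand-in data (hypothetical sizes).
rng = random.Random(0)
T, d_model, d_k = 6, 8, 4
frames = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(T)]
w_k = [[rng.gauss(0, 0.3) for _ in range(d_model)] for _ in range(d_k)]
w_v = [[rng.gauss(0, 0.3) for _ in range(d_model)] for _ in range(d_k)]
query = [rng.gauss(0, 1) for _ in range(d_k)]

pooled, weights = cross_attention_pool(query, frames, w_k, w_v)
# The frame with the largest weight contributed most to the pooled vector.
top_frame = max(range(T), key=lambda t: weights[t])
print(len(pooled), round(sum(weights), 6), top_frame)
```

Because the softmax weights are positive and sum to one across frames, they can be plotted against time to show which segments of an utterance the pooled representation draws on most.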
Related papers
- NeuroXVocal: Detection and Explanation of Alzheimer's Disease through Non-invasive Analysis of Picture-prompted Speech [4.815952991777717]
NeuroXVocal is a novel dual-component system that classifies and explains potential Alzheimer's Disease (AD) cases through speech analysis.
The classification component (Neuro) processes three distinct data streams: acoustic features capturing speech patterns and voice characteristics, textual features extracted from speech transcriptions, and precomputed embeddings representing linguistic patterns.
The explainability component (XVocal) implements a Retrieval-Augmented Generation (RAG) approach, leveraging Large Language Models combined with a domain-specific knowledge base of AD research literature.
(2025-02-14T12:09:49Z)
- Enhancing Depression Detection with Chain-of-Thought Prompting: From Emotion to Reasoning Using Large Language Models [9.43184936918456]
Depression is one of the leading causes of disability worldwide.
Recent advancements in Large Language Models have shown promise in addressing mental health challenges.
We propose a Chain-of-Thought Prompting approach that enhances both the performance and interpretability of depression detection.
(2025-02-09T12:30:57Z)
- LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.
We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.
Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
(2025-01-07T08:49:04Z)
- Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech [13.700867213652648]
Speech impairments in Parkinson's disease (PD) provide significant early indicators for diagnosis.
Models for speech-based PD detection have shown strong performance, but their interpretability remains underexplored.
This study systematically evaluates several explainability methods to identify PD-specific speech features.
(2024-11-12T18:43:27Z)
- Selfsupervised learning for pathological speech detection [0.0]
Speech production is susceptible to influence and disruption by various neurodegenerative pathological speech disorders.
These disorders lead to pathological speech characterized by abnormal speech patterns and imprecise articulation.
Unlike neurotypical speakers, patients with speech pathologies or impairments are unable to access various virtual assistants such as Alexa, Siri, etc.
(2024-05-16T07:12:47Z)
- A Dual-Prompting for Interpretable Mental Health Language Models [11.33857985668663]
The CLPsych 2024 Shared Task aims to enhance the interpretability of Large Language Models (LLMs).
We propose a dual-prompting approach: (i) Knowledge-aware evidence extraction by leveraging the expert identity and a suicide dictionary with a mental health-specific LLM; and (ii) summarization by employing an LLM-based consistency evaluator.
(2024-02-20T06:18:02Z)
- Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting [82.64015366154884]
We study the task of cognitive distortion detection and propose the Diagnosis of Thought (DoT) prompting.
DoT performs diagnosis on the patient's speech via three stages: subjectivity assessment to separate the facts and the thoughts; contrastive reasoning to elicit the reasoning processes supporting and contradicting the thoughts; and schema analysis to summarize the cognition schemas.
Experiments demonstrate that DoT obtains significant improvements over ChatGPT for cognitive distortion detection, while generating high-quality rationales approved by human experts.
(2023-10-11T02:47:21Z)
- This Patient Looks Like That Patient: Prototypical Networks for Interpretable Diagnosis Prediction from Clinical Text [56.32427751440426]
In clinical practice, diagnosis prediction models must not only be accurate but also provide doctors with interpretable and helpful results.
We introduce ProtoPatient, a novel method based on prototypical networks and label-wise attention.
We evaluate the model on two publicly available clinical datasets and show that it outperforms existing baselines.
(2022-10-16T10:12:07Z)
- A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion [50.040466658605524]
We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC).
The poor quality of dysarthric speech can be greatly improved by statistical VC.
However, because normal speech utterances from a dysarthric patient are nearly impossible to collect, previous work failed to recover the patient's individuality.
(2021-06-02T18:41:03Z)
- Semi-Supervised Variational Reasoning for Medical Dialogue Generation [70.838542865384]
Two key characteristics are relevant for medical dialogue generation: patient states and physician actions.
We propose an end-to-end variational reasoning approach to medical dialogue generation.
A physician policy network composed of an action-classifier and two reasoning detectors is proposed for augmented reasoning ability.
(2021-05-13T04:14:35Z)
- Pose-based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation [75.3147962600095]
We propose an automated framework for body-language-based emotion recognition starting from regular RGB videos.
In collaboration with psychologists, we extend the framework for psychiatric symptom prediction.
Because a specific application domain of the proposed framework may only supply a limited amount of data, the framework is designed to work on a small training set.
(2020-10-30T18:45:16Z)