Differential privacy enables fair and accurate AI-based analysis of speech disorders while protecting patient data
- URL: http://arxiv.org/abs/2409.19078v2
- Date: Thu, 26 Dec 2024 12:56:59 GMT
- Title: Differential privacy enables fair and accurate AI-based analysis of speech disorders while protecting patient data
- Authors: Soroosh Tayebi Arasteh, Mahshad Lotfinia, Paula Andrea Perez-Toro, Tomas Arias-Vergara, Mahtab Ranji, Juan Rafael Orozco-Arroyave, Maria Schuster, Andreas Maier, Seung Hee Yang,
- Abstract summary: This study is the first to investigate differential privacy (DP)'s impact on pathological speech data.
We observed a maximum accuracy reduction of 3.85% when training with DP with high privacy levels.
To generalize our findings across languages and disorders, we validated our approach on a dataset of Spanish-speaking Parkinson's disease patients.
- Score: 10.6135892856374
- License:
- Abstract: Speech pathology has impacts on communication abilities and quality of life. While deep learning-based models have shown potential in diagnosing these disorders, the use of sensitive data raises critical privacy concerns. Although differential privacy (DP) has been explored in the medical imaging domain, its application in pathological speech analysis remains largely unexplored despite the equally critical privacy concerns. This study is the first to investigate DP's impact on pathological speech data, focusing on the trade-offs between privacy, diagnostic accuracy, and fairness. Using a large, real-world dataset of 200 hours of recordings from 2,839 German-speaking participants, we observed a maximum accuracy reduction of 3.85% when training with DP with high privacy levels. To highlight real-world privacy risks, we demonstrated the vulnerability of non-private models to explicit gradient inversion attacks, reconstructing identifiable speech samples and showcasing DP's effectiveness in mitigating these risks. To generalize our findings across languages and disorders, we validated our approach on a dataset of Spanish-speaking Parkinson's disease patients, leveraging pretrained models from healthy English-speaking datasets, and demonstrated that careful pretraining on large-scale task-specific datasets can maintain favorable accuracy under DP constraints. A comprehensive fairness analysis revealed minimal gender bias at reasonable privacy levels but underscored the need for addressing age-related disparities. Our results establish that DP can balance privacy and utility in speech disorder detection, while highlighting unique challenges in privacy-fairness trade-offs for speech data. This provides a foundation for refining DP methodologies and improving fairness across diverse patient groups in real-world deployments.
Related papers
- Towards Privacy-aware Mental Health AI Models: Advances, Challenges, and Opportunities [61.633126163190724]
Mental illness is a widespread and debilitating condition with substantial societal and personal costs.
Recent advances in Artificial Intelligence (AI) hold great potential for recognizing and addressing conditions such as depression, anxiety disorder, bipolar disorder, schizophrenia, and post-traumatic stress disorder.
Privacy concerns, including the risk of sensitive data leakage from datasets and trained models, remain a critical barrier to deploying these AI systems in real-world clinical settings.
arXiv Detail & Related papers (2025-02-01T15:10:02Z) - Privacy-Preserving Heterogeneous Federated Learning for Sensitive Healthcare Data [12.30620268528346]
We propose a new framework termed Abstention-Aware Federated Voting (AAFV)
AAFV can collaboratively and confidentially train heterogeneous local models while simultaneously protecting the data privacy.
In particular, the proposed abstention-aware voting mechanism exploits a threshold-based abstention method to select high-confidence votes from heterogeneous local models.
arXiv Detail & Related papers (2024-06-15T08:43:40Z) - The Impact of Speech Anonymization on Pathology and Its Limits [6.687980678659259]
This study investigates anonymization's impact on pathological speech across over 2,700 speakers from multiple German institutions.
Specific disorders such as Dysarthria, Dysphonia, and Cleft Lip and Palate experienced minimal utility changes, while Dysglossia showed slight improvements.
arXiv Detail & Related papers (2024-04-11T18:06:35Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind)
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging [47.99192239793597]
We evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training.
Our study shows that -- under the challenging realistic circumstances of a real-life clinical dataset -- the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.
arXiv Detail & Related papers (2023-02-03T09:49:13Z) - How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z) - The effect of speech pathology on automatic speaker verification -- a
large-scale study [6.468412158245622]
pathological speech faces heightened privacy breach risks compared to healthy speech.
Adults with dysphonia are at heightened re-identification risks, whereas conditions like dysarthria yield results comparable to those of healthy speakers.
Merging data across pathological types led to a marked EER decrease, suggesting the potential benefits of pathological diversity in automatic speaker verification.
arXiv Detail & Related papers (2022-04-13T15:17:00Z) - Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks, trained with differential privacy, in some settings might be even more vulnerable in comparison to non-private versions.
We study how the main ingredients of differentially private neural networks training, such as gradient clipping and noise addition, affect the robustness of the model.
arXiv Detail & Related papers (2020-12-14T18:59:24Z) - Differentially Private and Fair Deep Learning: A Lagrangian Dual
Approach [54.32266555843765]
This paper studies a model that protects the privacy of the individuals sensitive information while also allowing it to learn non-discriminatory predictors.
The method relies on the notion of differential privacy and the use of Lagrangian duality to design neural networks that can accommodate fairness constraints.
arXiv Detail & Related papers (2020-09-26T10:50:33Z) - Anonymizing Data for Privacy-Preserving Federated Learning [3.3673553810697827]
We propose the first syntactic approach for offering privacy in the context of federated learning.
Our approach aims to maximize utility or model performance, while supporting a defensible level of privacy.
We perform a comprehensive empirical evaluation on two important problems in the healthcare domain, using real-world electronic health data of 1 million patients.
arXiv Detail & Related papers (2020-02-21T02:30:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.