The effect of speech pathology on automatic speaker verification -- a
large-scale study
- URL: http://arxiv.org/abs/2204.06450v3
- Date: Wed, 22 Nov 2023 14:10:56 GMT
- Title: The effect of speech pathology on automatic speaker verification -- a
large-scale study
- Authors: Soroosh Tayebi Arasteh, Tobias Weise, Maria Schuster, Elmar Noeth,
Andreas Maier, Seung Hee Yang
- Abstract summary: Pathological speech faces heightened privacy breach risks compared to healthy speech.
Adults with dysphonia are at heightened re-identification risks, whereas conditions like dysarthria yield results comparable to those of healthy speakers.
Merging data across pathological types led to a marked EER decrease, suggesting the potential benefits of pathological diversity in automatic speaker verification.
- Score: 6.468412158245622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Navigating the challenges of data-driven speech processing, one of the
primary hurdles is accessing reliable pathological speech data. While public
datasets appear to offer solutions, they come with inherent risks of potential
unintended exposure of patient health information via re-identification
attacks. Using a comprehensive real-world pathological speech corpus, with over
n=3,800 test subjects spanning various age groups and speech disorders, we
employed a deep-learning-driven automatic speaker verification (ASV) approach.
This resulted in a notable mean equal error rate (EER) of 0.89% with a standard
deviation of 0.06%, outstripping traditional benchmarks. Our comprehensive
assessments demonstrate that pathological speech overall faces heightened
privacy breach risks compared to healthy speech. Specifically, adults with
dysphonia are at heightened re-identification risks, whereas conditions like
dysarthria yield results comparable to those of healthy speakers. Crucially,
speech intelligibility does not influence the ASV system's performance metrics.
In pediatric cases, particularly those with cleft lip and palate, the recording
environment plays a decisive role in re-identification. Merging data across
pathological types led to a marked EER decrease, suggesting the potential
benefits of pathological diversity in ASV, accompanied by a logarithmic boost
in ASV effectiveness. In essence, this research sheds light on the dynamics
between pathological speech and speaker verification, emphasizing its crucial
role in safeguarding patient confidentiality in our increasingly digitized
healthcare era.
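To make the headline metric concrete, the following is a minimal sketch of how an equal error rate is computed from ASV trial scores. The score distributions below are synthetic placeholders, not data from the study, and the function only illustrates the EER calculation itself rather than the paper's deep-learning verifier.

```python
import numpy as np

def compute_eer(genuine_scores, impostor_scores):
    """Equal error rate: the operating point where the false-acceptance
    rate (impostors accepted) and false-rejection rate (genuine speakers
    rejected) coincide, approximated over the observed score thresholds."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor_scores >= t)   # impostor trials accepted
        frr = np.mean(genuine_scores < t)     # genuine trials rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Synthetic cosine-similarity scores standing in for ASV trial outputs
rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 1000)    # same-speaker trials
impostor = rng.normal(0.2, 0.1, 1000)   # different-speaker trials
print(f"EER: {compute_eer(genuine, impostor):.2%}")
```

In this privacy setting a lower EER is not good news: it means the verifier separates speakers more reliably, so re-identification of patients from their recordings is easier.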
Related papers
- Differential privacy for protecting patient data in speech disorder detection using deep learning [11.01272267983849]
This study is the first to investigate differential privacy (DP)'s impact on pathological speech data.
We observed a maximum accuracy reduction of 3.85% when training with DP under a privacy budget of 7.51 (see the DP-SGD sketch after this list).
To generalize our findings, we validated our approach on a smaller dataset of Spanish-speaking Parkinson's disease patients.
arXiv Detail & Related papers (2024-09-27T18:25:54Z)
- Developing vocal system impaired patient-aimed voice quality assessment approach using ASR representation-included multiple features [0.4681310436826459]
This article showcases the utilization of automatic speech recognition and self-supervised learning representations, pre-trained on extensive datasets of normal speech.
Experiments involve checks on the PVQD dataset, covering various causes of vocal system damage in English, and on a Japanese dataset focusing on patients with Parkinson's disease.
The results on PVQD reveal a notable correlation (above 0.8 PCC) and strong accuracy (an MSE of 0.5) in predicting the Grade, Breathy, and Asthenic indicators.
arXiv Detail & Related papers (2024-08-22T10:22:53Z)
- Exploring Speech Pattern Disorders in Autism using Machine Learning [12.469348589699766]
This study presents a comprehensive approach to identify distinctive speech patterns through the analysis of examiner-patient dialogues.
We extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel Frequency Cepstral Coefficients (MFCCs) and balance.
The classification model differentiates between ASD and non-ASD cases, achieving an accuracy of 87.75% (see the feature-extraction sketch after this list).
arXiv Detail & Related papers (2024-05-03T02:59:15Z)
- Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data [55.84746218227712]
This study aims at assessing the relevance of a signal processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
arXiv Detail & Related papers (2023-08-09T07:51:40Z)
- Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition [64.9816313630768]
Fine-tuning is often used to exploit models pre-trained on large quantities of non-aged and healthy speech.
This paper investigates hyper-parameter adaptation for Conformer ASR systems that are pre-trained on the Librispeech corpus.
arXiv Detail & Related papers (2023-06-27T07:49:35Z)
- Factors Affecting the Performance of Automated Speaker Verification in Alzheimer's Disease Clinical Trials [4.0388304511445146]
Automated speaker verification (ASV) models are crucial to verify the identity of enrolled individuals and remove duplicates in clinical trials.
Our study finds that voice biometrics raise fairness concerns as certain subgroups exhibit different ASV performances owing to their inherent voice characteristics.
arXiv Detail & Related papers (2023-06-20T12:24:46Z)
- The Far Side of Failure: Investigating the Impact of Speech Recognition Errors on Subsequent Dementia Classification [8.032686410648274]
Linguistic anomalies detectable in spontaneous speech have shown promise for various clinical applications including screening for dementia and other forms of cognitive impairment.
The impressive performance of self-supervised learning (SSL) automatic speech recognition (ASR) models with curated speech data is not apparent with challenging speech samples from clinical settings.
One of our key findings is that, paradoxically, ASR systems with relatively high error rates can produce transcripts that result in better downstream classification accuracy than classification based on verbatim transcripts.
arXiv Detail & Related papers (2022-11-11T17:06:45Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested that incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction (see the speed-perturbation sketch after this list).
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data (see the contrastive-loss sketch after this list).
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA).
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus 'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA, the system's performance accuracy ranged from 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z)
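The differential-privacy entry above reports an accuracy drop when training with a privacy budget of 7.51. Below is a minimal DP-SGD sketch assuming the Opacus library for PyTorch; the toy features, two-class labels, model, and hyperparameters (everything except the 7.51 epsilon) are illustrative stand-ins, not the cited paper's setup.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine  # DP-SGD: per-sample clipping + Gaussian noise

# Toy stand-ins for pathological-speech features and diagnosis labels
features = torch.randn(512, 40)          # e.g. 40-dim acoustic feature vectors
labels = torch.randint(0, 2, (512,))     # healthy vs. pathological (dummy)
loader = DataLoader(TensorDataset(features, labels), batch_size=32)

model = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()

# Attach DP-SGD so that training satisfies the target (epsilon, delta) budget
engine = PrivacyEngine()
model, optimizer, loader = engine.make_private_with_epsilon(
    module=model, optimizer=optimizer, data_loader=loader,
    target_epsilon=7.51, target_delta=1e-5, epochs=5, max_grad_norm=1.0,
)

for _ in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()

print(f"privacy spent: epsilon = {engine.get_epsilon(delta=1e-5):.2f}")
```

The added gradient noise is what costs accuracy; the 3.85% drop reported in that entry quantifies this privacy-utility trade-off at the stated budget.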
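The autism speech-pattern entry extracts 40 features (MFCCs, zero-crossing rate, energy, spectral statistics) and trains a classifier. Here is a much smaller, hypothetical version of that recipe using librosa and scikit-learn; the random-forest classifier, the reduced feature set, and the synthetic waveforms are substitutions for illustration, not the study's pipeline.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def extract_features(wave, sr=16000, n_mfcc=13):
    """Utterance-level features: mean MFCCs, zero-crossing rate, RMS energy."""
    mfcc = librosa.feature.mfcc(y=wave, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    zcr = librosa.feature.zero_crossing_rate(wave).mean()
    rms = librosa.feature.rms(y=wave).mean()
    return np.concatenate([mfcc, [zcr, rms]])

# Synthetic one-second "utterances" stand in for the dialogue recordings
rng = np.random.default_rng(0)
waves = [rng.standard_normal(16000).astype(np.float32) for _ in range(40)]
labels = rng.integers(0, 2, size=40)          # ASD vs. non-ASD (dummy)

X = np.stack([extract_features(w) for w in waves])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("5-fold CV accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
```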
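Speed perturbation, named as the best augmentation approach in the disordered-speech entry, can be approximated by simple resampling. The sketch below applies the common 0.9x/1.0x/1.1x scheme to a dummy waveform; it illustrates the idea only and is not the cited paper's UASpeech pipeline.

```python
import numpy as np

def speed_perturb(wave, factor):
    """Resample the waveform so it plays back 'factor' times faster:
    factor > 1 shortens the clip (and raises pitch), factor < 1 lengthens it."""
    n_out = int(round(len(wave) / factor))
    old_idx = np.arange(len(wave))
    new_idx = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_idx, old_idx, wave)

rng = np.random.default_rng(0)
utterance = rng.standard_normal(16000).astype(np.float32)   # dummy 1 s clip at 16 kHz
augmented = [speed_perturb(utterance, f) for f in (0.9, 1.0, 1.1)]
print([len(a) for a in augmented])    # roughly 17778, 16000, 14545 samples
```

Each perturbed copy is treated as an additional training utterance, which is how the augmentation enlarges the effective amount of disordered speech data.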
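The EHR contrastive-learning entry relies on positive sampling. As a generic reference point, the sketch below implements a standard InfoNCE objective in PyTorch, where a "positive" might be a representation of another time window from the same patient's record; this is one common contrastive loss, not the specific regularizer or the two sampling strategies proposed in that paper.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: pull each anchor toward its positive and away from a
    shared pool of negatives."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_logit = (anchor * positive).sum(dim=-1, keepdim=True) / temperature   # (B, 1)
    neg_logits = anchor @ negatives.T / temperature                           # (B, N)
    logits = torch.cat([pos_logit, neg_logits], dim=1)
    target = torch.zeros(anchor.size(0), dtype=torch.long)   # positive sits at index 0
    return F.cross_entropy(logits, target)

# Dummy encoded EHR windows: 8 anchors, 8 matching positives, 32 negatives
anchors, positives = torch.randn(8, 64), torch.randn(8, 64)
negatives = torch.randn(32, 64)
print(info_nce(anchors, positives, negatives).item())
```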
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.