Factors Affecting the Performance of Automated Speaker Verification in
Alzheimer's Disease Clinical Trials
- URL: http://arxiv.org/abs/2306.12444v1
- Date: Tue, 20 Jun 2023 12:24:46 GMT
- Title: Factors Affecting the Performance of Automated Speaker Verification in
Alzheimer's Disease Clinical Trials
- Authors: Malikeh Ehghaghi, Marija Stanojevic, Ali Akram, Jekaterina Novikova
- Abstract summary: Automated speaker verification (ASV) models are crucial to verify the identity of enrolled individuals and remove duplicates in clinical trials.
Our study finds that voice biometrics raise fairness concerns as certain subgroups exhibit different ASV performances owing to their inherent voice characteristics.
- Score: 4.0388304511445146
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting duplicate patient participation in clinical trials is a major
challenge because repeated patients can undermine the credibility and accuracy
of the trial's findings and result in significant health and financial risks.
Developing accurate automated speaker verification (ASV) models is crucial to
verify the identity of enrolled individuals and remove duplicates, but the size
and quality of data influence ASV performance. However, there has been limited
investigation into the factors that can affect ASV capabilities in clinical
environments. In this paper, we bridge the gap by conducting analysis of how
participant demographic characteristics, audio quality criteria, and severity
level of Alzheimer's disease (AD) impact the performance of ASV utilizing a
dataset of speech recordings from 659 participants with varying levels of AD,
obtained through multiple speech tasks. Our results indicate that ASV
performance: 1) is slightly better on male speakers than on female speakers; 2)
degrades for individuals who are above 70 years old; 3) is comparatively better
for non-native English speakers than for native English speakers; 4) is
negatively affected by clinician interference, noisy background, and unclear
participant speech; 5) tends to decrease with an increase in the severity level
of AD. Our study finds that voice biometrics raise fairness concerns as certain
subgroups exhibit different ASV performances owing to their inherent voice
characteristics. Moreover, the performance of ASV is influenced by the quality
of speech recordings, which underscores the importance of improving the data
collection settings in clinical trials.
Related papers
- Developing vocal system impaired patient-aimed voice quality assessment approach using ASR representation-included multiple features [0.4681310436826459]
This article showcases the utilization of automatic speech recognition and self-supervised learning representations, pre-trained on extensive datasets of normal speech.
Experiments involve checks on PVQD dataset, covering various causes of vocal system damage in English, and a Japanese dataset focusing on patients with Parkinson's disease.
The results on PVQD reveal a notable correlation (>0.8 on PCC) and an extraordinary accuracy (0.5 on MSE) in predicting Grade, Breathy, and Asthenic indicators.
arXiv Detail & Related papers (2024-08-22T10:22:53Z) - Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation [71.31331402404662]
This paper proposes two novel data-efficient methods to learn dysarthric and elderly speaker-level features.
Speaker-regularized spectral basis embedding-SBE features that exploit a special regularization term to enforce homogeneity of speaker features in adaptation.
Feature-based learning hidden unit contributions (f-LHUC) that are conditioned on VR-LH features that are shown to be insensitive to speaker-level data quantity in testtime adaptation.
arXiv Detail & Related papers (2024-07-08T18:20:24Z) - Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials [4.231937382464348]
In clinical trials, patients are assessed based on their speech data to detect and monitor cognitive and mental health disorders.
We propose using these speech recordings to verify the identities of enrolled patients and identify and exclude the individuals who try to enroll multiple times in the same trial.
We evaluate pre-trained TitaNet, ECAPA-TDNN, and SpeakerNet models by enrolling and testing with speech-impaired patients speaking English, German, Danish, Spanish, and Arabic languages.
arXiv Detail & Related papers (2024-04-02T14:19:30Z) - Identification of Cognitive Decline from Spoken Language through Feature
Selection and the Bag of Acoustic Words Model [0.0]
The early identification of symptoms of memory disorders plays a significant role in ensuring the well-being of populations.
The lack of standardized speech tests in clinical settings has led to a growing emphasis on developing automatic machine learning techniques for analyzing naturally spoken language.
The work presents an approach related to feature selection, allowing for the automatic selection of the essential features required for diagnosis from the Geneva minimalistic acoustic parameter set and relative speech pauses.
arXiv Detail & Related papers (2024-02-02T17:06:03Z) - Sensitivity, Performance, Robustness: Deconstructing the Effect of
Sociodemographic Prompting [64.80538055623842]
sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z) - Analysing the Impact of Audio Quality on the Use of Naturalistic
Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z) - SPeC: A Soft Prompt-Based Calibration on Performance Variability of
Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z) - Leveraging Pretrained Representations with Task-related Keywords for
Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z) - The Far Side of Failure: Investigating the Impact of Speech Recognition
Errors on Subsequent Dementia Classification [8.032686410648274]
Linguistic anomalies detectable in spontaneous speech have shown promise for various clinical applications including screening for dementia and other forms of cognitive impairment.
The impressive performance of self-supervised learning (SSL) automatic speech recognition (ASR) models with curated speech data is not apparent with challenging speech samples from clinical settings.
One of our key findings is that, paradoxically, ASR systems with relatively high error rates can produce transcripts that result in better downstream classification accuracy than classification based on verbatim transcripts.
arXiv Detail & Related papers (2022-11-11T17:06:45Z) - The effect of speech pathology on automatic speaker verification -- a
large-scale study [6.468412158245622]
pathological speech faces heightened privacy breach risks compared to healthy speech.
Adults with dysphonia are at heightened re-identification risks, whereas conditions like dysarthria yield results comparable to those of healthy speakers.
Merging data across pathological types led to a marked EER decrease, suggesting the potential benefits of pathological diversity in automatic speaker verification.
arXiv Detail & Related papers (2022-04-13T15:17:00Z) - A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker
Identity in Dysarthric Voice Conversion [50.040466658605524]
We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC)
The poor quality of dysarthric speech can be greatly improved by statistical VC.
But as the normal speech utterances of a dysarthria patient are nearly impossible to collect, previous work failed to recover the individuality of the patient.
arXiv Detail & Related papers (2021-06-02T18:41:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.