Related papers: The Unreliability of Acoustic Systems in Alzheimer's Speech Datasets with Heterogeneous Recording Conditions

The Unreliability of Acoustic Systems in Alzheimer's Speech Datasets with Heterogeneous Recording Conditions

URL: http://arxiv.org/abs/2409.12170v1
Date: Wed, 11 Sep 2024 20:50:45 GMT
Title: The Unreliability of Acoustic Systems in Alzheimer's Speech Datasets with Heterogeneous Recording Conditions
Authors: Lara Gauder, Pablo Riera, Andrea Slachevsky, Gonzalo Forno, Adolfo M. Garcia, Luciana Ferrer,
Abstract summary: We show that systems based on two acoustic features, MFCCs and Wav2vec 2.0 embeddings, can discriminate AD patients from controls with above-chance performance. Our results are a warning against the use of acoustic systems for identifying patients based on non-standardized recordings.
Score: 11.00082412847855
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Automated speech analysis is a thriving approach to detect early markers of Alzheimer's disease (AD). Yet, recording conditions in most AD datasets are heterogeneous, with patients and controls often evaluated in different acoustic settings. While this is not a problem for analyses based on speech transcription or features obtained from manual alignment, it does cast serious doubts on the validity of acoustic features, which are strongly influenced by acquisition conditions. We examined this issue in the ADreSSo dataset, derived from the widely used Pitt corpus. We show that systems based on two acoustic features, MFCCs and Wav2vec 2.0 embeddings, can discriminate AD patients from controls with above-chance performance when using only the non-speech part of the audio signals. We replicated this finding in a separate dataset of Spanish speakers. Thus, in these datasets, the class can be partly predicted by recording conditions. Our results are a warning against the use of acoustic systems for identifying patients based on non-standardized recordings. We propose that acoustically heterogeneous datasets for dementia studies should be either (a) analyzed using only transcripts or other features derived from manual annotations, or (b) replaced by datasets collected with strictly controlled acoustic conditions.

Related papers

NeuroXVocal: Detection and Explanation of Alzheimer's Disease through Non-invasive Analysis of Picture-prompted Speech [4.815952991777717]
NeuroXVocal is a novel dual-component system that classifies and explains potential Alzheimer's Disease (AD) cases through speech analysis. The classification component (Neuro) processes three distinct data streams: acoustic features capturing speech patterns and voice characteristics, textual features extracted from speech transcriptions, and precomputed embeddings representing linguistic patterns. The explainability component (XVocal) implements a Retrieval-Augmented Generation (RAG) approach, leveraging Large Language Models combined with a domain-specific knowledge base of AD research literature.
arXiv Detail & Related papers (2025-02-14T12:09:49Z)
Swin-BERT: A Feature Fusion System designed for Speech-based Alzheimer's Dementia Detection [4.668008953332776]
We propose a speech-based system named Swin-BERT for automatic dementia detection. For the acoustic part, the shifted windows multi-head attention is used for designing our acoustic-based system. For the linguistic part, the rhythm-related information, which varies significantly between people living with and without AD, is removed while transcribing the audio recordings into transcripts.
arXiv Detail & Related papers (2024-10-09T06:58:20Z)
Voice Disorder Analysis: a Transformer-based Approach [10.003909936239742]
This paper proposes a novel solution that adopts transformers directly working on raw voice signals. We consider many recording types at the same time, such as sentence reading and sustained vowel emission. The experimental results, obtained on both public and private datasets, show the effectiveness of our solution in the disorder detection and classification tasks.
arXiv Detail & Related papers (2024-06-20T19:29:04Z)
BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification [0.0]
We fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata. Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%.
arXiv Detail & Related papers (2024-06-10T20:49:54Z)
Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills. Recent developments have enabled the use of more naturalistic training data for computational models. It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z)
Exploring linguistic feature and model combination for speech recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques. Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems. This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and Roberta pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z)
Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems. This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training. Experiments conducted on three tasks suggested incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition [57.15942628305797]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems for normal speech. This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training. Cross-domain adapted to the 102.7-hour UASpeech corpus and to produce articulatory features.
arXiv Detail & Related papers (2022-03-19T08:47:18Z)
Comparative Study of Speech Analysis Methods to Predict Parkinson's Disease [0.0]
Speech disorders can be used to detect this disease before it degenerates. This work analyzes speech features and machine learning approaches to predict PD. Using all the acoustic features and MFCC, together with SVM produced the highest performance with an accuracy of 98%.
arXiv Detail & Related papers (2021-11-15T04:29:51Z)
Improving Medical Image Classification with Label Noise Using Dual-uncertainty Estimation [72.0276067144762]
We discuss and define the two common types of label noise in medical images. We propose an uncertainty estimation-based framework to handle these two label noise amid the medical image classification task.
arXiv Detail & Related papers (2021-02-28T14:56:45Z)
Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions. We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset. We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.