Related papers: Swin-BERT: A Feature Fusion System designed for Speech-based Alzheimer's Dementia Detection

Swin-BERT: A Feature Fusion System designed for Speech-based Alzheimer's Dementia Detection

URL: http://arxiv.org/abs/2410.07277v1
Date: Wed, 9 Oct 2024 06:58:20 GMT
Title: Swin-BERT: A Feature Fusion System designed for Speech-based Alzheimer's Dementia Detection
Authors: Yilin Pan, Yanpei Shi, Yijia Zhang, Mingyu Lu,
Abstract summary: We propose a speech-based system named Swin-BERT for automatic dementia detection. For the acoustic part, the shifted windows multi-head attention is used for designing our acoustic-based system. For the linguistic part, the rhythm-related information, which varies significantly between people living with and without AD, is removed while transcribing the audio recordings into transcripts.
Score: 4.668008953332776
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Speech is usually used for constructing an automatic Alzheimer's dementia (AD) detection system, as the acoustic and linguistic abilities show a decline in people living with AD at the early stages. However, speech includes not only AD-related local and global information but also other information unrelated to cognitive status, such as age and gender. In this paper, we propose a speech-based system named Swin-BERT for automatic dementia detection. For the acoustic part, the shifted windows multi-head attention that proposed to extract local and global information from images, is used for designing our acoustic-based system. To decouple the effect of age and gender on acoustic feature extraction, they are used as an extra input of the designed acoustic system. For the linguistic part, the rhythm-related information, which varies significantly between people living with and without AD, is removed while transcribing the audio recordings into transcripts. To compensate for the removed rhythm-related information, the character-level transcripts are proposed to be used as the extra input of a word-level BERT-style system. Finally, the Swin-BERT combines the acoustic features learned from our proposed acoustic-based system with our linguistic-based system. The experiments are based on the two datasets provided by the international dementia detection challenges: the ADReSS and ADReSSo. The results show that both the proposed acoustic and linguistic systems can be better or comparable with previous research on the two datasets. Superior results are achieved by the proposed Swin-BERT system on the ADReSS and ADReSSo datasets, which are 85.58\% F-score and 87.32\% F-score respectively.

Related papers

DECT: Harnessing LLM-assisted Fine-Grained Linguistic Knowledge and Label-Switched and Label-Preserved Data Generation for Diagnosis of Alzheimer's Disease [13.38075448636078]
Alzheimer's Disease (AD) is an irreversible neurodegenerative disease affecting 50 million people worldwide. Language impairment is one of the earliest signs of cognitive decline, which can be used to discriminate AD patients from normal control individuals. Patient-interviewer dialogues may be used to detect such impairments, but they are often mixed with ambiguous, noisy, and irrelevant information.
arXiv Detail & Related papers (2025-02-06T04:00:25Z)
Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer's Disease [52.46922921214341]
Alzheimer's disease (AD) has become one of the most significant health challenges in an aging society. We devised an explainable and effective feature set that leverages the visual capabilities of a large language model (LLM) and the Term Frequency-Inverse Document Frequency (TF-IDF) model. Our new features can be well explained and interpreted step by step which enhance the interpretability of automatic AD screening.
arXiv Detail & Related papers (2024-11-28T05:23:22Z)
The Unreliability of Acoustic Systems in Alzheimer's Speech Datasets with Heterogeneous Recording Conditions [11.00082412847855]
We show that systems based on two acoustic features, MFCCs and Wav2vec 2.0 embeddings, can discriminate AD patients from controls with above-chance performance. Our results are a warning against the use of acoustic systems for identifying patients based on non-standardized recordings.
arXiv Detail & Related papers (2024-09-11T20:50:45Z)
Exploring Multimodal Approaches for Alzheimer's Disease Detection Using Patient Speech Transcript and Audio Data [10.782153332144533]
Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health. This study investigates various methods for detecting AD using patients' speech and transcripts data from the DementiaBank Pitt database.
arXiv Detail & Related papers (2023-07-05T12:40:11Z)
Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults. Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations. This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
Cross-lingual Alzheimer's Disease detection based on paralinguistic and pre-trained features [6.928826160866143]
We present our submission to the ICASSP-SPGC-2023 ADReSS-M Challenge Task. This task aims to investigate which acoustic features can be generalized and transferred across languages for Alzheimer's Disease prediction. We extract paralinguistic features using openSmile toolkit and acoustic features using XLSR-53. Our method achieves an accuracy of 69.6% on the classification task and a root mean squared error (RMSE) of 4.788 on the regression task.
arXiv Detail & Related papers (2023-03-14T06:34:18Z)
Exploiting prompt learning with pre-trained language models for Alzheimer's Disease detection [70.86672569101536]
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care and to delay further progression. This paper investigates the use of prompt-based fine-tuning of PLMs that consistently uses AD classification errors as the training objective function.
arXiv Detail & Related papers (2022-10-29T09:18:41Z)
Exploring linguistic feature and model combination for speech recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques. Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems. This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and Roberta pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z)
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection [62.23830810096617]
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay further progression. This paper presents the development of a state-of-the-art Conformer based speech recognition system built on the DementiaBank Pitt corpus for automatic AD detection.
arXiv Detail & Related papers (2022-06-23T12:50:55Z)
Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems. This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training. Experiments conducted on three tasks suggested incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
Influence of ASR and Language Model on Alzheimer's Disease Detection [2.4698886064068555]
We analyse the usage of a SotA ASR system to transcribe participant's spoken descriptions from a picture. We study the influence of a language model -- which tends to correct non-standard sequences of words -- with the lack of language model to decode the hypothesis from the ASR. The proposed system combines acoustic -- based on prosody and voice quality -- and lexical features based on the first occurrence of the most common words.
arXiv Detail & Related papers (2021-09-20T10:41:39Z)
Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate. We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique. Our monolingual Turkish Conformer established a competitive result with 22.2% character error rate (CER) and 38.9% word error rate (WER)
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
Multi-Modal Detection of Alzheimer's Disease from Speech and Text [3.702631194466718]
We propose a deep learning method that utilizes speech and the corresponding transcript simultaneously to detect Alzheimer's disease (AD) The proposed method achieves 85.3% 10-fold cross-validation accuracy when trained and evaluated on the Dementiabank Pitt corpus.
arXiv Detail & Related papers (2020-11-30T21:18:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.