Exploring Multimodal Approaches for Alzheimer's Disease Detection Using
Patient Speech Transcript and Audio Data
- URL: http://arxiv.org/abs/2307.02514v1
- Date: Wed, 5 Jul 2023 12:40:11 GMT
- Title: Exploring Multimodal Approaches for Alzheimer's Disease Detection Using
Patient Speech Transcript and Audio Data
- Authors: Hongmin Cai, Xiaoke Huang, Zhengliang Liu, Wenxiong Liao, Haixing Dai,
Zihao Wu, Dajiang Zhu, Hui Ren, Quanzheng Li, Tianming Liu, and Xiang Li
- Abstract summary: Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health.
This study investigates various methods for detecting AD using patients' speech and transcript data from the DementiaBank Pitt database.
- Score: 10.782153332144533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Alzheimer's disease (AD) is a common form of dementia that severely impacts
patient health. Because AD impairs patients' language comprehension and
expression, their speech can serve as an indicator of the disease. This study
investigates various methods for detecting AD using patients' speech and
transcript data from the DementiaBank Pitt database. The proposed approach
combines pre-trained language models with a Graph Neural Network (GNN): a graph
is constructed from each speech transcript, and the GNN extracts features from
it for AD detection. Data augmentation techniques, including synonym
replacement and a GPT-based augmenter, among others, were used to address the
small dataset size. Audio data was also introduced, with the WavLM model used
to extract audio features; these features were then fused with the text
features using various methods. Finally, a contrastive learning approach was
attempted by converting the speech transcripts back to audio and contrasting it
with the original audio. We conducted extensive experiments and analysis on the
above methods. Our findings shed light on the challenges and potential
solutions in AD detection using speech and audio data.
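As a rough illustration of the late-fusion step described in the abstract, the sketch below concatenates a text embedding (as a pre-trained language model might produce) with an audio embedding (as WavLM might produce) and scores the fused vector with a toy linear classifier. All names, dimensions, and weights here are hypothetical stand-ins; this is a minimal sketch of concatenation-based fusion, not the paper's actual models or fusion methods.

```python
import math

def fuse_concat(text_feats, audio_feats):
    """Late fusion by concatenation: [text ; audio]."""
    return text_feats + audio_feats

def linear_score(feats, weights, bias):
    """Toy linear head over the fused features; sigmoid gives P(AD)."""
    z = sum(f * w for f, w in zip(feats, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical low-dimensional stand-ins for real embeddings
# (a language-model text vector and a WavLM-style audio vector).
text_emb = [0.2, -0.1, 0.4]
audio_emb = [0.5, 0.3]

fused = fuse_concat(text_emb, audio_emb)
assert len(fused) == len(text_emb) + len(audio_emb)

weights = [0.1, -0.2, 0.3, 0.4, -0.1]  # an untrained placeholder head
prob_ad = linear_score(fused, weights, bias=0.0)
```

In practice the embeddings would have hundreds of dimensions and the classifier head would be trained; concatenation is only one of several fusion strategies (gating and attention-based fusion are common alternatives).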
Related papers
- Swin-BERT: A Feature Fusion System designed for Speech-based Alzheimer's Dementia Detection [4.668008953332776]
We propose a speech-based system named Swin-BERT for automatic dementia detection.
For the acoustic part, the shifted windows multi-head attention is used for designing our acoustic-based system.
For the linguistic part, the rhythm-related information, which varies significantly between people living with and without AD, is removed while transcribing the audio recordings into transcripts.
arXiv Detail & Related papers (2024-10-09T06:58:20Z)
- Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- Multilingual Alzheimer's Dementia Recognition through Spontaneous Speech: a Signal Processing Grand Challenge [18.684024762601215]
This Signal Processing Grand Challenge (SPGC) targets a difficult automatic prediction problem of societal and medical relevance.
The Challenge has been designed to assess the extent to which predictive models built from speech in one language (English) generalise to another language (Greek).
arXiv Detail & Related papers (2023-01-13T14:09:13Z)
- Exploring linguistic feature and model combination for speech recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques.
Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems.
This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and RoBERTa pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z)
- Data Augmentation for Dementia Detection in Spoken Language [1.7324358447544175]
Recent deep-learning techniques can offer a faster diagnosis and have shown promising results.
They require large amounts of labelled data which is not easily available for the task of dementia detection.
One effective solution to sparse data problems is data augmentation, though the exact methods need to be selected carefully.
arXiv Detail & Related papers (2022-06-26T13:40:25Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Multi-modal fusion with gating using audio, lexical and disfluency features for Alzheimer's Dementia recognition from spontaneous speech [11.34426502082293]
This paper is a submission to the Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) challenge.
It aims to develop methods that can assist in the automated prediction of severity of Alzheimer's Disease from speech data.
arXiv Detail & Related papers (2021-06-17T17:20:57Z)
- Multi-Modal Detection of Alzheimer's Disease from Speech and Text [3.702631194466718]
We propose a deep learning method that utilizes speech and the corresponding transcript simultaneously to detect Alzheimer's disease (AD).
The proposed method achieves 85.3% 10-fold cross-validation accuracy when trained and evaluated on the DementiaBank Pitt corpus.
arXiv Detail & Related papers (2020-11-30T21:18:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.