A Multimodal Approach for Dementia Detection from Spontaneous Speech
with Tensor Fusion Layer
- URL: http://arxiv.org/abs/2211.04368v1
- Date: Tue, 8 Nov 2022 16:43:58 GMT
- Title: A Multimodal Approach for Dementia Detection from Spontaneous Speech
with Tensor Fusion Layer
- Authors: Loukas Ilias, Dimitris Askounis, John Psarras
- Abstract summary: Alzheimer's disease (AD) is a progressive neurological disorder, which affects memory, thinking skills, and mental abilities.
We propose deep neural networks that can be trained end-to-end and capture the inter- and intra-modal interactions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Alzheimer's disease (AD) is a progressive neurological disorder, meaning that
the symptoms develop gradually throughout the years. It is also the main cause
of dementia, which affects memory, thinking skills, and mental abilities.
Nowadays, researchers have moved their interest towards AD detection from
spontaneous speech, since it constitutes a time-effective procedure. However,
existing state-of-the-art works proposing multimodal approaches rely on early
and late fusion and do not take the inter- and intra-modal interactions into
consideration. To tackle these limitations, we propose deep neural networks
that can be trained end-to-end and capture the inter- and intra-modal
interactions. Firstly, each audio file is converted to
an image consisting of three channels, i.e., log-Mel spectrogram, delta, and
delta-delta. Next, each transcript is passed through a BERT model followed by a
gated self-attention layer. Similarly, each image is passed through a Swin
Transformer followed by an independent gated self-attention layer. Acoustic
features are also extracted from each audio file. Finally, the representation
vectors from the different modalities are fed to a tensor fusion layer for
capturing the inter-modal interactions. Extensive experiments conducted on the
ADReSS Challenge dataset indicate that our introduced approaches obtain
valuable advantages over existing research initiatives, reaching an Accuracy
and F1-score of up to 86.25% and 85.48%, respectively.
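The tensor fusion step described above is typically realized as an outer product of the per-modality representation vectors, with a constant 1 appended to each vector so that unimodal and bimodal interactions survive alongside the trimodal ones (in the style of Zadeh et al.'s Tensor Fusion Network). A minimal NumPy sketch, where the function name and vector sizes are illustrative rather than taken from the paper:

```python
import numpy as np

def tensor_fusion(text_vec, image_vec, audio_vec):
    # Append a constant 1 to each modality vector so the outer product
    # also retains unimodal (x*1*1) and bimodal (x*y*1) interaction terms.
    t = np.append(text_vec, 1.0)
    v = np.append(image_vec, 1.0)
    a = np.append(audio_vec, 1.0)
    # Three-way outer product: every element is a product of one component
    # from each modality, capturing all inter-modal interactions.
    fused = np.einsum('i,j,k->ijk', t, v, a)
    # Flatten before feeding the result to the classification head.
    return fused.ravel()
```

Note that the fused dimensionality grows multiplicatively, (d_t + 1)(d_v + 1)(d_a + 1), which is why the per-modality representations are usually projected to small vectors before fusion.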
Related papers
- Diagnosing Alzheimer's Disease using Early-Late Multimodal Data Fusion
with Jacobian Maps [1.5501208213584152]
Alzheimer's disease (AD) is a prevalent and debilitating neurodegenerative disorder impacting a large aging population.
We propose an efficient early-late fusion (ELF) approach, which leverages a convolutional neural network for automated feature extraction and random forests.
To tackle the challenge of detecting subtle changes in brain volume, we transform images into the Jacobian domain (JD).
arXiv Detail & Related papers (2023-10-25T19:02:57Z) - I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal
Mutual Distillation [147.2183428328396]
We introduce a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework.
In I$^2$MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process.
To alleviate the interference of similar samples and exploit their underlying contexts, we further design the Intra-modal Mutual Distillation (IMD) strategy.
arXiv Detail & Related papers (2023-10-24T07:22:17Z) - Context-aware attention layers coupled with optimal transport domain
adaptation and multimodal fusion methods for recognizing dementia from
spontaneous speech [0.0]
Alzheimer's disease (AD) constitutes a complex neurocognitive disease and is the main cause of dementia.
We propose some new methods for detecting AD patients, which capture the intra- and cross-modal interactions.
Experiments conducted on the ADReSS and ADReSSo Challenge datasets indicate the efficacy of our introduced approaches over existing research initiatives.
arXiv Detail & Related papers (2023-05-25T18:18:09Z) - Leveraging Pretrained Representations with Task-related Keywords for
Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z) - Neural Architecture Search with Multimodal Fusion Methods for Diagnosing
Dementia [14.783829037950984]
Leveraging spontaneous speech in conjunction with machine learning methods for recognizing Alzheimer's dementia patients has emerged as a hot topic.
Finding a CNN architecture is a time-consuming process and requires expertise.
We exploit several fusion methods, including Multimodal Factorized Bilinear Pooling and Tucker Decomposition, to combine both speech and text modalities.
arXiv Detail & Related papers (2023-02-12T11:25:29Z) - End-to-End Active Speaker Detection [58.7097258722291]
We propose an end-to-end training network where feature learning and contextual predictions are jointly learned.
We also introduce intertemporal graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem.
Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance.
arXiv Detail & Related papers (2022-03-27T08:55:28Z) - MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training.
arXiv Detail & Related papers (2022-01-24T17:48:04Z) - Detecting Dementia from Speech and Transcripts using Transformers [0.0]
Alzheimer's disease (AD) constitutes a neurodegenerative disease with serious consequences for people's everyday lives if it is not diagnosed early, since there is no available cure.
Current work has been focused on diagnosing dementia from spontaneous speech.
arXiv Detail & Related papers (2021-10-27T21:00:01Z) - Multi-Modal Detection of Alzheimer's Disease from Speech and Text [3.702631194466718]
We propose a deep learning method that utilizes speech and the corresponding transcript simultaneously to detect Alzheimer's disease (AD).
The proposed method achieves 85.3% 10-fold cross-validation accuracy when trained and evaluated on the Dementiabank Pitt corpus.
arXiv Detail & Related papers (2020-11-30T21:18:17Z) - M2Net: Multi-modal Multi-channel Network for Overall Survival Time
Prediction of Brain Tumor Patients [151.4352001822956]
Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients.
Existing prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume.
We propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net).
arXiv Detail & Related papers (2020-06-01T05:21:37Z) - Detecting Parkinsonian Tremor from IMU Data Collected In-The-Wild using
Deep Multiple-Instance Learning [59.74684475991192]
Parkinson's Disease (PD) is a slowly evolving neurological disease that affects about 1% of the population above 60 years old.
PD symptoms include tremor, rigidity, and bradykinesia.
We present a method for automatically identifying tremorous episodes related to PD, based on IMU signals captured via a smartphone device.
arXiv Detail & Related papers (2020-05-06T09:02:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.