Context-aware attention layers coupled with optimal transport domain
adaptation and multimodal fusion methods for recognizing dementia from
spontaneous speech
- URL: http://arxiv.org/abs/2305.16406v2
- Date: Wed, 26 Jul 2023 20:53:22 GMT
- Title: Context-aware attention layers coupled with optimal transport domain
adaptation and multimodal fusion methods for recognizing dementia from
spontaneous speech
- Authors: Loukas Ilias, Dimitris Askounis
- Abstract summary: Alzheimer's disease (AD) constitutes a complex neurocognitive disease and is the main cause of dementia.
We propose new methods for detecting AD patients that capture the intra- and cross-modal interactions.
Experiments conducted on the ADReSS and ADReSSo Challenge datasets indicate the efficacy of our introduced approaches over existing research initiatives.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Alzheimer's disease (AD) constitutes a complex neurocognitive disease and is
the main cause of dementia. Although many studies targeting the diagnosis of
dementia from spontaneous speech have been proposed, limitations remain.
Existing state-of-the-art multimodal approaches train language and acoustic
models separately, employ majority-vote schemes, and concatenate the
representations of the different modalities either at the input level, i.e.,
early fusion, or during training. Also, some of them employ self-attention
layers, which calculate the dependencies between representations without
considering the contextual information. In addition, no prior work has taken
model calibration into consideration. To address these limitations, we propose
new methods for detecting AD patients that capture the intra- and cross-modal
interactions. First, we convert each audio file into a log-Mel spectrogram
together with its delta and delta-delta, thereby creating a three-channel image
per audio file. Next, we pass each transcript and image through BERT and DeiT
models, respectively. After that, context-based self-attention layers,
self-attention layers with a gate model, and optimal transport domain
adaptation methods are employed to capture the intra- and inter-modal
interactions. Finally, we exploit two methods for fusing the self- and
cross-attention features. To account for model calibration, we apply label
smoothing and report both performance and calibration metrics. Experiments
conducted on the ADReSS and ADReSSo Challenge datasets indicate the efficacy of
our introduced approaches over existing research initiatives, with our
best-performing model reaching Accuracy and F1-score of up to 91.25% and
91.06%, respectively.
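
The audio preprocessing described in the abstract (log-Mel spectrogram plus its delta and delta-delta stacked into a three-channel image) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of librosa, the 16 kHz sampling rate, the 128 Mel bands, and the min-max normalisation are assumptions, and resizing the image to the vision model's expected input size is omitted.

```python
# Minimal sketch (assumed parameters, not the paper's exact preprocessing).
import numpy as np
import librosa

def audio_to_three_channel_image(wav_path, sr=16000, n_mels=128):
    """Stack a log-Mel spectrogram, its delta, and its delta-delta into a 3-channel array."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)       # channel 1: log-Mel
    delta = librosa.feature.delta(log_mel, order=1)      # channel 2: delta
    delta2 = librosa.feature.delta(log_mel, order=2)     # channel 3: delta-delta
    image = np.stack([log_mel, delta, delta2], axis=-1)  # shape: (n_mels, frames, 3)
    # Per-channel min-max scaling so the stack can be treated like an RGB image.
    lo = image.min(axis=(0, 1), keepdims=True)
    hi = image.max(axis=(0, 1), keepdims=True)
    return (image - lo) / (hi - lo + 1e-8)
```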
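
The modelling side can likewise be sketched under stated assumptions: BERT encodes the transcript, DeiT encodes the spectrogram image, a cross-attention layer lets text tokens attend to image tokens, a sigmoid gate blends the self- and cross-attention streams, and label smoothing is applied to the classification loss for calibration. The checkpoint names, hidden sizes, gating form, and mean pooling are illustrative choices; the paper's context-based attention layers and optimal transport domain adaptation are not reproduced here.

```python
# Hedged sketch of a BERT + DeiT classifier with gated self-/cross-attention fusion.
import torch
import torch.nn as nn
from transformers import AutoModel

class CrossModalADClassifier(nn.Module):
    def __init__(self, hidden=768, n_heads=8, n_classes=2):
        super().__init__()
        self.text_encoder = AutoModel.from_pretrained("bert-base-uncased")
        self.image_encoder = AutoModel.from_pretrained("facebook/deit-base-patch16-224")
        self.self_attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * hidden, hidden)   # learns how to blend the two streams
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, input_ids, attention_mask, pixel_values):
        txt = self.text_encoder(input_ids=input_ids,
                                attention_mask=attention_mask).last_hidden_state
        img = self.image_encoder(pixel_values=pixel_values).last_hidden_state
        self_feat, _ = self.self_attn(txt, txt, txt)    # intra-modal (text) interactions
        cross_feat, _ = self.cross_attn(txt, img, img)  # text queries attend to image tokens
        g = torch.sigmoid(self.gate(torch.cat([self_feat, cross_feat], dim=-1)))
        fused = g * self_feat + (1 - g) * cross_feat    # gated fusion of self/cross features
        return self.classifier(fused.mean(dim=1))       # mean-pool tokens, then classify

# Label smoothing, as mentioned in the abstract, for better-calibrated probabilities.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```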
Related papers
- Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech [60.08015780474457]
Alzheimer's Disease (AD) detection has emerged as a promising research area that employs machine learning classification models.
We identify within-class variation as a critical challenge in AD detection: individuals with AD exhibit a spectrum of cognitive impairments.
We propose two novel methods: Soft Target Distillation (SoTD) and Instance-level Re-balancing (InRe), targeting two problems respectively.
arXiv Detail & Related papers (2024-09-22T02:06:05Z)
- Diagnosing Alzheimer's Disease using Early-Late Multimodal Data Fusion with Jacobian Maps [1.5501208213584152]
Alzheimer's disease (AD) is a prevalent and debilitating neurodegenerative disorder impacting a large aging population.
We propose an efficient early-late fusion (ELF) approach, which leverages a convolutional neural network for automated feature extraction and random forests.
To tackle the challenge of detecting subtle changes in brain volume, we transform images into the Jacobian domain (JD).
arXiv Detail & Related papers (2023-10-25T19:02:57Z)
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranks 1st place.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
- Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
- Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- Neural Architecture Search with Multimodal Fusion Methods for Diagnosing Dementia [14.783829037950984]
Leveraging spontaneous speech in conjunction with machine learning methods for recognizing Alzheimer's dementia patients has emerged as a hot topic.
Finding a CNN architecture is a time-consuming process and requires expertise.
We exploit several fusion methods, including Multimodal Factorized Bilinear Pooling and Tucker Decomposition, to combine both speech and text modalities.
arXiv Detail & Related papers (2023-02-12T11:25:29Z)
- A Multimodal Approach for Dementia Detection from Spontaneous Speech with Tensor Fusion Layer [0.0]
Alzheimer's disease (AD) is a progressive neurological disorder, which affects memory, thinking skills, and mental abilities.
We propose deep neural networks that can be trained end-to-end and capture the inter- and intra-modal interactions.
arXiv Detail & Related papers (2022-11-08T16:43:58Z)
- Multiple Sclerosis Lesions Segmentation using Attention-Based CNNs in FLAIR Images [0.2578242050187029]
Multiple Sclerosis (MS) is an autoimmune, demyelinating disease that leads to lesions in the central nervous system.
To date, a multitude of multimodal automatic biomedical approaches has been used to segment lesions.
The authors propose a method employing just one modality (the FLAIR image) to segment MS lesions accurately.
arXiv Detail & Related papers (2022-01-05T21:37:43Z)
- Detecting Dementia from Speech and Transcripts using Transformers [0.0]
Alzheimer's disease (AD) constitutes a neurodegenerative disease with serious consequences for people's everyday lives if it is not diagnosed early, since there is no available cure.
Current work has been focused on diagnosing dementia from spontaneous speech.
arXiv Detail & Related papers (2021-10-27T21:00:01Z)
- Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis [64.82680813427054]
Plant diseases are one of the main threats to food security and crop production.
One popular approach is to frame this problem as a leaf image classification task, which can be addressed by powerful convolutional neural networks (CNNs).
We propose a novel framework that incorporates a rectified meta-learning module into a common CNN paradigm to train a noise-robust deep network without using extra supervision information.
arXiv Detail & Related papers (2020-03-17T09:51:30Z)