HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech
- URL: http://arxiv.org/abs/2405.03952v1
- Date: Tue, 7 May 2024 02:19:16 GMT
- Title: HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech
- Authors: Zhongren Dong, Zixing Zhang, Weixiang Xu, Jing Han, Jianjun Ou, Björn W. Schuller
- Abstract summary: We construct a novel framework, namely Hierarchical Attention-Free Transformer (HAFFormer), to better deal with long speech for Alzheimer's Disease detection.
Specifically, we employ an attention-free module of Multi-Scale Depthwise Convolution to replace the self-attention and thus avoid the expensive computation.
By conducting extensive experiments on the ADReSS-M dataset, the introduced HAFFormer achieves results (82.6% accuracy) competitive with other recent work.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatically detecting Alzheimer's Disease (AD) from spontaneous speech plays an important role in its early diagnosis. Recent approaches rely heavily on Transformer architectures due to their efficiency in modelling long-range context dependencies. However, the computational complexity of self-attention, which grows quadratically with audio length, poses a challenge when deploying such models on edge devices. In this context, we construct a novel framework, namely Hierarchical Attention-Free Transformer (HAFFormer), to better deal with long speech for AD detection. Specifically, we employ an attention-free module of Multi-Scale Depthwise Convolution to replace self-attention and thus avoid its expensive computation, and a GELU-based Gated Linear Unit to replace the feedforward layer, aiming to automatically filter out redundant information. Moreover, we design a hierarchical structure to force the model to learn a variety of information grains, from the frame level to the dialogue level. By conducting extensive experiments on the ADReSS-M dataset, the introduced HAFFormer achieves results (82.6% accuracy) competitive with other recent work, but with significant reductions in computational complexity and model size compared to the standard Transformer. This shows the efficiency of HAFFormer in dealing with long audio for AD detection.
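The abstract names two attention-free replacements but no code is referenced here, so the following is a minimal NumPy sketch of the two ideas only: depthwise convolutions at several kernel sizes as a linear-cost substitute for self-attention, and a GELU-gated linear unit in place of the feedforward layer. All shapes, kernel sizes, and random weights are illustrative assumptions; the actual HAFFormer modules, including the frame/segment/dialogue hierarchy, differ in detail.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def depthwise_conv1d(x, kernel):
    # x: (T, C) frame-level features; kernel: (K, C), one filter per channel.
    # Same-padded convolution along the time axis, no mixing across channels.
    T, C = x.shape
    K = kernel.shape[0]
    pad = K // 2  # assumes odd K for exact same-padding
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([np.sum(xp[t:t + K] * kernel, axis=0) for t in range(T)])

def multi_scale_depthwise(x, kernels):
    # Attention-free token mixing: depthwise convolutions at several kernel
    # sizes (scales), averaged. Cost is linear in sequence length T, versus
    # the O(T^2) cost of self-attention over long audio.
    return sum(depthwise_conv1d(x, k) for k in kernels) / len(kernels)

def geglu(x, w_val, w_gate):
    # GELU-based gated linear unit replacing the feedforward layer: a value
    # projection modulated elementwise by a GELU-activated gate, which can
    # suppress (gate out) redundant features.
    return (x @ w_val) * gelu(x @ w_gate)

rng = np.random.default_rng(0)
T, C = 50, 8                                   # illustrative length / channels
x = rng.standard_normal((T, C))
kernels = [rng.standard_normal((k, C)) * 0.1 for k in (3, 7, 15)]
mixed = multi_scale_depthwise(x, kernels)      # (T, C) token mixing
out = geglu(mixed, rng.standard_normal((C, C)), rng.standard_normal((C, C)))
print(mixed.shape, out.shape)
```

In the paper's hierarchical design, blocks like this would be stacked so that frame-level outputs are pooled into segment-level sequences and finally a dialogue-level representation; the sketch above covers a single level.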
Related papers
- Differential Transformer [99.5117269150629]
Transformer tends to overallocate attention to irrelevant context.
We introduce Diff Transformer, which amplifies attention to relevant context while canceling noise.
It offers notable advantages in practical applications, such as long-context modeling, key information retrieval, hallucination mitigation, in-context learning, and reduction of activation outliers.
arXiv Detail & Related papers (2024-10-07T17:57:38Z)
- Long-Tailed Anomaly Detection with Learnable Class Names [64.79139468331807]
We introduce several datasets with different levels of class imbalance and metrics for performance evaluation.
We then propose a novel method, LTAD, to detect defects from multiple and long-tailed classes, without relying on dataset class names.
LTAD substantially outperforms the state-of-the-art methods for most forms of dataset imbalance.
arXiv Detail & Related papers (2024-03-29T15:26:44Z)
- It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition [70.77292069313154]
Large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.
In this work, we aim to overcome such a limitation by infusing acoustic information before generating the predicted transcription through a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF).
arXiv Detail & Related papers (2024-02-08T07:21:45Z)
- Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks [4.132793413136553]
We introduce Echo-MSA, a nimble module equipped with a variable-length attention mechanism.
The proposed design captures the variable length feature of speech and addresses the limitations of fixed-length attention.
arXiv Detail & Related papers (2023-09-14T14:51:51Z)
- CASHformer: Cognition Aware SHape Transformer for Longitudinal Analysis [3.7814216736076434]
CASHformer is a transformer-based framework to model longitudinal shape trajectories in Alzheimer's disease.
It reduces the number of parameters by over 90% with respect to the original model.
Our results show that CASHformer reduces the reconstruction error by 73% compared to previously proposed methods.
arXiv Detail & Related papers (2022-07-05T14:50:21Z)
- Exploring linguistic feature and model combination for speech recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques.
Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems.
This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and RoBERTa pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z)
- Data Augmentation for Dementia Detection in Spoken Language [1.7324358447544175]
Recent deep-learning techniques can offer a faster diagnosis and have shown promising results.
They require large amounts of labelled data which is not easily available for the task of dementia detection.
One effective solution to sparse data problems is data augmentation, though the exact methods need to be selected carefully.
arXiv Detail & Related papers (2022-06-26T13:40:25Z)
- Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection [62.23830810096617]
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay further progression.
This paper presents the development of a state-of-the-art Conformer based speech recognition system built on the DementiaBank Pitt corpus for automatic AD detection.
arXiv Detail & Related papers (2022-06-23T12:50:55Z)
- Delaying Interaction Layers in Transformer-based Encoders for Efficient Open Domain Question Answering [3.111078740559015]
Open Domain Question Answering (ODQA) on a large-scale corpus of documents is a key challenge in computer science.
We propose a more direct and complementary solution which consists in applying a generic change in the architecture of transformer-based models.
The resulting variants are competitive with the original models on the extractive task and allow, on the ODQA setting, a significant speedup and even a performance improvement in many cases.
arXiv Detail & Related papers (2020-10-16T14:36:38Z)
- To BERT or Not To BERT: Comparing Speech and Language-based Approaches for Alzheimer's Disease Detection [17.99855227184379]
Natural language processing and machine learning provide promising techniques for reliably detecting Alzheimer's disease (AD).
We compare and contrast the performance of two such approaches for AD detection on the recent ADReSS challenge dataset.
We observe that fine-tuned BERT models, given the relative importance of linguistics in cognitive impairment detection, outperform feature-based approaches on the AD detection task.
arXiv Detail & Related papers (2020-07-26T04:50:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.