Related papers: Influence of ASR and Language Model on Alzheimer's Disease Detection

URL: http://arxiv.org/abs/2110.15704v1
Date: Mon, 20 Sep 2021 10:41:39 GMT
Title: Influence of ASR and Language Model on Alzheimer's Disease Detection
Authors: Joan Codina-Filbà, Guillermo Cámbara, Jordi Luque and Mireia Farrús
Abstract summary: We analyse the usage of a SotA ASR system to transcribe participants' spoken descriptions of a picture. We compare decoding the ASR hypothesis with a language model, which tends to correct non-standard sequences of words, against decoding without one. The proposed system combines acoustic features, based on prosody and voice quality, with lexical features based on the first occurrence of the most common words.
Score: 2.4698886064068555
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Alzheimer's Disease is the most common form of dementia. Automatic detection from speech could help to identify symptoms at early stages, so that preventive actions can be carried out. This research is a contribution to the ADReSSo Challenge: we analyse the usage of a SotA ASR system to transcribe participants' spoken descriptions of a picture. We analyse the loss of performance relative to using human transcriptions (measured using transcriptions from the 2020 ADReSS Challenge). Furthermore, we compare decoding the ASR hypothesis with a language model, which tends to correct non-standard sequences of words, against decoding without one. This aims at studying the language bias and obtaining more meaningful transcriptions based only on the acoustic information from patients. The proposed system combines acoustic features, based on prosody and voice quality, with lexical features based on the first occurrence of the most common words. The reported results show the effect of using automatic transcripts with or without a language model. The best fully automatic system, which uses no language model, achieves an accuracy of up to 76.06%, significantly higher (by 3%) than a system employing word transcriptions decoded using general-purpose language models.

Related papers

Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech [14.936023751079654]
Alzheimer's disease and related dementias are progressive neurodegenerative conditions. Spontaneous speech contains rich acoustic and linguistic markers that may serve as non-invasive biomarkers. Foundation models, pre-trained on large-scale audio or text data, produce high-dimensional embeddings encoding contextual and acoustic features.
arXiv Detail & Related papers (2025-06-09T17:52:31Z)
DECT: Harnessing LLM-assisted Fine-Grained Linguistic Knowledge and Label-Switched and Label-Preserved Data Generation for Diagnosis of Alzheimer's Disease [13.38075448636078]
Alzheimer's Disease (AD) is an irreversible neurodegenerative disease affecting 50 million people worldwide. Language impairment is one of the earliest signs of cognitive decline, which can be used to discriminate AD patients from normal control individuals. Patient-interviewer dialogues may be used to detect such impairments, but they are often mixed with ambiguous, noisy, and irrelevant information.
arXiv Detail & Related papers (2025-02-06T04:00:25Z)
Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer's Disease [52.46922921214341]
Alzheimer's disease (AD) has become one of the most significant health challenges in an aging society. We devised an explainable and effective feature set that leverages the visual capabilities of a large language model (LLM) and the Term Frequency-Inverse Document Frequency (TF-IDF) model. Our new features can be explained and interpreted step by step, which enhances the interpretability of automatic AD screening.
arXiv Detail & Related papers (2024-11-28T05:23:22Z)
Swin-BERT: A Feature Fusion System designed for Speech-based Alzheimer's Dementia Detection [4.668008953332776]
We propose a speech-based system named Swin-BERT for automatic dementia detection. For the acoustic part, the shifted windows multi-head attention is used for designing our acoustic-based system. For the linguistic part, the rhythm-related information, which varies significantly between people living with and without AD, is removed while transcribing the audio recordings into transcripts.
arXiv Detail & Related papers (2024-10-09T06:58:20Z)
Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark. It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
Profiling Patient Transcript Using Large Language Model Reasoning Augmentation for Alzheimer's Disease Detection [4.961581278723015]
Alzheimer's disease (AD) stands as the predominant cause of dementia, characterized by a gradual decline in speech and language capabilities. Recent deep-learning advancements have facilitated automated AD detection through spontaneous speech. Common transcript-based detection methods directly model text patterns in each utterance without a global view of the patient's linguistic characteristics.
arXiv Detail & Related papers (2024-09-19T07:58:07Z)
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction. The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses. With a reasonable prompt, LLMs can use their generative capability to correct even those tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults. Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations. This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
Cross-lingual Alzheimer's Disease detection based on paralinguistic and pre-trained features [6.928826160866143]
We present our submission to the ICASSP-SPGC-2023 ADReSS-M Challenge Task. This task aims to investigate which acoustic features can be generalized and transferred across languages for Alzheimer's Disease prediction. We extract paralinguistic features using the openSMILE toolkit and acoustic features using XLSR-53. Our method achieves an accuracy of 69.6% on the classification task and a root mean squared error (RMSE) of 4.788 on the regression task.
arXiv Detail & Related papers (2023-03-14T06:34:18Z)
Multilingual Alzheimer's Dementia Recognition through Spontaneous Speech: a Signal Processing Grand Challenge [18.684024762601215]
This Signal Processing Grand Challenge (SPGC) targets a difficult automatic prediction problem of societal and medical relevance. The Challenge has been designed to assess the extent to which predictive models built based on speech in one language (English) generalise to another language (Greek).
arXiv Detail & Related papers (2023-01-13T14:09:13Z)
Exploiting prompt learning with pre-trained language models for Alzheimer's Disease detection [70.86672569101536]
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care and delaying further progression. This paper investigates the use of prompt-based fine-tuning of PLMs that consistently uses AD classification errors as the training objective function.
arXiv Detail & Related papers (2022-10-29T09:18:41Z)
Exploring linguistic feature and model combination for speech recognition based automatic AD detection [61.91708957996086]
Speech-based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques. Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems. This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and RoBERTa pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z)
Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems. This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training. Experiments conducted on three tasks suggested incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition [3.2631198264090746]
Aphasia is a common speech and language disorder, typically caused by a brain injury or a stroke, that affects millions of people worldwide. We propose an end-to-end pipeline using pre-trained Automatic Speech Recognition (ASR) models that share cross-lingual speech representations.
arXiv Detail & Related papers (2022-04-01T14:05:02Z)
NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA). Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus 'incorrect' naming attempts from aphasic stroke patients. When tested on eight native British-English speaking PWA, the system's performance accuracy ranged from 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.