Alzheimer Disease Classification through ASR-based Transcriptions:
Exploring the Impact of Punctuation and Pauses
- URL: http://arxiv.org/abs/2306.03443v1
- Date: Tue, 6 Jun 2023 06:49:41 GMT
- Title: Alzheimer Disease Classification through ASR-based Transcriptions:
Exploring the Impact of Punctuation and Pauses
- Authors: Lucía Gómez-Zaragozá, Simone Wills, Cristian Tejedor-Garcia,
Javier Marín-Morales, Mariano Alcañiz, Helmer Strik
- Abstract summary: Alzheimer's Disease (AD) is the world's leading neurodegenerative disease.
The recent ADReSS challenge provided a dataset for AD classification.
We used the new state-of-the-art Automatic Speech Recognition (ASR) model Whisper to obtain the transcriptions.
- Score: 6.053166856632848
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Alzheimer's Disease (AD) is the world's leading neurodegenerative disease,
which often results in communication difficulties. Analysing speech can serve
as a diagnostic tool for identifying the condition. The recent ADReSS challenge
provided a dataset for AD classification and highlighted the utility of manual
transcriptions. In this study, we used the new state-of-the-art Automatic
Speech Recognition (ASR) model Whisper to obtain the transcriptions, which also
include automatic punctuation. The classification models achieved test accuracy
scores of 0.854 and 0.833 combining the pretrained FastText word embeddings and
recurrent neural networks on manual and ASR transcripts respectively.
Additionally, we explored the influence of including pause information and
punctuation in the transcriptions. We found that punctuation only yielded minor
improvements in some cases, whereas pause encoding aided AD classification for
both manual and ASR transcriptions across all approaches investigated.
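The abstract does not spell out how pause information is encoded, so the following is only a minimal sketch of one plausible scheme: given word-level timestamps (for example from an ASR system), the silent gap between consecutive words is mapped to a pause token that is inserted into the transcript before embedding. The thresholds, the token names, and the `(token, start, end)` input format are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical thresholds in seconds; NOT taken from the paper.
MIN_PAUSE = 0.05   # gaps shorter than this are ignored
SHORT_MAX = 0.5    # upper bound of a "short" pause
MEDIUM_MAX = 2.0   # upper bound of a "medium" pause

# Assumed input format: (token, start time in s, end time in s).
Word = Tuple[str, float, float]


def pause_token(gap: float) -> str:
    """Map a silent gap (seconds) to a hypothetical pause token."""
    if gap < MIN_PAUSE:
        return ""
    if gap < SHORT_MAX:
        return "<short_pause>"
    if gap < MEDIUM_MAX:
        return "<medium_pause>"
    return "<long_pause>"


def encode_pauses(words: List[Word]) -> str:
    """Insert pause tokens between words based on inter-word gaps."""
    out: List[str] = []
    prev_end = None
    for token, start, end in words:
        if prev_end is not None:
            tok = pause_token(start - prev_end)
            if tok:
                out.append(tok)
        out.append(token)
        prev_end = end
    return " ".join(out)


if __name__ == "__main__":
    sample = [
        ("the", 0.0, 0.2),
        ("boy", 0.22, 0.5),
        ("is", 0.8, 0.9),
        ("uh", 2.0, 2.2),
        ("falling", 4.5, 5.0),
    ]
    print(encode_pauses(sample))
```

The resulting string can be tokenized like any other transcript, so the pause tokens reach the classifier as ordinary vocabulary items. Whether they need dedicated embeddings depends on the embedding model used.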
Related papers
- Extracting Biomedical Entities from Noisy Audio Transcripts [5.180763052209895]
This paper introduces a novel dataset, BioASR-NER, designed to bridge the ASR-NLP gap in the biomedical domain.
We present an innovative transcript-cleaning method using GPT4, investigating both zero-shot and few-shot methodologies.
Our study further delves into an error analysis, shedding light on the types of errors in transcription software, corrections by GPT4, and the challenges GPT4 faces.
arXiv Detail & Related papers (2024-03-26T03:58:52Z)
- Useful Blunders: Can Automated Speech Recognition Errors Improve Downstream Dementia Classification? [9.275790963007173]
We investigated how errors from automatic speech recognition (ASR) systems affect dementia classification accuracy.
We aimed to assess whether imperfect ASR-generated transcripts could provide valuable information.
arXiv Detail & Related papers (2024-01-10T21:38:03Z)
- LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models [58.790604613878216]
We introduce a LibriSpeech-PC benchmark designed to assess the punctuation and capitalization prediction capabilities of end-to-end ASR models.
The benchmark includes a LibriSpeech-PC dataset with restored punctuation and capitalization, a novel evaluation metric called Punctuation Error Rate (PER) that focuses on punctuation marks, and initial baseline models.
arXiv Detail & Related papers (2023-10-04T16:23:37Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
Given a reasonable prompt, LLMs can use their generative capability to correct even those tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Boosting Punctuation Restoration with Data Generation and Reinforcement Learning [70.26450819702728]
Punctuation restoration is an important task in automatic speech recognition (ASR).
The discrepancy between written punctuated texts and ASR texts limits the usability of written texts in training punctuation restoration systems for ASR texts.
This paper proposes a reinforcement learning method to exploit in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap.
arXiv Detail & Related papers (2023-07-24T17:22:04Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- Exploring linguistic feature and model combination for speech recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques.
Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems.
This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and RoBERTa pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z)
- Influence of ASR and Language Model on Alzheimer's Disease Detection [2.4698886064068555]
We analyse the usage of a SotA ASR system to transcribe participants' spoken descriptions of a picture.
We study the influence of a language model, which tends to correct non-standard sequences of words, compared with using no language model to decode the hypothesis from the ASR.
The proposed system combines acoustic -- based on prosody and voice quality -- and lexical features based on the first occurrence of the most common words.
arXiv Detail & Related papers (2021-09-20T10:41:39Z)
- NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA).
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus 'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA, the system's performance accuracy ranged from 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z)
- Multi-Modal Detection of Alzheimer's Disease from Speech and Text [3.702631194466718]
We propose a deep learning method that utilizes speech and the corresponding transcript simultaneously to detect Alzheimer's disease (AD).
The proposed method achieves 85.3% 10-fold cross-validation accuracy when trained and evaluated on the Dementiabank Pitt corpus.
arXiv Detail & Related papers (2020-11-30T21:18:17Z)
- Robust Prediction of Punctuation and Truecasing for Medical ASR [18.08508027663331]
This paper proposes a conditional joint modeling framework for prediction of punctuation and truecasing.
We also present techniques for domain and task specific adaptation by fine-tuning masked language models with medical domain data.
arXiv Detail & Related papers (2020-07-04T07:15:13Z)