NUVA: A Naming Utterance Verifier for Aphasia Treatment
- URL: http://arxiv.org/abs/2102.05408v1
- Date: Wed, 10 Feb 2021 13:00:29 GMT
- Title: NUVA: A Naming Utterance Verifier for Aphasia Treatment
- Authors: David Sabate Barbera, Mark Huckvale, Victoria Fleming, Emily Upton,
Henry Coley-Fisher, Catherine Doogan, Ian Shaw, William Latham, Alexander P.
Leff, Jenny Crinion
- Abstract summary: Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA).
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus 'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA, the system's accuracy ranged from 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
- Score: 49.114436579008476
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Anomia (word-finding difficulties) is the hallmark of aphasia, an acquired
language disorder most commonly caused by stroke. Assessment of speech
performance using picture naming tasks is a key method for both diagnosis and
monitoring of responses to treatment interventions by people with aphasia
(PWA). Currently, this assessment is conducted manually by speech and language
therapists (SLT). Surprisingly, despite advancements in automatic speech
recognition (ASR) and artificial intelligence with technologies like deep
learning, research on developing automated systems for this task has been
scarce. Here we present NUVA, an utterance verification system incorporating a
deep learning element that classifies 'correct' versus 'incorrect' naming
attempts from aphasic stroke patients. When tested on eight native
British-English speaking PWA, the system's accuracy ranged from 83.6% to
93.6%, with a 10-fold cross-validation mean of 89.5%. This performance
was not only significantly better than a baseline created for this study using
one of the leading commercially available ASRs (Google speech-to-text service)
but also comparable in some instances with two independent SLT ratings for the
same dataset.
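The paper's code is not reproduced here, but the evaluation protocol is easy to illustrate. Below is a minimal, hypothetical sketch of scoring a binary 'correct' vs 'incorrect' verifier with 10-fold cross-validation; the random features, labels, and logistic-regression classifier are placeholders, not NUVA's deep-learning design.

```python
# Minimal, hypothetical sketch of NUVA-style evaluation: a binary
# 'correct' vs 'incorrect' verifier scored with 10-fold cross-validation.
# Features, labels, and the classifier are placeholders, not NUVA's design.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 64))     # stand-in per-utterance feature vectors
y = rng.integers(0, 2, size=400)   # 1 = correct naming attempt, 0 = incorrect

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring="accuracy")
print("fold accuracies:", np.round(scores, 3))
print(f"10-fold mean accuracy: {scores.mean():.3f}")  # NUVA reports 89.5%
```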
Related papers
- Automatically measuring speech fluency in people with aphasia: first
achievements using read-speech data [55.84746218227712]
This study assesses the relevance of a signal processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
arXiv Detail & Related papers (2023-08-09T07:51:40Z)
- A New Benchmark of Aphasia Speech Recognition and Detection Based on
E-Branchformer and Multi-task Learning [29.916793641951507]
This paper presents a new benchmark for aphasia speech recognition and detection using state-of-the-art techniques.
We introduce two multi-task learning methods based on the CTC/Attention architecture to perform both tasks simultaneously; a sketch of the joint objective follows this entry.
Our system achieves state-of-the-art speaker-level detection accuracy (97.3%) and a relative WER reduction of 11% for patients with moderate aphasia.
arXiv Detail & Related papers (2023-05-19T15:10:36Z)
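The CTC/Attention multi-task setup above is the usual hybrid objective in end-to-end ASR. A hedged PyTorch sketch of the joint loss follows; the 0.3 weight, tensor shapes, and heads are assumptions, not the benchmark's exact configuration.

```python
# Hypothetical sketch of a joint CTC/attention multi-task loss, the usual
# hybrid objective behind CTC/Attention ASR systems; exact weights and
# heads in the cited benchmark may differ.
import torch
import torch.nn.functional as F

def joint_ctc_attention_loss(ctc_log_probs,   # (T, N, vocab) log-softmax
                             ctc_targets,     # (N, S) padded label ids
                             input_lengths,   # (N,) encoder frame counts
                             target_lengths,  # (N,) label lengths
                             att_logits,      # (N, L, vocab) decoder logits
                             att_targets,     # (N, L) teacher-forced labels
                             ctc_weight=0.3):
    # CTC branch: alignment-free loss over encoder outputs.
    ctc = F.ctc_loss(ctc_log_probs, ctc_targets, input_lengths, target_lengths)
    # Attention branch: token-level cross-entropy over decoder outputs.
    att = F.cross_entropy(att_logits.transpose(1, 2), att_targets)
    return ctc_weight * ctc + (1.0 - ctc_weight) * att
```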
- Cross-lingual Alzheimer's Disease detection based on paralinguistic and
pre-trained features [6.928826160866143]
We present our submission to the ICASSP-SPGC-2023 ADReSS-M Challenge Task.
This task aims to investigate which acoustic features can be generalized and transferred across languages for Alzheimer's Disease prediction.
We extract paralinguistic features using the openSMILE toolkit and acoustic features using XLSR-53; a sketch of both feature streams follows this entry.
Our method achieves an accuracy of 69.6% on the classification task and a root mean squared error (RMSE) of 4.788 on the regression task.
arXiv Detail & Related papers (2023-03-14T06:34:18Z)
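As a rough illustration of the two feature streams above, here is a hedged sketch using the openSMILE Python package and the XLSR-53 checkpoint on Hugging Face; the eGeMAPS feature set, mean pooling, and file name are assumptions, not the submission's actual configuration.

```python
# Hypothetical sketch of the two feature streams described above:
# paralinguistic functionals via openSMILE, self-supervised acoustic
# embeddings via XLSR-53. Feature set and pooling are assumptions.
import opensmile
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Paralinguistic features: eGeMAPS functionals (one vector per file).
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
egemaps = smile.process_file("speech.wav")  # pandas DataFrame, 88 columns

# Acoustic features: mean-pooled hidden states from XLSR-53.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large-xlsr-53")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53")
wave, sr = torchaudio.load("speech.wav")
wave = torchaudio.functional.resample(wave, sr, 16_000).mean(dim=0)
inputs = extractor(wave.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, frames, 1024)
xlsr_vec = hidden.mean(dim=1)  # utterance-level embedding
```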
- Exploiting prompt learning with pre-trained language models for
Alzheimer's Disease detection [70.86672569101536]
Early diagnosis of Alzheimer's disease (AD) is crucial for facilitating preventive care and delaying further progression.
This paper investigates prompt-based fine-tuning of PLMs that consistently uses AD classification errors as the training objective function.
arXiv Detail & Related papers (2022-10-29T09:18:41Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging
Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training; a minimal inversion sketch follows this entry.
Experiments on three tasks suggest that incorporating the generated articulatory features consistently outperforms the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
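Acoustic-to-articulatory inversion, as used above, learns a mapping from acoustic frames to articulatory trajectories from parallel data (here, audio paired with UTI-derived features). The sketch below is a minimal frame-level regressor; the dimensions and MSE objective are assumptions, not the paper's A2A architecture.

```python
# Hypothetical sketch of A2A inversion as frame-level regression from
# acoustic features to articulatory trajectories; dimensions and the MSE
# objective are assumptions, not the cited paper's architecture.
import torch
import torch.nn as nn

class A2AInverter(nn.Module):
    def __init__(self, n_acoustic=80, n_articulatory=12, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_acoustic, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_articulatory),
        )

    def forward(self, x):          # x: (batch, frames, n_acoustic)
        return self.net(x)         # per-frame articulatory estimates

model = A2AInverter()
mels = torch.randn(4, 100, 80)     # stand-in acoustic frames (e.g. log-mels)
target = torch.randn(4, 100, 12)   # stand-in UTI-derived articulatory features
loss = nn.functional.mse_loss(model(mels), target)
loss.backward()                    # pre-train, then feed estimates to ASR
```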
- Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech
Recognition [3.2631198264090746]
Aphasia is a common speech and language disorder, typically caused by a brain injury or a stroke, that affects millions of people worldwide.
We propose an end-to-end pipeline using pre-trained Automatic Speech Recognition (ASR) models that share cross-lingual speech representations.
arXiv Detail & Related papers (2022-04-01T14:05:02Z)
- Sequence-level self-learning with multiple hypotheses [53.04725240411895]
We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR).
In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework.
Our experimental results show that our method reduces the WER on British speech data from 14.55% to 10.36% compared to a baseline model trained only on US English data.
arXiv Detail & Related papers (2021-12-10T20:47:58Z)
- Influence of ASR and Language Model on Alzheimer's Disease Detection [2.4698886064068555]
We analyse the use of a state-of-the-art (SotA) ASR system to transcribe participants' spoken descriptions of a picture.
We study the influence of a language model, which tends to correct non-standard word sequences, compared with decoding the ASR hypotheses without one.
The proposed system combines acoustic features (based on prosody and voice quality) with lexical features based on the first occurrence of the most common words.
arXiv Detail & Related papers (2021-09-20T10:41:39Z)
- Explainable Identification of Dementia from Transcripts using
Transformer Networks [0.0]
Alzheimer's disease (AD) is the main cause of dementia; it is accompanied by memory loss and may severely affect people's everyday lives if not diagnosed in time.
We introduce two multi-task learning models, where the main task is the identification of dementia (binary classification) and the auxiliary one is the identification of its severity (multiclass classification); a sketch of this two-head design follows this entry.
Our model obtains an accuracy of 84.99% on the detection of AD patients in the multi-task learning setting.
arXiv Detail & Related papers (2021-09-14T21:49:05Z)
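A two-head multi-task design like the one described above can be sketched directly; the embedding size, number of severity classes, and auxiliary weight below are assumptions, not the paper's model.

```python
# Hypothetical sketch of a two-head multi-task model over transcript
# embeddings: a binary dementia head plus an auxiliary severity head.
# Dimensions and the 0.5 auxiliary weight are assumptions.
import torch
import torch.nn as nn

class MultiTaskDementiaModel(nn.Module):
    def __init__(self, emb_dim=768, n_severity=4):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(emb_dim, 256), nn.ReLU())
        self.dementia_head = nn.Linear(256, 2)           # main: AD vs control
        self.severity_head = nn.Linear(256, n_severity)  # auxiliary: severity

    def forward(self, x):
        h = self.shared(x)
        return self.dementia_head(h), self.severity_head(h)

x = torch.randn(8, 768)  # stand-in transformer transcript embeddings
logits_main, logits_aux = MultiTaskDementiaModel()(x)
y_main = torch.randint(0, 2, (8,))
y_aux = torch.randint(0, 4, (8,))
loss = (nn.functional.cross_entropy(logits_main, y_main)
        + 0.5 * nn.functional.cross_entropy(logits_aux, y_aux))
```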
- Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource
End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high out-of-vocabulary (OOV) rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique; a tokenization sketch follows this entry.
Our monolingual Turkish Conformer established a competitive result with a 22.2% character error rate (CER) and a 38.9% word error rate (WER).
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
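BPE-dropout produces stochastic subword segmentations of the same text, which is what makes it usable as an augmentation signal. A hedged SentencePiece sketch follows; the model file, example text, and alpha value are assumptions, and the paper's dynamic acoustic unit augmentation goes beyond the tokenization shown here.

```python
# Hypothetical sketch of BPE-dropout-style subword sampling with
# SentencePiece; illustrates only the tokenization side of the
# augmentation described in the entry above.
import sentencepiece as spm

# Assumes a BPE model trained beforehand, e.g.:
# spm.SentencePieceTrainer.train(input="text.txt", model_prefix="bpe",
#                                vocab_size=4000, model_type="bpe")
sp = spm.SentencePieceProcessor(model_file="bpe.model")

sentence = "merhaba nasilsin"  # placeholder Turkish text
# Deterministic segmentation (no dropout):
print(sp.encode(sentence, out_type=str))
# Stochastic segmentations: alpha is the BPE merge-dropout probability,
# so repeated calls yield different subword sequences for augmentation.
for _ in range(3):
    print(sp.encode(sentence, out_type=str, enable_sampling=True, alpha=0.1))
```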
This list is automatically generated from the titles and abstracts of the papers on this site.