A New Benchmark of Aphasia Speech Recognition and Detection Based on
E-Branchformer and Multi-task Learning
- URL: http://arxiv.org/abs/2305.13331v1
- Date: Fri, 19 May 2023 15:10:36 GMT
- Title: A New Benchmark of Aphasia Speech Recognition and Detection Based on
E-Branchformer and Multi-task Learning
- Authors: Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, Brian
MacWhinney
- Abstract summary: This paper presents a new benchmark for Aphasia speech recognition using state-of-the-art speech recognition techniques.
We introduce two multi-task learning methods based on the CTC/Attention architecture to perform both tasks simultaneously.
Our system achieves state-of-the-art speaker-level detection accuracy (97.3%), and a relative WER reduction of 11% for moderate Aphasia patients.
- Score: 29.916793641951507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aphasia is a language disorder that affects the speaking ability of millions
of patients. This paper presents a new benchmark for Aphasia speech recognition
and detection tasks using state-of-the-art speech recognition techniques with
the AphasiaBank dataset. Specifically, we introduce two multi-task learning
methods based on the CTC/Attention architecture to perform both tasks
simultaneously. Our system achieves state-of-the-art speaker-level detection
accuracy (97.3%), and a relative WER reduction of 11% for moderate Aphasia
patients. In addition, we demonstrate the generalizability of our approach by
applying it to another disordered speech database, the DementiaBank Pitt
corpus. We will make our all-in-one recipes and pre-trained model publicly
available to facilitate reproducibility. Our standardized data preprocessing
pipeline and open-source recipes enable researchers to compare results
directly, promoting progress in disordered speech processing.
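To make the multi-task setup above concrete, here is a minimal sketch of a hybrid CTC/attention encoder with an auxiliary aphasia-detection head in plain PyTorch. The module names, dimensions, and loss weights are illustrative assumptions, and the LSTM merely stands in for the paper's E-Branchformer encoder; this is not the authors' ESPnet recipe.

```python
# Minimal sketch of a CTC/attention ASR encoder with an auxiliary
# aphasia-detection head, illustrating the multi-task idea in the abstract.
# Names, dimensions, and loss weights are assumptions, not the authors' recipe.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, input_dim=80, hidden_dim=256, vocab_size=500):
        super().__init__()
        # Stand-in encoder; the paper uses an E-Branchformer encoder.
        self.encoder = nn.LSTM(input_dim, hidden_dim, num_layers=4, batch_first=True)
        self.ctc_head = nn.Linear(hidden_dim, vocab_size)  # CTC branch over tokens
        self.det_head = nn.Linear(hidden_dim, 2)           # aphasia vs. control

    def forward(self, feats):
        enc, _ = self.encoder(feats)                 # (batch, time, hidden)
        ctc_logits = self.ctc_head(enc)              # per-frame token scores
        det_logits = self.det_head(enc.mean(dim=1))  # utterance-level pooling
        return ctc_logits, det_logits

# Joint objective: interpolate the ASR losses with the detection loss.
# att_loss would come from the attention decoder (omitted here for brevity);
# the 0.3/0.7 CTC/attention split is a common default, not the paper's value.
def joint_loss(ctc_loss, att_loss, det_loss, ctc_w=0.3, det_w=0.5):
    return ctc_w * ctc_loss + (1.0 - ctc_w) * att_loss + det_w * det_loss
```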
Related papers
- Self-supervised learning for pathological speech detection [0.0]
Speech production is susceptible to disruption by various neurodegenerative disorders.
These disorders lead to pathological speech characterized by abnormal speech patterns and imprecise articulation.
Unlike neurotypical speakers, patients with speech pathologies or impairments are often unable to use virtual assistants such as Alexa and Siri.
arXiv Detail & Related papers (2024-05-16T07:12:47Z)
- Seq2seq for Automatic Paraphasia Detection in Aphasic Speech [14.686874756530322]
Paraphasias are speech errors that are characteristic of aphasia and represent an important signal in assessing disease severity and subtype.
Traditionally, clinicians manually identify paraphasias by transcribing and analyzing speech-language samples.
We propose a novel sequence-to-sequence (seq2seq) model that is trained end-to-end (E2E) to perform both ASR and paraphasia detection tasks.
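One simple way to cast ASR and paraphasia detection as a single seq2seq problem, sketched below, is to interleave detection tags with the target transcript so one model learns both tasks from one output sequence; the tag inventory here is a hypothetical illustration, not necessarily the paper's exact formulation.

```python
# Illustrative target-sequence augmentation for joint ASR + paraphasia
# detection with a single seq2seq model: each word is followed by a tag
# marking whether it was a paraphasic error. The tag set is hypothetical.
PARAPHASIA_TAG = "<par>"
OK_TAG = "<ok>"

def build_target(words, paraphasia_flags):
    """Interleave words with detection tags, e.g.
    ['the', 'dog'] + [False, True] -> 'the <ok> dog <par>'."""
    assert len(words) == len(paraphasia_flags)
    out = []
    for word, is_par in zip(words, paraphasia_flags):
        out.append(word)
        out.append(PARAPHASIA_TAG if is_par else OK_TAG)
    return " ".join(out)

print(build_target(["the", "dog", "barked"], [False, True, False]))
# -> 'the <ok> dog <par> barked <ok>'
```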
arXiv Detail & Related papers (2023-12-16T18:22:37Z)
- Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data [55.84746218227712]
This study aims at assessing the relevance of a signal processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
arXiv Detail & Related papers (2023-08-09T07:51:40Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
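A common way to obtain such high-level acoustic representations is to pool hidden states from a pretrained speech model. The sketch below uses torchaudio's wav2vec 2.0 bundle; the file name, mean pooling, and the downstream classifier choice are assumptions rather than the paper's pipeline.

```python
# Sketch: pooled wav2vec 2.0 features as high-level acoustic cues for a
# downstream AD classifier. The pretrained bundle is real torchaudio API;
# the mean pooling and classifier choice are assumptions.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

waveform, sr = torchaudio.load("speech_sample.wav")  # hypothetical file
if sr != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.no_grad():
    features, _ = model.extract_features(waveform)
utterance_vec = features[-1].mean(dim=1)  # (1, hidden): pooled top-layer states
# utterance_vec can now feed any lightweight classifier (e.g., logistic regression).
```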
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction.
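As a concrete illustration of the speed perturbation mentioned above, here is a minimal sketch using torchaudio's sox effects; the 0.9x/1.0x/1.1x factors are the conventional three-way choice, assumed here rather than taken from the paper.

```python
# Kaldi-style speed perturbation via sox effects: resample the utterance
# at 0.9x and 1.1x speed to create augmented training copies.
# Factors and file name are illustrative assumptions.
import torchaudio

waveform, sample_rate = torchaudio.load("utterance.wav")  # hypothetical file
augmented = []
for factor in (0.9, 1.0, 1.1):
    perturbed, sr = torchaudio.sox_effects.apply_effects_tensor(
        waveform, sample_rate,
        effects=[["speed", str(factor)], ["rate", str(sample_rate)]],
    )
    augmented.append((factor, perturbed))  # each copy keeps the original rate
```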
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training [72.004873454347]
Two methods are introduced for enhancing the unsupervised speaker information extraction.
Experiment results on SUPERB benchmark show that the proposed system achieves state-of-the-art performance.
We scale up the training dataset to 94 thousand hours of public audio data and achieve further performance improvements.
arXiv Detail & Related papers (2021-10-12T05:43:30Z)
- Brain Signals to Rescue Aphasia, Apraxia and Dysarthria Speech Recognition [14.544989316741091]
We propose a deep learning-based algorithm to improve the performance of automatic speech recognition systems for aphasia, apraxia, and dysarthria speech.
We demonstrate a significant decoding performance improvement of more than 50% at test time for the isolated speech recognition task.
Results show the first step towards demonstrating the possibility of utilizing non-invasive neural signals to design a real-time robust speech prosthetic for stroke survivors recovering from aphasia, apraxia, and dysarthria.
arXiv Detail & Related papers (2021-02-28T03:27:02Z)
- NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA).
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus 'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA, the system's accuracy ranged from 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z)
- Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews [9.728371067160941]
We train end-to-end neural network architectures to adapt to each task and evaluate each approach under the same metric.
Results do not depend on the demographics of the interviewee, highlighting the clinical relevance of our methods.
arXiv Detail & Related papers (2020-10-30T09:07:37Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
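The self-adaptation idea, conditioning the enhancement network on a speaker representation computed from the test utterance itself, can be sketched as follows; the GRU summary, concatenation-based conditioning, and masking design are simplifying assumptions, not the paper's exact multi-head self-attention architecture.

```python
# Sketch of self-adaptive enhancement: a speaker embedding computed from
# the (noisy) test utterance conditions the enhancement network itself.
# Architecture details are illustrative assumptions.
import torch
import torch.nn as nn

class SelfAdaptiveEnhancer(nn.Module):
    def __init__(self, n_bins=257, emb_dim=128, hidden=256):
        super().__init__()
        self.spk_encoder = nn.GRU(n_bins, emb_dim, batch_first=True)
        self.enhancer = nn.Sequential(
            nn.Linear(n_bins + emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bins), nn.Sigmoid(),  # per-bin mask
        )

    def forward(self, spec):                       # spec: (B, T, n_bins)
        _, h = self.spk_encoder(spec)              # final state as speaker summary
        emb = h[-1].unsqueeze(1).expand(-1, spec.size(1), -1)
        mask = self.enhancer(torch.cat([spec, emb], dim=-1))
        return spec * mask                         # masked (enhanced) spectrogram
```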
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.