Seq2seq for Automatic Paraphasia Detection in Aphasic Speech
- URL: http://arxiv.org/abs/2312.10518v1
- Date: Sat, 16 Dec 2023 18:22:37 GMT
- Title: Seq2seq for Automatic Paraphasia Detection in Aphasic Speech
- Authors: Matthew Perez, Duc Le, Amrit Romana, Elise Jones, Keli Licata, Emily
Mower Provost
- Abstract summary: Paraphasias are speech errors that are characteristic of aphasia and represent an important signal in assessing disease severity and subtype.
Traditionally, clinicians manually identify paraphasias by transcribing and analyzing speech-language samples.
We propose a novel, sequence-to-sequence (seq2seq) model that is trained end-to-end (E2E) to perform both ASR and paraphasia detection tasks.
- Score: 14.686874756530322
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Paraphasias are speech errors that are often characteristic of aphasia and
they represent an important signal in assessing disease severity and subtype.
Traditionally, clinicians manually identify paraphasias by transcribing and
analyzing speech-language samples, which can be a time-consuming and burdensome
process. Identifying paraphasias automatically can greatly help clinicians with
the transcription process and ultimately facilitate more efficient and
consistent aphasia assessment. Previous research has demonstrated the
feasibility of automatic paraphasia detection by training an automatic speech
recognition (ASR) model to extract transcripts and then training a separate
paraphasia detection model on a set of hand-engineered features. In this paper,
we propose a novel, sequence-to-sequence (seq2seq) model that is trained
end-to-end (E2E) to perform both ASR and paraphasia detection tasks. We show
that the proposed model outperforms the previous state-of-the-art approach for
both word-level and utterance-level paraphasia detection tasks and provide
additional follow-up evaluations to further understand the proposed model
behavior.
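The abstract does not give implementation details, but one common way a single seq2seq model can perform ASR and word-level paraphasia detection jointly is to decode an augmented token sequence in which words are interleaved with paraphasia tag tokens. The tag token `<P>`, the parsing helper, and the utterance-level aggregation rule below are hypothetical illustrations of that general idea, not the authors' actual scheme.

```python
# Hypothetical sketch: one decoded token sequence carries both the
# transcript and word-level paraphasia labels. The "<P>" tag token and
# the any-word utterance rule are illustrative assumptions only.

PARAPHASIA_TAG = "<P>"

def parse_joint_output(tokens):
    """Split a decoded token sequence into (words, word_labels).

    A word is labeled 1 (paraphasia) if it is immediately followed
    by the paraphasia tag token, else 0.
    """
    words, labels = [], []
    for tok in tokens:
        if tok == PARAPHASIA_TAG:
            if labels:            # the tag modifies the preceding word
                labels[-1] = 1
        else:
            words.append(tok)
            labels.append(0)
    return words, labels

def utterance_label(word_labels):
    """Utterance-level detection: positive if any word is a paraphasia."""
    return int(any(word_labels))

decoded = ["the", "dog", "<P>", "ran", "away"]
words, labels = parse_joint_output(decoded)
# words  -> ["the", "dog", "ran", "away"]
# labels -> [0, 1, 0, 0]
```

Under this framing, word-level detection falls out of the parsed labels and utterance-level detection is a simple aggregation, which is consistent with the paper evaluating both granularities from one model.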
Related papers
- Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z)
- Beyond Binary: Multiclass Paraphasia Detection with Generative Pretrained Transformers and End-to-End Models [10.131053400122308]
Aphasia is a language disorder that can lead to speech errors known as paraphasias.
We present novel approaches that use a generative pretrained transformer (GPT) to identify paraphasias from transcripts.
We demonstrate that a single sequence model outperforms GPT baselines for multiclass paraphasia detection.
arXiv Detail & Related papers (2024-07-16T03:24:51Z)
- Careful Whisper -- leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification [0.0]
This paper presents a fully automated approach for identifying speech anomalies from voice recordings to aid in the assessment of speech impairments.
By combining Connectionist Temporal Classification (CTC) and encoder-decoder-based automatic speech recognition models, we generate rich acoustic and clean transcripts.
We then apply several natural language processing methods to extract features from these transcripts to produce prototypes of healthy speech.
arXiv Detail & Related papers (2023-08-02T15:53:59Z)
- A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning [29.916793641951507]
This paper presents a new benchmark for aphasia speech recognition using state-of-the-art speech recognition techniques.
We introduce two multi-task learning methods based on the CTC/Attention architecture to perform both tasks simultaneously.
Our system achieves state-of-the-art speaker-level detection accuracy (97.3%) and a relative WER reduction of 11% for patients with moderate aphasia.
arXiv Detail & Related papers (2023-05-19T15:10:36Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- Self-Normalized Importance Sampling for Neural Language Modeling [97.96857871187052]
In this work, we propose self-normalized importance sampling. Compared with our previous work, the criteria considered here are self-normalized, so no separate correction step is needed.
We show that our proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
arXiv Detail & Related papers (2021-11-11T16:57:53Z)
- A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion [50.040466658605524]
We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC).
Statistical VC can greatly improve the poor quality of dysarthric speech.
However, because normal speech utterances from a patient with dysarthria are nearly impossible to collect, previous work failed to recover the patient's individuality.
arXiv Detail & Related papers (2021-06-02T18:41:03Z)
- Meta Auxiliary Learning for Facial Action Unit Detection [84.22521265124806]
We consider learning AU detection and facial expression (FE) recognition in a multi-task manner.
The performance of the AU detection task cannot always be enhanced, due to negative transfer in the multi-task scenario.
We propose a Meta Auxiliary Learning method (MAL) that automatically selects highly related FE samples by learning adaptive weights for the training FE samples in a meta-learning manner.
arXiv Detail & Related papers (2021-05-14T02:28:40Z)
- Multi-Modal Detection of Alzheimer's Disease from Speech and Text [3.702631194466718]
We propose a deep learning method that utilizes speech and the corresponding transcript simultaneously to detect Alzheimer's disease (AD).
The proposed method achieves 85.3% 10-fold cross-validation accuracy when trained and evaluated on the DementiaBank Pitt corpus.
arXiv Detail & Related papers (2020-11-30T21:18:17Z)
- Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior [53.69310441063162]
This paper proposes a sequential prior in a discrete latent space which can generate more naturally sounding samples.
We evaluate the approach using listening tests, objective metrics of automatic speech recognition (ASR) performance, and measurements of prosody attributes.
arXiv Detail & Related papers (2020-02-06T12:35:50Z)
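Several entries above (the E-Branchformer benchmark in particular) build on the hybrid CTC/Attention architecture, in which a CTC loss and an attention-decoder loss over the same encoder are interpolated, with an auxiliary task term added for multi-task learning. The sketch below shows that standard interpolation; the weight values and the auxiliary detection term are illustrative placeholders, since the summaries above do not give the actual hyperparameters.

```python
# Minimal sketch of a hybrid CTC/Attention multi-task objective:
#   L = w * L_ctc + (1 - w) * L_att + d * L_aux
# Weights below are illustrative placeholders, not values from the papers.

def multitask_loss(ctc_loss, attention_loss, aux_loss,
                   ctc_weight=0.3, aux_weight=0.5):
    """Interpolate CTC and attention losses, then add a weighted
    auxiliary-task term (e.g. aphasia/paraphasia detection)."""
    asr_loss = ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss
    return asr_loss + aux_weight * aux_loss

loss = multitask_loss(ctc_loss=2.0, attention_loss=1.0, aux_loss=0.4)
# 0.3*2.0 + 0.7*1.0 + 0.5*0.4 = 1.5
```

Setting `aux_weight=0.0` recovers the plain hybrid ASR objective, which is one reason this formulation is convenient for adding a detection task to an existing recognizer.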
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.